ruff/crates/ty_python_semantic/resources/mdtest/scopes/eager.md
David Peter 71d711257a
[ty] No union with Unknown for module-global symbols (#20664)
## Summary

Quoting from the newly added comment:

Module-level globals can be mutated externally. A `MY_CONSTANT = 1`
global might be changed to `"some string"` from code outside of the
module that we're looking at, and so from a gradual-guarantee
perspective, it makes sense to infer a type of `Literal[1] | Unknown`
for global symbols. This allows the code that does the mutation to type
check correctly, and for code that uses the global, it accurately
reflects the lack of knowledge about the type.

External modifications (or modifications through `global` statements)
that would require a wider type are relatively rare. From a practical
perspective, we can therefore achieve a better user experience by
trusting the inferred type. Users who need the external mutation to work
can always annotate the global with the wider type. And everyone else
benefits from more precise type inference.

I initially implemented this by applying literal promotion to the type
of the unannotated module globals (as suggested in
https://github.com/astral-sh/ty/issues/1069), but the ecosystem impact
showed a lot of problems (https://github.com/astral-sh/ruff/pull/20643).
I fixed/patched some of these problems, but this PR seems like a good
first step, and it seems sensible to apply the literal promotion change
in a second step that can be evaluated separately.

closes https://github.com/astral-sh/ty/issues/1069

## Ecosystem impact

This seems like an (unexpectedly large) net positive with 650 fewer
diagnostics overall, even though this change will certainly catch more
true positives.

* There are 666 removed `type-assertion-failure` diagnostics, where we
were previously inferring the correct type already, but removing the
`Unknown` now leads to an "exact" match.
* 1464 of the 1805 total new diagnostics are `unresolved-attribute`
errors, most (1365) of which were previously
`possibly-missing-attribute` errors. So they could also be counted as
"changed" diagnostics.
* For code that uses constants like
  ```py
  IS_PYTHON_AT_LEAST_3_10 = sys.version_info >= (3, 10)
  ```
where we would have previously inferred a type of `Literal[True/False] |
Unknown`, removing the `Unknown` now allows us to do reachability
analysis on branches that use these constants, and so we get a lot of
favorable ecosystem changes because of that.
* There is code like the following, where we previously emitted
`conflicting-argument-forms` diagnostics on calls to the aliased
`assert_type`, because its type was `Unknown | def …` (and the call to
`Unknown` "used" the type form argument in a non type-form way):
  ```py
  if sys.version_info >= (3, 11):
      import typing
  
      assert_type = typing.assert_type
  else:
      import typing_extensions
  
      assert_type = typing_extensions.assert_type
  ```
* ~100 new `invalid-argument-type` false positives, due to missing
`**kwargs` support (https://github.com/astral-sh/ty/issues/247)

## Typing conformance

```diff
+protocols_modules.py:25:1: error[invalid-assignment] Object of type `<module '_protocols_modules1'>` is not assignable to `Options1`
```

This diagnostic should apparently not be there, but it looks like we
also fail other tests in that file, so it seems to be a limitation that
was previously hidden by `Unknown` somehow.

## Test Plan

Updated tests and relatively thorough ecosystem analysis.
2025-10-01 16:40:30 +02:00

Eager scopes

Some scopes are executed eagerly: references to variables defined in enclosing scopes are resolved immediately. This is in contrast to (for instance) function scopes, where those references are resolved when the function is called.

Function definitions

Function definitions are evaluated lazily.

x = 1

def f():
    reveal_type(x)  # revealed: Literal[1, 2]

x = 2

Class definitions

Class definitions are evaluated eagerly.

def _():
    x = 1

    class A:
        reveal_type(x)  # revealed: Literal[1]

        y = x

    x = 2

    reveal_type(A.y)  # revealed: Unknown | Literal[1]
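
The same distinction can be observed at runtime. The following is a small sketch, not part of the test suite; the names `demo`, `f`, and `A` are illustrative. The class body reads `x` while the class statement executes, whereas the function body reads `x` only when it is called.

```python
def demo():
    x = 1

    def f():
        # Lazy: `x` is looked up when `f` is called, after `x = 2` has run.
        return x

    class A:
        # Eager: `x` is looked up while the class body executes.
        y = x

    x = 2
    return f(), A.y


result = demo()
```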

List comprehensions

List comprehensions are evaluated eagerly.

def _():
    x = 1

    # revealed: Literal[1]
    [reveal_type(x) for a in range(1)]

    x = 2

Set comprehensions

Set comprehensions are evaluated eagerly.

def _():
    x = 1

    # revealed: Literal[1]
    {reveal_type(x) for a in range(1)}

    x = 2

Dict comprehensions

Dict comprehensions are evaluated eagerly.

def _():
    x = 1

    # revealed: Literal[1]
    {a: reveal_type(x) for a in range(1)}

    x = 2

Generator expressions

Generator expressions don't necessarily run eagerly, but in practice they usually do, so assuming that they do is the better default.

def _():
    x = 1

    # revealed: Literal[1]
    list(reveal_type(x) for a in range(1))

    x = 2

But that does lead to incorrect results when the generator expression isn't run immediately:

def evaluated_later():
    x = 1

    # revealed: Literal[1]
    y = (reveal_type(x) for a in range(1))

    x = 2

    # The generator isn't evaluated until here, so at runtime, `x` will evaluate to 2, contradicting
    # our inferred type.
    print(next(y))
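
To make the mismatch concrete, here is a runtime sketch of the same situation without `reveal_type` (the function name `evaluated_later_runtime` is illustrative). The generator body only runs when `next` is called, so it sees the later binding.

```python
def evaluated_later_runtime():
    x = 1

    # Creating the generator does not run its body yet.
    y = (x for a in range(1))

    x = 2

    # The body runs here and reads the current value of `x`.
    return next(y)


value = evaluated_later_runtime()
```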

Though note that “the iterable expression in the leftmost for clause is immediately evaluated” [spec]:

def iterable_evaluated_eagerly():
    x = 1

    # revealed: Literal[1]
    y = (a for a in [reveal_type(x)])

    x = 2

    # Even though the generator isn't evaluated until here, the first iterable was evaluated
    # immediately, so our inferred type is correct.
    print(next(y))
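
A runtime sketch of this case (the function name `iterable_evaluated_eagerly_runtime` is illustrative): the list `[x]` is built when the generator expression is created, so the later rebinding of `x` has no effect on what the generator yields.

```python
def iterable_evaluated_eagerly_runtime():
    x = 1

    # `[x]` is the leftmost iterable, so it is evaluated right here.
    y = (a for a in [x])

    x = 2

    # The generator iterates over the list that was built while `x` was 1.
    return next(y)


value = iterable_evaluated_eagerly_runtime()
```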

Top-level eager scopes

All of the above examples behave identically when the eager scopes are directly nested in the global scope.

Class definitions

x = 1

class A:
    reveal_type(x)  # revealed: Literal[1]

    y = x

x = 2

reveal_type(A.y)  # revealed: Unknown | Literal[1]

List comprehensions

x = 1

# revealed: Literal[1]
[reveal_type(x) for a in range(1)]

x = 2

# error: [unresolved-reference]
[y for a in range(1)]
y = 1
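
The `unresolved-reference` diagnostic mirrors what happens at runtime. In this sketch the module source is passed to `exec` with a fresh namespace so that the failure can be caught; the helper name `run_comprehension_before_binding` is illustrative.

```python
def run_comprehension_before_binding():
    source = "[y for a in range(1)]\ny = 1\n"
    namespace = {}
    try:
        # The comprehension runs immediately, before `y = 1` is executed.
        exec(source, namespace)
    except NameError as exc:
        return type(exc).__name__
    return "no error"


outcome = run_comprehension_before_binding()
```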

Set comprehensions

x = 1

# revealed: Literal[1]
{reveal_type(x) for a in range(1)}

x = 2

# error: [unresolved-reference]
{y for a in range(1)}
y = 1

Dict comprehensions

x = 1

# revealed: Literal[1]
{a: reveal_type(x) for a in range(1)}

x = 2

# error: [unresolved-reference]
{a: y for a in range(1)}
y = 1

Generator expressions

x = 1

# revealed: Literal[1]
list(reveal_type(x) for a in range(1))

x = 2

# error: [unresolved-reference]
list(y for a in range(1))
y = 1

evaluated_later.py:

x = 1

# revealed: Literal[1]
y = (reveal_type(x) for a in range(1))

x = 2

# The generator isn't evaluated until here, so at runtime, `x` will evaluate to 2, contradicting
# our inferred type.
print(next(y))

iterable_evaluated_eagerly.py:

x = 1

# revealed: Literal[1]
y = (a for a in [reveal_type(x)])

x = 2

# Even though the generator isn't evaluated until here, the first iterable was evaluated
# immediately, so our inferred type is correct.
print(next(y))

Lazy scopes are "sticky"

As we look through each enclosing scope when resolving a reference, lookups become lazy as soon as we encounter any lazy scope, even if there are other eager scopes that enclose it.

Eager scope within eager scope

If we don't encounter a lazy scope, lookup remains eager. The resolved binding is not necessarily in the immediately enclosing scope. Here, the list comprehension and class definition are both eager scopes, and we immediately resolve the use of `x` to (only) the `x = 1` binding.

def _():
    x = 1

    class A:
        # revealed: Literal[1]
        [reveal_type(x) for a in range(1)]

    x = 2

Class definition bindings are not visible in nested scopes

Class definitions are eager scopes, but any bindings in them are explicitly not visible to any nested scopes. (Those nested scopes are typically (lazy) function definitions, but the rule also applies to nested eager scopes like comprehensions and other class definitions.)

def _():
    x = 1

    class A:
        x = 4

        # revealed: Literal[1]
        [reveal_type(x) for a in range(1)]

        class B:
            # revealed: Literal[1]
            [reveal_type(x) for a in range(1)]

    x = 2

x = 1

def _():
    class C:
        # revealed: Literal[1]
        [reveal_type(x) for _ in [1]]
        x = 2
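
The runtime behavior matches this rule. In the sketch below (names `demo`, `seen`, and `direct` are illustrative), the comprehension body skips the class-level binding and reads the function's `x`, while a direct use in the class body does see the class-level binding.

```python
def demo():
    x = 1

    class A:
        x = 4

        # The comprehension is a nested scope: it does not see `A.x`
        # and resolves `x` in the enclosing function instead.
        seen = [x for _ in range(1)]

        # A plain use in the class body itself finds the class binding.
        direct = x

    return A.seen, A.direct


result = demo()
```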

Eager scope within a lazy scope

The list comprehension is an eager scope, and it is enclosed within a function definition, which is a lazy scope. Because we pass through this lazy scope before encountering any bindings or definitions, the lookup is lazy.

def _():
    x = 1

    def f():
        # revealed: Literal[1, 2]
        [reveal_type(x) for a in range(1)]
    x = 2

Lazy scope within an eager scope

The function definition is a lazy scope, and it is enclosed within a class definition, which is an eager scope. Even though we pass through an eager scope before encountering any bindings or definitions, the lookup remains lazy.

def _():
    x = 1

    class A:
        def f():
            # revealed: Literal[1, 2]
            reveal_type(x)

    x = 2

Lazy scope within a lazy scope

No matter how many lazy scopes we pass through before encountering a binding or definition, the lookup remains lazy.

def _():
    x = 1

    def f():
        def g():
            # revealed: Literal[1, 2]
            reveal_type(x)
    x = 2

Eager scope within a lazy scope within another eager scope

We have a list comprehension (eager scope), enclosed within a function definition (lazy scope), enclosed within a class definition (eager scope), all of which we must pass through before encountering any binding of `x`. Even though the last scope we pass through is eager, the lookup is lazy, since we encountered a lazy scope on the way.

def _():
    x = 1

    class A:
        def f():
            # revealed: Literal[1, 2]
            [reveal_type(x) for a in range(1)]

    x = 2
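
A runtime sketch of the same nesting (the name `demo` is illustrative) shows why the lookup has to be treated as lazy: the comprehension only runs when `f` is called, and by then `x` has been rebound.

```python
def demo():
    x = 1

    class A:
        def f():
            # Runs only when `f` is called, so it reads the current `x`.
            return [x for _ in range(1)]

    x = 2

    # Accessed through the class, `f` is a plain function and takes no arguments.
    return A.f()


result = demo()
```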

Annotations

Type annotations are sometimes deferred. When they are, the types that are referenced in an annotation are looked up lazily, even if they occur in an eager scope.

Eager annotations in a Python file

from typing import ClassVar

x = int

class C:
    var: ClassVar[x]

reveal_type(C.var)  # revealed: int

x = str

Deferred annotations in a Python file

from __future__ import annotations

from typing import ClassVar

x = int

class C:
    var: ClassVar[x]

reveal_type(C.var)  # revealed: int | str

x = str

Deferred annotations in a stub file

from typing import ClassVar

x = int

class C:
    var: ClassVar[x]

# TODO: should ideally be `str`, but we currently consider all reachable bindings
reveal_type(C.var)  # revealed: int | str

x = str

Annotation scopes

[environment]
python-version = "3.12"

Type alias annotation scopes are lazy

type Foo = Bar

class Bar:
    pass

def _(x: Foo):
    if isinstance(x, Bar):
        reveal_type(x)  # revealed: Bar
    else:
        reveal_type(x)  # revealed: Never

Type-param scopes are eager, but bounds/constraints are deferred

# error: [unresolved-reference]
class D[T](Bar):
    pass

class E[T: Bar]:
    pass

# error: [unresolved-reference]
def g[T](x: Bar):
    pass

def h[T: Bar](x: T):
    pass

class Bar:
    pass