[ty] Infer nonlocal types as unions of all reachable bindings (#18750)

## Summary

This PR includes a behavioral change to how we infer types for public
uses of symbols within a module. Where we would previously use the type
that a use at the end of the scope would see, we now consider all
reachable bindings and union the results:

```py
x = None

def f():
    reveal_type(x)  # previously `Unknown | Literal[1]`, now `Unknown | None | Literal[1]`

f()

x = 1

f()
```
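The union reflects what actually happens at runtime. As a minimal sketch (the `observed` list is an illustrative addition, not part of the PR), the same function sees both bindings of `x` depending on when it is called:

```python
# Runtime sketch: a function body looks up free variables at call time,
# so one function can observe several different bindings of `x`.
observed = []  # hypothetical helper that records what `f` sees

x = None


def f():
    # `x` is resolved in the module scope when `f` is called,
    # not when `f` is defined.
    observed.append(x)


f()  # sees the first binding, `None`

x = 1

f()  # sees the second binding, `1`

print(observed)  # [None, 1]
```

Since a type checker cannot generally know where `f` is called, treating the type of `x` as the union of both bindings is the conservative choice.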

The new behavior helps especially in cases where the end of the scope is
not reachable:

```py
def outer(x: int):
    def inner():
        reveal_type(x)  # previously `Unknown`, now `int`

    raise ValueError
```
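To see why the end-of-scope type is the wrong reference point here, consider a runnable variant of the example. The call to `inner()` before the `raise` and the `calls` list are assumptions added for illustration; the original snippet only defines `inner`:

```python
calls = []  # hypothetical helper that records what `inner` returns


def outer(x: int):
    def inner():
        # Reads `x` from the enclosing function scope at call time.
        return x

    # `inner` runs here, before the `raise` makes the rest of the
    # scope unreachable.
    calls.append(inner())

    raise ValueError("the end of `outer` is never reached")


try:
    outer(42)
except ValueError:
    pass

print(calls)  # [42]
```

Even though control flow never reaches the end of `outer`, `inner` still executes with `x` bound to an `int`.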

This PR also proposes to skip the boundness analysis of public uses.
This is consistent with the "all reachable bindings" strategy, because
the implicit `x = <unbound>` binding is also always reachable, and we
would have to emit "possibly-unresolved" diagnostics for every public
use otherwise. Changing this behavior allows common use-cases like the
following to type check without any errors:

```py
def outer(flag: bool):
    if flag:
        x = 1

        def inner():
            print(x)  # previously: possibly-unresolved-reference, now: no error
```
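A runnable variant of this pattern shows that it is safe at runtime, because `inner` only exists in the branch where `x` is bound. The `results` list and the return value are illustrative additions, not part of the PR:

```python
def outer(flag: bool):
    results = []  # hypothetical helper collecting what `inner` returns
    if flag:
        x = 1

        def inner():
            # `x` is always bound whenever `inner` exists, because both
            # are created in the same branch.
            return x

        results.append(inner())
    return results


print(outer(True))   # [1]
print(outer(False))  # []
```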

closes https://github.com/astral-sh/ty/issues/210
closes https://github.com/astral-sh/ty/issues/607
closes https://github.com/astral-sh/ty/issues/699

## Follow up

It is now possible to resolve the following TODO, but I would like to do
that as a follow-up, because it requires some changes to how we treat
implicit attribute assignments, which could result in ecosystem changes
that I'd like to see separately.


315fb0f3da/crates/ty_python_semantic/src/semantic_index/builder.rs (L1095-L1117)

## Ecosystem analysis

[**Full report**](https://shark.fish/diff-public-types.html)

* This change obviously removes a lot of `possibly-unresolved-reference`
diagnostics (7818) because we do not analyze boundness for public uses
of symbols inside modules anymore.
* As the primary goal here, this change also removes a lot of
false-positive `unresolved-reference` diagnostics (231) in scenarios
like this:
    ```py
    def _(flag: bool):
        if flag:
            x = 1
    
            def inner():
                x
    
            raise
    ```
* This change also introduces some new false positives for cases like:
    ```py
    def _():
        x = None
    
        x = "test"
    
        def inner():
            x.upper()  # Attribute `upper` on type `Unknown | None | Literal["test"]` is possibly unbound
    ```
We have test cases for these situations and it's plausible that we can
improve this in a follow-up.


## Test Plan

New Markdown tests
David Peter 2025-06-26 12:24:40 +02:00 committed by GitHub
parent 2362263d5e
commit b01003f81d
17 changed files with 983 additions and 171 deletions

@ -32,6 +32,24 @@ def _(flag1: bool, flag2: bool):
x = 1 # error: [conflicting-declarations] "Conflicting declared types for `x`: str, int"
```
## Incompatible declarations with repeated types
```py
def _(flag1: bool, flag2: bool, flag3: bool, flag4: bool):
    if flag1:
        x: str
    elif flag2:
        x: int
    elif flag3:
        x: int
    elif flag4:
        x: str
    else:
        x: bytes
    x = "a"  # error: [conflicting-declarations] "Conflicting declared types for `x`: str, int, bytes"
```
## Incompatible declarations with bad assignment
```py

@ -0,0 +1,423 @@
# Public types
## Basic
The "public type" of a symbol refers to the type that is inferred in a nested scope for a symbol
defined in an outer enclosing scope. Since it is not generally possible to analyze the full control
flow of a program, we currently make the simplifying assumption that an inner scope (such as the
`inner` function below) could be executed at any position in the enclosing scope. The public type
should therefore be the union of all possible types that the symbol could have.
In the following example, depending on when `inner()` is called, the type of `x` could either be `A`
or `B`:
```py
class A: ...
class B: ...
class C: ...

def outer() -> None:
    x = A()

    def inner() -> None:
        # TODO: We might ideally be able to eliminate `Unknown` from the union here since `x` resolves to an
        # outer scope that is a function scope (as opposed to module global scope), and `x` is never declared
        # nonlocal in a nested scope that also assigns to it.
        reveal_type(x)  # revealed: Unknown | A | B
    # This call would observe `x` as `A`.
    inner()

    x = B()

    # This call would observe `x` as `B`.
    inner()
```
Similarly, if control flow in the outer scope can split, the public type of `x` should reflect that:
```py
def outer(flag: bool) -> None:
    x = A()

    def inner() -> None:
        reveal_type(x)  # revealed: Unknown | A | B | C
    inner()

    if flag:
        x = B()

        inner()
    else:
        x = C()

        inner()

    inner()
```
If a binding is not reachable, it is not considered in the public type:
```py
def outer() -> None:
    x = A()

    def inner() -> None:
        reveal_type(x)  # revealed: Unknown | A | C
    inner()

    if False:
        x = B()  # this binding of `x` is unreachable
        inner()

    x = C()
    inner()

def outer(flag: bool) -> None:
    x = A()

    def inner() -> None:
        reveal_type(x)  # revealed: Unknown | A | C
    inner()

    if flag:
        return

        x = B()  # this binding of `x` is unreachable

    x = C()
    inner()
```
If a symbol is only conditionally bound, we do not raise any errors:
```py
def outer(flag: bool) -> None:
    if flag:
        x = A()

        def inner() -> None:
            reveal_type(x)  # revealed: Unknown | A
        inner()
```
In the future, we may try to be smarter about which bindings must or must not be visible to a
given nested scope, depending on where it is defined. In the above case, this shouldn't change the
behavior -- `x` is defined before `inner` in the same branch, so it should be considered
definitely-bound for `inner`. But in other cases we may want to emit `possibly-unresolved-reference`
in the future:
```py
def outer(flag: bool) -> None:
    if flag:
        x = A()

    def inner() -> None:
        # TODO: Ideally, we would emit a possibly-unresolved-reference error here.
        reveal_type(x)  # revealed: Unknown | A
    inner()
```
The public type is available, even if the end of the outer scope is unreachable. This is a
regression test. A previous version of ty used the end-of-scope position to determine the public
type, which would have resulted in incorrect type inference here:
```py
def outer() -> None:
    x = A()

    def inner() -> None:
        reveal_type(x)  # revealed: Unknown | A
    inner()
    return
    # unreachable

def outer(flag: bool) -> None:
    x = A()

    def inner() -> None:
        reveal_type(x)  # revealed: Unknown | A | B
    if flag:
        x = B()
        inner()
        return
        # unreachable

    inner()

def outer(x: A) -> None:
    def inner() -> None:
        reveal_type(x)  # revealed: A
    raise
```
An arbitrary level of nesting is supported:
```py
def f0() -> None:
    x = A()

    def f1() -> None:
        def f2() -> None:
            def f3() -> None:
                def f4() -> None:
                    reveal_type(x)  # revealed: Unknown | A | B
                f4()
            f3()
        f2()
    f1()

    x = B()

    f1()
```
## At module level
The behavior is the same if the outer scope is the global scope of a module:
```py
def flag() -> bool:
    return True

if flag():
    x = 1

    def f() -> None:
        reveal_type(x)  # revealed: Unknown | Literal[1, 2]
    # Function only used inside this branch
    f()

    x = 2

    # Function only used inside this branch
    f()
```
## Mixed declarations and bindings
When a declaration only appears in one branch, we also consider the types of the symbol's bindings
in other branches:
```py
def flag() -> bool:
    return True

if flag():
    A: str = ""
else:
    A = None

reveal_type(A)  # revealed: Literal[""] | None

def _():
    reveal_type(A)  # revealed: str | None
```
This pattern appears frequently with conditional imports. The `import` statement is both a
declaration and a binding, but we still add `None` to the public type union in a situation like
this:
```py
try:
    import optional_dependency  # ty: ignore
except ImportError:
    optional_dependency = None

reveal_type(optional_dependency)  # revealed: Unknown | None

def _():
    reveal_type(optional_dependency)  # revealed: Unknown | None
```
## Limitations
### Type narrowing
We currently do not further analyze control flow, so we do not support cases where the inner scope
is only executed in a branch where the type of `x` is narrowed:
```py
class A: ...

def outer(x: A | None):
    if x is not None:
        def inner() -> None:
            # TODO: should ideally be `A`
            reveal_type(x)  # revealed: A | None
        inner()
```
### Shadowing
Similarly, since we do not analyze control flow in the outer scope here, we assume that `inner()`
could be called between the two assignments to `x`:
```py
def outer() -> None:
    def inner() -> None:
        # TODO: this should ideally be `Unknown | Literal[1]`, but no other type checker supports this either
        reveal_type(x)  # revealed: Unknown | None | Literal[1]
    x = None

    # [additional code here]

    x = 1

    inner()
```
This is currently even true if the `inner` function is only defined after the second assignment to
`x`:
```py
def outer() -> None:
    x = None

    # [additional code here]

    x = 1

    def inner() -> None:
        # TODO: this should be `Unknown | Literal[1]`. Mypy and pyright support this.
        reveal_type(x)  # revealed: Unknown | None | Literal[1]
    inner()
```
A similar case derived from an ecosystem example, involving declared types:
```py
class C: ...

def outer(x: C | None):
    x = x or C()

    reveal_type(x)  # revealed: C

    def inner() -> None:
        # TODO: this should ideally be `C`
        reveal_type(x)  # revealed: C | None
    inner()
```
### Assignments to nonlocal variables
Writes to the outer-scope variable are currently not detected:
```py
def outer() -> None:
    x = None

    def set_x() -> None:
        nonlocal x
        x = 1
    set_x()

    def inner() -> None:
        # TODO: this should ideally be `Unknown | None | Literal[1]`. Mypy and pyright support this.
        reveal_type(x)  # revealed: Unknown | None
    inner()
```
## Handling of overloads
### With implementation
Overloads need special treatment, because here, we do not want to consider *all* possible
definitions of `f`. This would otherwise result in a union of all three definitions of `f`:
```py
from typing import overload

@overload
def f(x: int) -> int: ...
@overload
def f(x: str) -> str: ...
def f(x: int | str) -> int | str:
    raise NotImplementedError

reveal_type(f)  # revealed: Overload[(x: int) -> int, (x: str) -> str]

def _():
    reveal_type(f)  # revealed: Overload[(x: int) -> int, (x: str) -> str]
```
This also works if there are conflicting declarations:
```py
def flag() -> bool:
    return True

if flag():
    @overload
    def g(x: int) -> int: ...
    @overload
    def g(x: str) -> str: ...
    def g(x: int | str) -> int | str:
        return x

else:
    g: str = ""

def _():
    reveal_type(g)  # revealed: (Overload[(x: int) -> int, (x: str) -> str]) | str

# error: [conflicting-declarations]
g = "test"
```
### Without an implementation
Similarly, if there is no implementation, we only consider the last overload definition.
```pyi
from typing import overload

@overload
def f(x: int) -> int: ...
@overload
def f(x: str) -> str: ...

reveal_type(f)  # revealed: Overload[(x: int) -> int, (x: str) -> str]

def _():
    reveal_type(f)  # revealed: Overload[(x: int) -> int, (x: str) -> str]
```
This also works if there are conflicting declarations:
```pyi
def flag() -> bool:
    return True

if flag():
    @overload
    def g(x: int) -> int: ...
    @overload
    def g(x: str) -> str: ...

else:
    g: str

def _():
    reveal_type(g)  # revealed: (Overload[(x: int) -> int, (x: str) -> str]) | str
```
### Overload only defined in one branch
```py
from typing import overload

def flag() -> bool:
    return True

if flag():
    @overload
    def f(x: int) -> int: ...
    @overload
    def f(x: str) -> str: ...
    def f(x: int | str) -> int | str:
        raise NotImplementedError

def _():
    reveal_type(f)  # revealed: Overload[(x: int) -> int, (x: str) -> str]
```

@ -29,6 +29,8 @@ if flag():
chr: int = 1
def _():
reveal_type(abs) # revealed: Unknown | Literal[1] | (def abs(x: SupportsAbs[_T], /) -> _T)
reveal_type(chr) # revealed: int | (def chr(i: SupportsIndex, /) -> str)
# TODO: Should ideally be `Unknown | Literal[1] | (def abs(x: SupportsAbs[_T], /) -> _T)`
reveal_type(abs) # revealed: Unknown | Literal[1]
# TODO: Should ideally be `int | (def chr(i: SupportsIndex, /) -> str)`
reveal_type(chr) # revealed: int
```

@ -12,7 +12,7 @@ Function definitions are evaluated lazily.
x = 1
def f():
reveal_type(x) # revealed: Unknown | Literal[2]
reveal_type(x) # revealed: Unknown | Literal[1, 2]
x = 2
```
@ -299,7 +299,7 @@ def _():
x = 1
def f():
# revealed: Unknown | Literal[2]
# revealed: Unknown | Literal[1, 2]
[reveal_type(x) for a in range(1)]
x = 2
```
@ -316,7 +316,7 @@ def _():
class A:
def f():
# revealed: Unknown | Literal[2]
# revealed: Unknown | Literal[1, 2]
reveal_type(x)
x = 2
@ -333,7 +333,7 @@ def _():
def f():
def g():
# revealed: Unknown | Literal[2]
# revealed: Unknown | Literal[1, 2]
reveal_type(x)
x = 2
```
@ -351,7 +351,7 @@ def _():
class A:
def f():
# revealed: Unknown | Literal[2]
# revealed: Unknown | Literal[1, 2]
[reveal_type(x) for a in range(1)]
x = 2
@ -389,7 +389,7 @@ x = int
class C:
var: ClassVar[x]
reveal_type(C.var) # revealed: Unknown | str
reveal_type(C.var) # revealed: Unknown | int | str
x = str
```
@ -404,7 +404,8 @@ x = int
class C:
var: ClassVar[x]
reveal_type(C.var) # revealed: str
# TODO: should ideally be `str`, but we currently consider all reachable bindings
reveal_type(C.var) # revealed: int | str
x = str
```

@ -1242,18 +1242,27 @@ def f() -> None:
#### `if True`
`mod.py`:
```py
x: str
if True:
x: int
```
def f() -> None:
reveal_type(x) # revealed: int
`main.py`:
```py
from mod import x
reveal_type(x) # revealed: int
```
#### `if False … else`
`mod.py`:
```py
x: str
@ -1261,13 +1270,20 @@ if False:
pass
else:
x: int
```
def f() -> None:
reveal_type(x) # revealed: int
`main.py`:
```py
from mod import x
reveal_type(x) # revealed: int
```
### Ambiguous
`mod.py`:
```py
def flag() -> bool:
return True
@ -1276,9 +1292,14 @@ x: str
if flag():
x: int
```
def f() -> None:
reveal_type(x) # revealed: str | int
`main.py`:
```py
from mod import x
reveal_type(x) # revealed: str | int
```
## Conditional function definitions
@ -1478,6 +1499,8 @@ if False:
```py
# error: [unresolved-import]
from module import symbol
reveal_type(symbol) # revealed: Unknown
```
#### Always true, bound

@ -575,20 +575,18 @@ def f():
Free references inside of a function body refer to variables defined in the containing scope.
Function bodies are _lazy scopes_: at runtime, these references are not resolved immediately at the
point of the function definition. Instead, they are resolved _at the time of the call_, which means
that their values (and types) can be different for different invocations. For simplicity, we instead
resolve free references _at the end of the containing scope_. That means that in the examples below,
all of the `x` bindings should be visible to the `reveal_type`, regardless of where we place the
`return` statements.
TODO: These currently produce the wrong results, but not because of our terminal statement support.
See [ruff#15777](https://github.com/astral-sh/ruff/issues/15777) for more details.
that their values (and types) can be different for different invocations. For simplicity, we
currently consider _all reachable bindings_ in the containing scope:
```py
def top_level_return(cond1: bool, cond2: bool):
x = 1
def g():
# TODO eliminate Unknown
# TODO We could potentially eliminate `Unknown` from the union here,
# because `x` resolves to an enclosing function-like scope and there
# are no nested `nonlocal` declarations of that symbol that might
# modify it.
reveal_type(x) # revealed: Unknown | Literal[1, 2, 3]
if cond1:
if cond2:
@ -601,8 +599,7 @@ def return_from_if(cond1: bool, cond2: bool):
x = 1
def g():
# TODO: Literal[1, 2, 3]
reveal_type(x) # revealed: Unknown | Literal[1]
reveal_type(x) # revealed: Unknown | Literal[1, 2, 3]
if cond1:
if cond2:
x = 2
@ -614,8 +611,7 @@ def return_from_nested_if(cond1: bool, cond2: bool):
x = 1
def g():
# TODO: Literal[1, 2, 3]
reveal_type(x) # revealed: Unknown | Literal[1, 3]
reveal_type(x) # revealed: Unknown | Literal[1, 2, 3]
if cond1:
if cond2:
x = 2

@ -241,16 +241,16 @@ def f():
### Use of variable in nested function
In the example below, since we use `x` in the `inner` function, we use the "public" type of `x`,
which currently refers to the end-of-scope type of `x`. Since the end of the `outer` scope is
unreachable, we need to make sure that we do not emit an `unresolved-reference` diagnostic:
This is a regression test for a behavior that previously caused problems when the public type still
referred to the end-of-scope, which would result in an unresolved-reference error here since the end
of the scope is unreachable.
```py
def outer():
x = 1
def inner():
reveal_type(x) # revealed: Unknown
reveal_type(x) # revealed: Unknown | Literal[1]
while True:
pass
```