## Summary
Part of astral-sh/ty#1341
The following changes will be made to `Place`.
* Introduce `TypeOrigin`
* `Place::Type` -> `Place::Defined`
* `Place::Unbound` -> `Place::Undefined`
* `Boundness` -> `Definedness`
`TypeOrigin::Declared`+`Definedness::PossiblyUndefined` are patterns
that weren't considered before, but this PR doesn't address them yet,
only refactors.
## Test Plan
Refactoring
## Summary
Rename "unwrapping" methods on `Type` from e.g.
`Type::into_class_literal` to `Type::as_class_literal`. I personally
find that name more intuitive, since no transformation of any kind is
happening. We are just unwrapping from certain enum variants. An
alternative would be `try_as_class_literal`, which would follow the
[`strum` naming
scheme](https://docs.rs/strum/latest/strum/derive.EnumTryAs.html), but
is slightly longer.
Also rename `Type::into_callable` to `Type::try_upcast_to_callable`.
Note that I intentionally kept names like
`FunctionType::into_callable_type`, because those return `CallableType`,
not `Option<Type<…>>`.
## Test Plan
Pure refactoring
Previously, `Type::object` would find the definition of the `object`
class in typeshed, load that in (to produce a `ClassLiteral` and
`ClassType`), and then create a `NominalInstance` of that class.
It's possible that we are using a typeshed that doesn't define `object`.
We will not be able to do much useful work with that kind of typeshed,
but it's still a possibility that we have to support at least without
panicking. Previously, we would handle this situation by falling back on
`Unknown`.
In most cases, that's a perfectly fine fallback! But `object` is also
our top type — the type of all values. `Unknown` is _not_ an acceptable
stand-in for the top type.
This PR adds a new `NominalInstance` variant for "instances of
`object`". Unlike other nominal instances, we do not need to load in
`object`'s `ClassType` to instantiate this variant. We will use this new
variant even when the current typeshed does not define an `object`
class, ensuring that we have a fully static representation of our top
type at all times.
There are several operations that need access to a nominal instance's
class, and for this new `object` variant we load it lazily only when
it's needed. That means this operation is now fallible, since this is
where the "typeshed doesn't define `object`" failure shows up.
This new approach also has the benefit of avoiding some salsa cycles
that were cropping up while I was debugging #20093, since the new
constraint set representation was trying to instantiate `Type::object`
while in the middle of processing its definition in typeshed. Cycle
handling was kicking in correctly and returning the `Unknown` fallback
mentioned above. But the constraint set implementation depends on
`Type::object` being a distinct and fully static type, highlighting that
this is a correctness fix, not just an optimization fix.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
With this PR, we stop performing boundness analysis for implicit
instance attributes:
```py
class C:
def __init__(self):
if False:
self.x = 1
C().x # would previously show an error, with this PR we pretend the attribute exists
```
This PR is potentially just a temporary measure until we find a better
fix. But I have already invested a lot of time trying to find the root
cause of https://github.com/astral-sh/ty/issues/758 (and [this
example](https://github.com/astral-sh/ty/issues/758#issuecomment-3206108262),
which I'm not entirely sure is related) and I still don't understand
what is going on. This PR fixes the performance problems in both of
these problems (in a rather crude way).
The impact of the proposed change on the ecosystem is small, and the
three new diagnostics are arguably true positives (previously hidden
because we considered the code unreachable, based on e.g. `assert`ions
that depended on implicit instance attributes). So this seems like a
reasonable fix for now.
Note that we still support cases like these:
```py
class D:
if False: # or any other expression that statically evaluates to `False`
x: int = 1
D().x # still an error
class E:
if False: # or any other expression that statically evaluates to `False`
def f(self):
self.x = 1
E().x # still an error
```
closes https://github.com/astral-sh/ty/issues/758
## Test Plan
Updated tests, benchmark results
## Summary
closes https://github.com/astral-sh/ty/issues/692
If the expression (or any child expressions) is not definitely bound the
reachability constraint evaluation is determined as ambiguous.
This fixes the infinite cycles panic in the following code:
```py
from typing import Literal
class Toggle:
def __init__(self: "Toggle"):
if not self.x:
self.x: Literal[True] = True
```
Credit of this solution is for David.
## Test Plan
- Added a test case with too many cycle iterations panic.
- Previous tests.
---------
Co-authored-by: David Peter <mail@david-peter.de>
## Summary
While looking at some logging output that I added to
`ReachabilityConstraintBuilder::add_and_constraint` in order to debug
https://github.com/astral-sh/ty/issues/1091, I noticed that it seemed to
suggest that the TDD was built in an imbalanced way for code like the
following, where we have a sequence of non-nested `if` conditions:
```py
def f(t1, t2, t3, t4, …):
x = 0
if t1:
x = 1
if t2:
x = 2
if t3:
x = 3
if t4:
x = 4
…
```
To understand this a bit better, I added some code to the
`ReachabilityConstraintBuilder` to render the resulting TDD. On `main`,
we get a tree that looks like the following, where you can see a pattern
of N sub-trees that grow linearly with N (number of `if` statements).
This results in an overall tree structure that has N² nodes (see graph
below):
<img alt="normal order"
src="https://github.com/user-attachments/assets/aab40ce9-e82a-4fcd-823a-811f05f15f66"
/>
If we zoom in to one of these subgraphs, we can see what the problem is.
When we add new constraints that represent combinations like `t1 AND ~t2
AND ~t3 AND t4 AND …`, they start with the evaluation of "early"
conditions (`t1`, `t2`, …). This means that we have to create new
subgraphs for each new `if` condition because there is little sharing
with the previous structure. We evaluate the Boolean condition in a
right-associative way: `t1 AND (~t2 AND (~t3 AND t4)))`:
<img width="500" align="center"
src="https://github.com/user-attachments/assets/31ea7182-9e00-4975-83df-d980464f545d"
/>
If we change the ordering of TDD atoms, we can change that to a
left-associative evaluation: `(((t1 AND ~t2) AND ~t3) AND t4) …`. This
means that we can re-use previous subgraphs `(t1 AND ~t2)`, which
results in a much more compact graph structure overall (note how "late"
conditions are now at the top, and "early" conditions are further down
in the graph):
<img alt="reverse order"
src="https://github.com/user-attachments/assets/96a6b7c1-3d35-4192-a917-0b2d24c6b144"
/>
If we count the number of TDD nodes for a growing number if `if`
statements, we can see that this change results in a slower growth. It's
worth noting that the growth is still superlinear, though:
<img width="800" height="600" alt="plot"
src="https://github.com/user-attachments/assets/22e8394f-e74e-4a9e-9687-0d41f94f2303"
/>
On the actual code from the referenced ticket (the `t_main.py` file
reduced to its main function, with the main function limited to 2000
lines instead of 11000 to allow the version on `main` to run to
completion), the effect is much more dramatic. Instead of 26 million TDD
nodes (`main`), we now only create 250 thousand (this branch), which is
slightly less than 1%.
The change in this PR allows us to build the semantic index and
type-check the problematic `t_main.py` file in
https://github.com/astral-sh/ty/issues/1091 in 9 seconds. This is still
not great, but an obvious improvement compared to running out of memory
after *minutes* of execution.
An open question remains whether this change is beneficial for all kinds
of code patterns, or just this linear sequence of `if` statements. It
does not seem unreasonable to think that referring to "earlier"
conditions is generally a good idea, but I learned from Doug that it's
generally not possible to find a TDD-construction heuristic that is
non-pathological for all kinds of inputs. Fortunately, it seems like
this change here results in performance improvements across *all of our
benchmarks*, which should increase the confidence in this change:
| Benchmark | Improvement |
|---------------------|-------------------------|
| hydra-zen | +13% |
| DateType | +5% |
| sympy (walltime) | +4% |
| attrs | +4% |
| pydantic (walltime) | +2% |
| pandas (walltime) | +2% |
| altair (walltime) | +2% |
| static-frame | +2% |
| anyio | +1% |
| freqtrade | +1% |
| colour-science | +1% |
| tanjun | +1% |
closes https://github.com/astral-sh/ty/issues/1091
---------
Co-authored-by: Douglas Creager <dcreager@dcreager.net>
## Summary
Support `as` patterns in reachability analysis:
```py
from typing import assert_never
def f(subject: str | int):
match subject:
case int() as x:
pass
case str():
pass
case _:
assert_never(subject) # would previously emit an error
```
Note that we still don't support inferring correct types for the bound
name (`x`).
Closes https://github.com/astral-sh/ty/issues/928
## Test Plan
New Markdown tests
## Summary
Implements proper reachability analysis and — in effect — exhaustiveness
checking for `match` statements. This allows us to check the following
code without any errors (leads to *"can implicitly return `None`"* on
`main`):
```py
from enum import Enum, auto
class Color(Enum):
RED = auto()
GREEN = auto()
BLUE = auto()
def hex(color: Color) -> str:
match color:
case Color.RED:
return "#ff0000"
case Color.GREEN:
return "#00ff00"
case Color.BLUE:
return "#0000ff"
```
Note that code like this already worked fine if there was a
`assert_never(color)` statement in a catch-all case, because we would
then consider that `assert_never` call terminal. But now this also works
without the wildcard case. Adding a member to the enum would still lead
to an error here, if that case would not be handled in `hex`.
What needed to happen to support this is a new way of evaluating match
pattern constraints. Previously, we would simply compare the type of the
subject expression against the patterns. For the last case here, the
subject type would still be `Color` and the value type would be
`Literal[Color.BLUE]`, so we would infer an ambiguous truthiness.
Now, before we compare the subject type against the pattern, we first
generate a union type that corresponds to the set of all values that
would have *definitely been matched* by previous patterns. Then, we
build a "narrowed" subject type by computing `subject_type &
~already_matched_type`, and compare *that* against the pattern type. For
the example here, `already_matched_type = Literal[Color.RED] |
Literal[Color.GREEN]`, and so we have a narrowed subject type of `Color
& ~(Literal[Color.RED] | Literal[Color.GREEN]) = Literal[Color.BLUE]`,
which allows us to infer a reachability of `AlwaysTrue`.
<details>
<summary>A note on negated reachability constraints</summary>
It might seem that we now perform duplicate work, because we also record
*negated* reachability constraints. But that is still important for
cases like the following (and possibly also for more realistic
scenarios):
```py
from typing import Literal
def _(x: int | str):
match x:
case None:
pass # never reachable
case _:
y = 1
y
```
</details>
closes https://github.com/astral-sh/ty/issues/99
## Test Plan
* I verified that this solves all examples from the linked ticket (the
first example needs a PEP 695 type alias, because we don't support
legacy type aliases yet)
* Verified that the ecosystem changes are all because of removed false
positives
* Updated tests
## Summary
I noticed that our type narrowing and reachability analysis was
incorrect for class patterns that are not irrefutable. The test cases
below compare the old and the new behavior:
```py
from dataclasses import dataclass
@dataclass
class Point:
x: int
y: int
class Other: ...
def _(target: Point):
y = 1
match target:
case Point(0, 0):
y = 2
case Point(x=0, y=1):
y = 3
case Point(x=1, y=0):
y = 4
reveal_type(y) # revealed: Literal[1, 2, 3, 4] (previously: Literal[2])
def _(target: Point | Other):
match target:
case Point(0, 0):
reveal_type(target) # revealed: Point
case Point(x=0, y=1):
reveal_type(target) # revealed: Point (previously: Never)
case Point(x=1, y=0):
reveal_type(target) # revealed: Point (previously: Never)
case Other():
reveal_type(target) # revealed: Other (previously: Other & ~Point)
```
## Test Plan
New Markdown test
This is a follow-on to #19410 that further reduces the memory usage of
our reachability constraints. When finishing the building of a use-def
map, we walk through all of the "final" states and mark only those
reachability constraints as "used". We then throw away the interior TDD
nodes of any reachability constraints that weren't marked as used.
(This helps because we build up quite a few intermediate TDD nodes when
constructing complex reachability constraints. These nodes can never be
accessed if they were _only_ used as an intermediate TDD node. The
marking step ensures that we keep any nodes that ended up being referred
to in some accessible use-def map state.)
## Summary
`ty` does not understand that calls to functions which have been
annotated as having a return type of `Never` / `NoReturn` are terminal.
This PR fixes that, by adding new reachability constraints when call
expressions are seen. If the call expression evaluates to `Never`, the
code following it will be considered to be unreachable. Note that, for
adding these constraints, we only consider call expressions at the
statement level, and that too only inside function scopes. This is
because otherwise, the number of such constraints becomes too high, and
evaluating them later on during type inference results in a major
performance degradation.
Fixes https://github.com/astral-sh/ty/issues/180
## Test Plan
New mdtests.
## Ecosystem changes
This PR removes the following false-positives:
- "Function can implicitly return `None`, which is not assignable to
...".
- "Name `foo` used when possibly not defind" - because the branch in
which it is not defined has a `NoReturn` call, or when `foo` was
imported in a `try`, and the except had a `NoReturn` call.
---------
Co-authored-by: David Peter <mail@david-peter.de>
## Summary
Simplifies literal `True` and `False` conditions to `ALWAYS_TRUE` /
`ALWAYS_FALSE` during semantic index building. This allows us to eagerly
evaluate more constraints, which should help with performance (looks
like there is a tiny 1% improvement in instrumented benchmarks), but
also allows us to eliminate definitely-unreachable branches in
control-flow merging. This can lead to better type inference in some
cases because it allows us to retain narrowing constraints without
solving https://github.com/astral-sh/ty/issues/690 first:
```py
def _(c: int | None):
if c is None:
assert False
reveal_type(c) # int, previously: int | None
```
closes https://github.com/astral-sh/ty/issues/713
## Test Plan
* Regression test for https://github.com/astral-sh/ty/issues/713
* Made sure that all ecosystem diffs trace back to removed false
positives
## Summary
Setting `TY_MEMORY_REPORT=full` will generate and print a memory usage
report to the CLI after a `ty check` run:
```
=======SALSA STRUCTS=======
`Definition` metadata=7.24MB fields=17.38MB count=181062
`Expression` metadata=4.45MB fields=5.94MB count=92804
`member_lookup_with_policy_::interned_arguments` metadata=1.97MB fields=2.25MB count=35176
...
=======SALSA QUERIES=======
`File -> ty_python_semantic::semantic_index::SemanticIndex`
metadata=11.46MB fields=88.86MB count=1638
`Definition -> ty_python_semantic::types::infer::TypeInference`
metadata=24.52MB fields=86.68MB count=146018
`File -> ruff_db::parsed::ParsedModule`
metadata=0.12MB fields=69.06MB count=1642
...
=======SALSA SUMMARY=======
TOTAL MEMORY USAGE: 577.61MB
struct metadata = 29.00MB
struct fields = 35.68MB
memo metadata = 103.87MB
memo fields = 409.06MB
```
Eventually, we should integrate these numbers into CI in some form. The
one limitation currently is that heap allocations in salsa structs (e.g.
interned values) are not tracked, but memoized values should have full
coverage. We may also want a peak memory usage counter (that accounts
for non-salsa memory), but that is relatively simple to profile manually
(e.g. `time -v ty check`) and would require a compile-time option to
avoid runtime overhead.
## Summary
* Completely removes the concept of visibility constraints. Reachability
constraints are now used to model the static visibility of bindings and
declarations. Reachability constraints are *much* easier to reason about
/ work with, since they are applied at the beginning of a branch, and
not applied retroactively. Removing the duplication between visibility
and reachability constraints also leads to major code simplifications
[^1]. For an overview of how the new constraint system works, see the
updated doc comment in `reachability_constraints.rs`.
* Fixes a [control-flow modeling bug
(panic)](https://github.com/astral-sh/ty/issues/365) involving `break`
statements in loops
* Fixes a [bug where](https://github.com/astral-sh/ty/issues/624) where
`elif` branches would have wrong reachability constraints
* Fixes a [bug where](https://github.com/astral-sh/ty/issues/648) code
after infinite loops would not be considered unreachble
* Fixes a panic on the `pywin32` ecosystem project, which we should be
able to move to `good.txt` once this has been merged.
* Removes some false positives in unreachable code because we infer
`Never` more often, due to the fact that reachability constraints now
apply retroactively to *all* active bindings, not just to bindings
inside a branch.
* As one example, this removes the `division-by-zero` diagnostic from
https://github.com/astral-sh/ty/issues/443 because we now infer `Never`
for the divisor.
* Supersedes and includes similar test changes as
https://github.com/astral-sh/ruff/pull/18392
closes https://github.com/astral-sh/ty/issues/365
closes https://github.com/astral-sh/ty/issues/624
closes https://github.com/astral-sh/ty/issues/642
closes https://github.com/astral-sh/ty/issues/648
## Benchmarks
Benchmarks on black, pandas, and sympy showed that this is neither a
performance improvement, nor a regression.
## Test Plan
Regression tests for:
- [x] https://github.com/astral-sh/ty/issues/365
- [x] https://github.com/astral-sh/ty/issues/624
- [x] https://github.com/astral-sh/ty/issues/642
- [x] https://github.com/astral-sh/ty/issues/648
[^1]: I'm afraid this is something that @carljm advocated for since the
beginning, and I'm not sure anymore why we have never seriously tried
this before. So I suggest we do *not* attempt to do a historical deep
dive to find out exactly why this ever became so complicated, and just
enjoy the fact that we eventually arrived here.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
2025-06-17 09:24:28 +02:00
Renamed from crates/ty_python_semantic/src/semantic_index/visibility_constraints.rs (Browse further)