## Summary
Implements proper reachability analysis and — in effect — exhaustiveness
checking for `match` statements. This allows us to check the following
code without any errors (leads to *"can implicitly return `None`"* on
`main`):
```py
from enum import Enum, auto
class Color(Enum):
RED = auto()
GREEN = auto()
BLUE = auto()
def hex(color: Color) -> str:
match color:
case Color.RED:
return "#ff0000"
case Color.GREEN:
return "#00ff00"
case Color.BLUE:
return "#0000ff"
```
Note that code like this already worked fine if there was a
`assert_never(color)` statement in a catch-all case, because we would
then consider that `assert_never` call terminal. But now this also works
without the wildcard case. Adding a member to the enum would still lead
to an error here, if that case would not be handled in `hex`.
What needed to happen to support this is a new way of evaluating match
pattern constraints. Previously, we would simply compare the type of the
subject expression against the patterns. For the last case here, the
subject type would still be `Color` and the value type would be
`Literal[Color.BLUE]`, so we would infer an ambiguous truthiness.
Now, before we compare the subject type against the pattern, we first
generate a union type that corresponds to the set of all values that
would have *definitely been matched* by previous patterns. Then, we
build a "narrowed" subject type by computing `subject_type &
~already_matched_type`, and compare *that* against the pattern type. For
the example here, `already_matched_type = Literal[Color.RED] |
Literal[Color.GREEN]`, and so we have a narrowed subject type of `Color
& ~(Literal[Color.RED] | Literal[Color.GREEN]) = Literal[Color.BLUE]`,
which allows us to infer a reachability of `AlwaysTrue`.
<details>
<summary>A note on negated reachability constraints</summary>
It might seem that we now perform duplicate work, because we also record
*negated* reachability constraints. But that is still important for
cases like the following (and possibly also for more realistic
scenarios):
```py
from typing import Literal
def _(x: int | str):
match x:
case None:
pass # never reachable
case _:
y = 1
y
```
</details>
closes https://github.com/astral-sh/ty/issues/99
## Test Plan
* I verified that this solves all examples from the linked ticket (the
first example needs a PEP 695 type alias, because we don't support
legacy type aliases yet)
* Verified that the ecosystem changes are all because of removed false
positives
* Updated tests
## Summary
I noticed that our type narrowing and reachability analysis was
incorrect for class patterns that are not irrefutable. The test cases
below compare the old and the new behavior:
```py
from dataclasses import dataclass
@dataclass
class Point:
x: int
y: int
class Other: ...
def _(target: Point):
y = 1
match target:
case Point(0, 0):
y = 2
case Point(x=0, y=1):
y = 3
case Point(x=1, y=0):
y = 4
reveal_type(y) # revealed: Literal[1, 2, 3, 4] (previously: Literal[2])
def _(target: Point | Other):
match target:
case Point(0, 0):
reveal_type(target) # revealed: Point
case Point(x=0, y=1):
reveal_type(target) # revealed: Point (previously: Never)
case Point(x=1, y=0):
reveal_type(target) # revealed: Point (previously: Never)
case Other():
reveal_type(target) # revealed: Other (previously: Other & ~Point)
```
## Test Plan
New Markdown test
This is a follow-on to #19410 that further reduces the memory usage of
our reachability constraints. When finishing the building of a use-def
map, we walk through all of the "final" states and mark only those
reachability constraints as "used". We then throw away the interior TDD
nodes of any reachability constraints that weren't marked as used.
(This helps because we build up quite a few intermediate TDD nodes when
constructing complex reachability constraints. These nodes can never be
accessed if they were _only_ used as an intermediate TDD node. The
marking step ensures that we keep any nodes that ended up being referred
to in some accessible use-def map state.)
## Summary
Adds proper type inference for implicit instance attributes that are
declared with a "bare" `Final` and adds `invalid-assignment` diagnostics
for all implicit instance attributes that are declared `Final` or
`Final[…]`.
## Test Plan
New and updated MD tests.
## Ecosystem analysis
```diff
pytest (https://github.com/pytest-dev/pytest)
+ error[invalid-return-type] src/_pytest/fixtures.py:1662:24: Return type does not match returned value: expected `Scope`, found `Scope | (Unknown & ~None & ~((...) -> object) & ~str) | (((str, Config, /) -> Unknown) & ~((...) -> object) & ~str) | (Unknown & ~str)
```
The definition of the `scope` attribute is [here](
5f99385635/src/_pytest/fixtures.py (L1020-L1028)).
Looks like this is a new false positive due to missing `TypeAlias`
support that is surfaced here because we now infer a more precise type
for `FixtureDef._scope`.
Fixes https://github.com/astral-sh/ty/issues/769.
**Updated:** The preferred approach here is to keep the SemanticIndex
simple (`del` of any name marks that name "bound" in the current scope)
and to move complexity to type inference (free variable resolution stops
when it finds a binding, unless that binding is declared `nonlocal`). As
part of this change, free variable resolution will now union the types
it finds as it walks in enclosing scopes. This approach is still
incomplete, because it doesn't consider inner scopes or sibling scopes,
but it improves the common case.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
This PR updates the server to keep track of open files both system and
virtual files.
This is done by updating the project by adding the file in the open file
set in `didOpen` notification and removing it in `didClose`
notification.
This does mean that for workspace diagnostics, ty will only check open
files because the behavior of different diagnostic builder is to first
check `is_file_open` and only add diagnostics for open files. So, this
required updating the `is_file_open` model to be `should_check_file`
model which validates whether the file needs to be checked based on the
`CheckMode`. If the check mode is open files only then it will check
whether the file is open. If it's all files then it'll return `true` by
default.
Closes: astral-sh/ty#619
## Test Plan
### Before
There are two files in the project: `__init__.py` and `diagnostics.py`.
In the video, I'm demonstrating the old behavior where making changes to
the (open) `diagnostics.py` file results in re-parsing the file:
https://github.com/user-attachments/assets/c2ac0ecd-9c77-42af-a924-c3744b146045
### After
Same setup as above.
In the video, I'm demonstrating the new behavior where making changes to
the (open) `diagnostics.py` file doesn't result in re-parting the file:
https://github.com/user-attachments/assets/7b82fe92-f330-44c7-b527-c841c4545f8f
## Summary
Resolves https://github.com/astral-sh/ty/issues/339
Supports having a blank function body inside `if TYPE_CHECKING` block or
in the elif or else of a `if not TYPE_CHECKING` block.
```py
if TYPE_CHECKING:
def foo() -> int: ...
if not TYPE_CHECKING: ...
else:
def bar() -> int: ...
```
## Test Plan
Update `function/return_type.md`
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
`ty` does not understand that calls to functions which have been
annotated as having a return type of `Never` / `NoReturn` are terminal.
This PR fixes that, by adding new reachability constraints when call
expressions are seen. If the call expression evaluates to `Never`, the
code following it will be considered to be unreachable. Note that, for
adding these constraints, we only consider call expressions at the
statement level, and that too only inside function scopes. This is
because otherwise, the number of such constraints becomes too high, and
evaluating them later on during type inference results in a major
performance degradation.
Fixes https://github.com/astral-sh/ty/issues/180
## Test Plan
New mdtests.
## Ecosystem changes
This PR removes the following false-positives:
- "Function can implicitly return `None`, which is not assignable to
...".
- "Name `foo` used when possibly not defind" - because the branch in
which it is not defined has a `NoReturn` call, or when `foo` was
imported in a `try`, and the except had a `NoReturn` call.
---------
Co-authored-by: David Peter <mail@david-peter.de>
## Summary
The motivation of `ScopedExpressionId` was that we have an expression
identifier that's local to a scope and, therefore, unlikely to change if
a user makes changes in another scope. A local identifier like this has
the advantage that query results may remain unchanged even if other
parts of the file change, which in turn allows Salsa to short-circuit
dependent queries.
However, I noticed that we aren't using `ScopedExpressionId` in a place
where it's important that the identifier is local. It's main use is
inside `infer` which we always run for the entire file. The one
exception to this is `Unpack` but unpack runs as part of `infer`.
Edit: The above isn't entirely correct. We used ScopedExpressionId in
TypeInference which is a query result. Now using ExpressionNodeKey does
mean that a change to the AST invalidates most if not all TypeInference
results of a single file. Salsa then has to run all dependent queries to
see if they're affected by this change even if the change was local to
another scope.
If this locality proves to be important I suggest that we create two
queries on top of TypeInference: one that returns the expression map
which is mainly used in the linter and type inference and a second that
returns all remaining fields. This should give us a similar optimization
at a much lower cost
I also considered remove `ScopedUseId` but I believe that one is still
useful because using `ExpressionNodeKey` for it instead would mean that
all `UseDefMap` change when a single AST node changes. Whether this is
important is something difficult to assess. I'm simply not familiar
enough with the `UseDefMap`. If the locality doesn't matter for the
`UseDefMap`, then a similar change could be made and `bindings_by_use`
could be changed to an `FxHashMap<UseId, Bindings>` where `UseId` is a
thin wrapper around `NodeKey`.
Closes https://github.com/astral-sh/ty/issues/721
## Summary
Remove a hack in control flow modeling that was treating `return`
statements at the end of function bodies in a special way (basically
considering the state *just before* the `return` statement as the
end-of-scope state). This is not needed anymore now that #18750 has been
merged.
In order to make this work, we now use *all reachable bindings* for
purposes of finding implicit instance attribute assignments as well as
for deferred lookups of symbols. Both would otherwise be affected by
this change:
```py
def C:
def f(self):
self.x = 1 # a reachable binding that is not visible at the end of the scope
return
```
```py
def f():
class X: ... # a reachable binding that is not visible at the end of the scope
x: "X" = X() # deferred use of `X`
return
```
Implicit instance attributes also required another change. We previously
kept track of possibly-unbound instance attributes in some cases, but we
now give up on that completely and always consider *implicit* instance
attributes to be bound if we see a reachable binding in a reachable
method. The previous behavior was somewhat inconsistent anyway because
we also do not consider attributes possibly-unbound in other scenarios:
we do not (and can not) keep track of whether or not methods are called
that define these attributes.
closes https://github.com/astral-sh/ty/issues/711
## Ecosystem analysis
I think this looks very positive!
* We see an unsurprising drop in `possibly-unbound-attribute`
diagnostics (599), mostly for classes that define attributes in `try …
except` blocks, `for` loops, or `if … else: raise …` constructs. There
might obviously also be true positives that got removed, but the vast
majority should be false positives.
* There is also a drop in `possibly-unresolved-reference` /
`unresolved-reference` diagnostics (279+13) from the change to deferred
lookups.
* Some `invalid-type-form` false positives got resolved (13), because we
can now properly look up the names in the annotations.
* There are some new *true* positives in `attrs`, since we understand
the `Attribute` annotation that was previously inferred as `Unknown`
because of a re-assignment after the class definition.
## Test Plan
The existing attributes.md test suite has sufficient coverage here.
## Summary
Temporarily modify `UseDefMapBuilder::reachability` for star imports in
order for new definitions to pick up the right reachability. This was
already working for `UseDefMapBuilder::place_states`, but not for
`UseDefMapBuilder::reachable_definitions`.
closes https://github.com/astral-sh/ty/issues/728
## Test Plan
Regression test
## Summary
Evaluate `TYPE_CHECKING` to `ALWAYS_TRUE` and `not TYPE_CHECKING` to
`ALWAYS_FALSE` during semantic index building. This is a follow-up to
https://github.com/astral-sh/ruff/pull/18998 and is in principle just a
performance optimization. We see some (favorable) ecosystem changes
because we can eliminate definitely-unreachable branches early now and
retain narrowing constraints without solving
https://github.com/astral-sh/ty/issues/690 first.
## Summary
Simplifies literal `True` and `False` conditions to `ALWAYS_TRUE` /
`ALWAYS_FALSE` during semantic index building. This allows us to eagerly
evaluate more constraints, which should help with performance (looks
like there is a tiny 1% improvement in instrumented benchmarks), but
also allows us to eliminate definitely-unreachable branches in
control-flow merging. This can lead to better type inference in some
cases because it allows us to retain narrowing constraints without
solving https://github.com/astral-sh/ty/issues/690 first:
```py
def _(c: int | None):
if c is None:
assert False
reveal_type(c) # int, previously: int | None
```
closes https://github.com/astral-sh/ty/issues/713
## Test Plan
* Regression test for https://github.com/astral-sh/ty/issues/713
* Made sure that all ecosystem diffs trace back to removed false
positives
## Summary
Setting `TY_MEMORY_REPORT=full` will generate and print a memory usage
report to the CLI after a `ty check` run:
```
=======SALSA STRUCTS=======
`Definition` metadata=7.24MB fields=17.38MB count=181062
`Expression` metadata=4.45MB fields=5.94MB count=92804
`member_lookup_with_policy_::interned_arguments` metadata=1.97MB fields=2.25MB count=35176
...
=======SALSA QUERIES=======
`File -> ty_python_semantic::semantic_index::SemanticIndex`
metadata=11.46MB fields=88.86MB count=1638
`Definition -> ty_python_semantic::types::infer::TypeInference`
metadata=24.52MB fields=86.68MB count=146018
`File -> ruff_db::parsed::ParsedModule`
metadata=0.12MB fields=69.06MB count=1642
...
=======SALSA SUMMARY=======
TOTAL MEMORY USAGE: 577.61MB
struct metadata = 29.00MB
struct fields = 35.68MB
memo metadata = 103.87MB
memo fields = 409.06MB
```
Eventually, we should integrate these numbers into CI in some form. The
one limitation currently is that heap allocations in salsa structs (e.g.
interned values) are not tracked, but memoized values should have full
coverage. We may also want a peak memory usage counter (that accounts
for non-salsa memory), but that is relatively simple to profile manually
(e.g. `time -v ty check`) and would require a compile-time option to
avoid runtime overhead.
## Summary
Add type narrowing inside comprehensions:
```py
def _(xs: list[int | None]):
[reveal_type(x) for x in xs if x is not None] # revealed: int
```
closes https://github.com/astral-sh/ty/issues/680
## Test Plan
* New Markdown tests
* Made sure the example from https://github.com/astral-sh/ty/issues/680
now checks without errors
* Made sure that all removed ecosystem diagnostics were actually false
positives
## Summary
This PR closesastral-sh/ty#164.
This PR introduces a basic type narrowing mechanism for
attribute/subscript expressions.
Member accesses, int literal subscripts, string literal subscripts are
supported (same as mypy and pyright).
## Test Plan
New test cases are added to `mdtest/narrow/complex_target.md`.
---------
Co-authored-by: David Peter <mail@david-peter.de>
## Summary
* Completely removes the concept of visibility constraints. Reachability
constraints are now used to model the static visibility of bindings and
declarations. Reachability constraints are *much* easier to reason about
/ work with, since they are applied at the beginning of a branch, and
not applied retroactively. Removing the duplication between visibility
and reachability constraints also leads to major code simplifications
[^1]. For an overview of how the new constraint system works, see the
updated doc comment in `reachability_constraints.rs`.
* Fixes a [control-flow modeling bug
(panic)](https://github.com/astral-sh/ty/issues/365) involving `break`
statements in loops
* Fixes a [bug where](https://github.com/astral-sh/ty/issues/624) where
`elif` branches would have wrong reachability constraints
* Fixes a [bug where](https://github.com/astral-sh/ty/issues/648) code
after infinite loops would not be considered unreachble
* Fixes a panic on the `pywin32` ecosystem project, which we should be
able to move to `good.txt` once this has been merged.
* Removes some false positives in unreachable code because we infer
`Never` more often, due to the fact that reachability constraints now
apply retroactively to *all* active bindings, not just to bindings
inside a branch.
* As one example, this removes the `division-by-zero` diagnostic from
https://github.com/astral-sh/ty/issues/443 because we now infer `Never`
for the divisor.
* Supersedes and includes similar test changes as
https://github.com/astral-sh/ruff/pull/18392
closes https://github.com/astral-sh/ty/issues/365
closes https://github.com/astral-sh/ty/issues/624
closes https://github.com/astral-sh/ty/issues/642
closes https://github.com/astral-sh/ty/issues/648
## Benchmarks
Benchmarks on black, pandas, and sympy showed that this is neither a
performance improvement, nor a regression.
## Test Plan
Regression tests for:
- [x] https://github.com/astral-sh/ty/issues/365
- [x] https://github.com/astral-sh/ty/issues/624
- [x] https://github.com/astral-sh/ty/issues/642
- [x] https://github.com/astral-sh/ty/issues/648
[^1]: I'm afraid this is something that @carljm advocated for since the
beginning, and I'm not sure anymore why we have never seriously tried
this before. So I suggest we do *not* attempt to do a historical deep
dive to find out exactly why this ever became so complicated, and just
enjoy the fact that we eventually arrived here.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.
The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.
The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.
Part of https://github.com/astral-sh/ty/issues/214.
## Summary
This PR closes https://github.com/astral-sh/ty/issues/238.
Since `DefinitionState::Deleted` was introduced in #18041, support for
the `del` statement (and deletion of except handler names) is
straightforward.
However, it is difficult to determine whether references to attributes
or subscripts are unresolved after they are deleted. This PR only
invalidates narrowing by assignment if the attribute or subscript is
deleted.
## Test Plan
`mdtest/del.md` is added.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
https://github.com/astral-sh/ty/issues/214 will require a couple
invasive changes that I would like to get merged even before garbage
collection is fully implemented (to avoid rebasing):
- `ParsedModule` can no longer be dereferenced directly. Instead you
need to load a `ParsedModuleRef` to access the AST, which requires a
reference to the salsa database (as it may require re-parsing the AST if
it was collected).
- `AstNodeRef` can only be dereferenced with the `node` method, which
takes a reference to the `ParsedModuleRef`. This allows us to encode the
fact that ASTs do not live as long as the database and may be collected
as soon a given instance of a `ParsedModuleRef` is dropped. There are a
number of places where we currently merge the `'db` and `'ast`
lifetimes, so this requires giving some types/functions two separate
lifetime parameters.
## Summary
This PR partially solves https://github.com/astral-sh/ty/issues/164
(derived from #17643).
Currently, the definitions we manage are limited to those for simple
name (symbol) targets, but we expand this to track definitions for
attribute and subscript targets as well.
This was originally planned as part of the work in #17643, but the
changes are significant, so I made it a separate PR.
After merging this PR, I will reflect this changes in #17643.
There is still some incomplete work remaining, but the basic features
have been implemented, so I am publishing it as a draft PR.
Here is the TODO list (there may be more to come):
* [x] Complete rewrite and refactoring of documentation (removing
`Symbol` and replacing it with `Place`)
* [x] More thorough testing
* [x] Consolidation of duplicated code (maybe we can consolidate the
handling related to name, attribute, and subscript)
This PR replaces the current `Symbol` API with the `Place` API, which is
a concept that includes attributes and subscripts (the term is borrowed
from Rust).
## Test Plan
`mdtest/narrow/assignment.md` is added.
---------
Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
just a minor nit followup to
https://github.com/astral-sh/ruff/pull/18010 -- put all the
non-`Visitor` methods of `SemanticIndexBuilder` in the same impl block
rather than having multiple impl blocks
## Test Plan
`cargo build`
## Summary
With this PR we now detect that x is always defined in `use`:
```py
if flag and (x := number):
use(x)
```
When outside if, it's still detected as possibly not defined
```py
flag and (x := number)
# error: [possibly-unresolved-reference]
use(x)
```
In order to achieve that, I had to find a way to get access to the
flow-snapshots of the boolean expression when analyzing the flow of the
if statement. I did it by special casing the visitor of boolean
expression to return flow control information, exporting two snapshots -
`maybe_short_circuit` and `no_short_circuit`. When indexing
boolean expression itself we must assume all possible flows, but when
it's inside if statement, we can be smarter than that.
## Test Plan
Fixed existing and added new mdtests.
I went through some of mypy primer results and they look fine
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
Summary
--
This PR resolves both the typing-related and syntax error TODOs added in
#17563 by tracking a set of `global` bindings for each scope. As
discussed below, we avoid the additional AST traversal from ruff by
collecting `Name`s from `global` statements while building the semantic
index and emit a syntax error if the `Name` is already bound in the
current scope at the point of the `global` statement. This has the
downside of separating the error from the `SemanticSyntaxChecker`, but I
plan to explore using this approach in the `SemanticSyntaxChecker`
itself as a follow-up. It seems like this may be a better approach for
ruff as well.
Test Plan
--
Updated all of the related mdtests to remove the TODOs (and add quotes I
forgot on the messages).
There is one remaining TODO, but it requires `nonlocal` support, which
isn't even incorporated into the `SemanticSyntaxChecker` yet.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
This PR fixes#17595.
## Test Plan
New test cases are added to `mdtest/narrow/conditionals/nested.md`.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>