This PR adds another useful simplification when rendering constraint
sets: `T = int` instead of `T = int ∧ T ≠ str`. (The "smaller"
constraint `T = int` implies the "larger" constraint `T ≠ str`.
Constraint set clauses are intersections, and if one constraint in a
clause implies another, we can throw away the "larger" constraint.)
While we're here, we also normalize the bounds of a constraint, so that
we equate e.g. `T ≤ int | str` with `T ≤ str | int`, and change the
ordering of BDD variables so that all constraints with the same typevar
are ordered adjacent to each other.
Lastly, we also add a new `display_graph` helper method that prints out
the full graph structure of a BDD.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
This allows us to handle self-referential bounds/constraints/defaults
without panicking.
Handles more cases from https://github.com/astral-sh/ty/issues/256
This also changes the way we infer the types of legacy TypeVars. Rather
than understanding a constructor call to `typing[_extension].TypeVar`
inside of any (arbitrarily nested) expression, and having to use a
special `assigned_to` field of the semantic index to try to best-effort
figure out what name the typevar was assigned to, we instead understand
the creation of a legacy `TypeVar` only in the supported syntactic
position (RHS of a simple un-annotated assignment with one target). In
any other position, we just infer it as creating an opaque instance of
`typing.TypeVar`. (This behavior matches all other type checkers.)
So we now special-case TypeVar creation in `TypeInferenceBuilder`, as a
special case of an assignment definition, rather than deeper inside call
binding. This does mean we re-implement slightly more of
argument-parsing, but in practice this is minimal and easy to handle
correctly.
This is easier to implement if we also make the RHS of a simple (no
unpacking) one-target assignment statement no longer a standalone
expression. Which is fine to do, because simple one-target assignments
don't need to infer the RHS more than once. This is a bonus performance
(0-3% across various projects) and significant memory-usage win, since
most assignment statements are simple one-target assignment statements,
meaning we now create many fewer standalone-expression salsa
ingredients.
This change does mean that inference of manually-constructed
`TypeAliasType` instances can no longer find its Definition in
`assigned_to`, which regresses go-to-definition for these aliases. In a
future PR, `TypeAliasType` will receive the same treatment that
`TypeVar` did in this PR (moving its special-case inference into
`TypeInferenceBuilder` and supporting it only in the correct syntactic
position, and lazily inferring its value type to support recursion),
which will also fix the go-to-definition regression. (I decided a
temporary edge-case regression is better in this case than doubling the
size of this PR.)
This PR also tightens up and fixes various aspects of the validation of
`TypeVar` creation, as seen in the tests.
We still (for now) treat all typevars as instances of `typing.TypeVar`,
even if they were created using `typing_extensions.TypeVar`. This means
we'll wrongly error on e.g. `T.__default__` on Python 3.11, even if `T`
is a `typing_extensions.TypeVar` instance at runtime. We share this
wrong behavior with both mypy and pyrefly. It will be easier to fix
after we pull in https://github.com/python/typeshed/pull/14840.
There are some issues that showed up here with typevar identity and
`MarkTypeVarsInferable`; the fix here (using the new `original` field
and `is_identical_to` methods on `BoundTypeVarInstance` and
`TypeVarInstance`) is a bit kludgy, but it can go away when we eliminate
`MarkTypeVarsInferable`.
## Test Plan
Added and updated mdtests.
### Conformance suite impact
The impact here is all positive:
* We now correctly error on a legacy TypeVar with exactly one constraint
type given.
* We now correctly error on a legacy TypeVar with both an upper bound
and constraints specified.
### Ecosystem impact
Basically none; in the setuptools case we just issue slightly different
errors on an invalid TypeVar definition, due to the modified validation
code.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
An annotated assignment `name: annotation` without a right-hand side was
previously not covered by the range returned from
`DefinitionKind::full_range`, because we did expand the range to include
the right-hand side (if there was one), but failed to include the
annotation.
## Test Plan
Updated snapshot tests
## Summary
For PEP 695 generic functions and classes, there is an extra "type
params scope" (a child of the outer scope, and wrapping the body scope)
in which the type parameters are defined; class bases and function
parameter/return annotations are resolved in that type-params scope.
This PR fixes some longstanding bugs in how we resolve name loads from
inside these PEP 695 type parameter scopes, and also defers type
inference of PEP 695 typevar bounds/constraints/default, so we can
handle cycles without panicking.
We were previously treating these type-param scopes as lazy nested
scopes, which is wrong. In fact they are eager nested scopes; the class
`C` here inherits `int`, not `str`, and previously we got that wrong:
```py
Base = int
class C[T](Base): ...
Base = str
```
But certain syntactic positions within type param scopes (typevar
bounds/constraints/defaults) are lazy at runtime, and we should use
deferred name resolution for them. This also means they can have cycles;
in order to handle that without panicking in type inference, we need to
actually defer their type inference until after we have constructed the
`TypeVarInstance`.
PEP 695 does specify that typevar bounds and constraints cannot be
generic, and that typevar defaults can only reference prior typevars,
not later ones. This reduces the scope of (valid from the type-system
perspective) cycles somewhat, although cycles are still possible (e.g.
`class C[T: list[C]]`). And this is a type-system-only restriction; from
the runtime perspective an "invalid" case like `class C[T: T]` actually
works fine.
I debated whether to implement the PEP 695 restrictions as a way to
avoid some cycles up-front, but I ended up deciding against that; I'd
rather model the runtime name-resolution semantics accurately, and
implement the PEP 695 restrictions as a separate diagnostic on top.
(This PR doesn't yet implement those diagnostics, thus some `# TODO:
error` in the added tests.)
Introducing the possibility of cyclic typevars made typevar display
potentially stack overflow. For now I've handled this by simply removing
typevar details (bounds/constraints/default) from typevar display. This
impacts display of two kinds of types. If you `reveal_type(T)` on an
unbound `T` you now get just `typing.TypeVar` instead of
`typing.TypeVar("T", ...)` where `...` is the bound/constraints/default.
This matches pyright and mypy; pyrefly uses `type[TypeVar[T]]` which
seems a bit confusing, but does include the name. (We could easily
include the name without cycle issues, if there's a syntax we like for
that.)
It also means that displaying a generic function type like `def f[T:
int](x: T) -> T: ...` now displays as `f[T](x: T) -> T` instead of `f[T:
int](x: T) -> T`. This matches pyright and pyrefly; mypy does include
bound/constraints/defaults of typevars in function/callable type
display. If we wanted to add this, we would either need to thread a
visitor through all the type display code, or add a `decycle` type
transformation that replaced recursive reoccurrence of a type with a
marker.
## Test Plan
Added mdtests and modified existing tests to improve their correctness.
After this PR, there's only a single remaining py-fuzzer seed in the
0-500 range that panics! (Before this PR, there were 10; the fuzzer
likes to generate cyclic PEP 695 syntax.)
## Ecosystem report
It's all just the changes to `TypeVar` display.
This also reintroduces the `ResolvedDefinition::Module` variant because
reverse-engineering it in several places is a bit confusing. In an ideal
world we wouldn't have `ResolvedDefinition::FileWithRange` as it kinda
kills the ability to do richer analysis, so I want to chip away at its
scope wherever I can (currently it's used to point at asname parts of
import statements when doing `ImportAliasResolution::PreserveAliases`,
and also keyword arguments).
This also makes a kind of odd change to allow a hover to *only* produce
a docstring. This works around an oddity where hovering over a module
name in an import fails to resolve to a `ty` even though hovering over
uses of that imported name *does*.
The two fixed tests reflect the two interesting cases here.
This PR introduces a few related changes:
- We now keep track of each time a legacy typevar is bound in a
different generic context (e.g. class, function), and internally create
a new `TypeVarInstance` for each usage. This means the rest of the code
can now assume that salsa-equivalent `TypeVarInstance`s refer to the
same typevar, even taking into account that legacy typevars can be used
more than once.
- We also go ahead and track the binding context of PEP 695 typevars.
That's _much_ easier to track since we have the binding context right
there during type inference.
- With that in place, we can now include the name of the binding context
when rendering typevars (e.g. `T@f` instead of `T`)
This PR implements "go to definition" and "go to declaration"
functionality for name nodes only. Future PRs will add support for
attributes, module names in import statements, keyword argument names,
etc.
This PR:
* Registers a declaration and definition request handler for the
language server.
* Splits out the `goto_type_definition` into its own module. The `goto`
module contains functionality that is common to `goto_type_definition`,
`goto_declaration` and `goto_definition`.
* Roughs in a new module `stub_mapping` that is not yet implemented. It
will be responsible for mapping a definition in a stub file to its
corresponding definition(s) in an implementation (source) file.
* Adds a new IDE support function `definitions_for_name` that collects
all of the definitions associated with a name and resolves any imports
(recursively) to find the original definitions associated with that
name.
* Adds a new `VisibleAncestorsIter` stuct that iterates up the scope
hierarchy but skips scopes that are not visible to starting scope.
---------
Co-authored-by: UnboundVariable <unbound@gmail.com>
This PR includes:
* Implemented core signature help logic
* Added new docstring method on Definition that returns a docstring for
function and class definitions
* Modified the display code for Signature that allows a signature string
to be broken into text ranges that correspond to each parameter in the
signature
* Augmented Signature struct so it can track the Definition for a
signature when available; this allows us to find the docstring
associated with the signature
* Added utility functions for parsing parameter documentation from three
popular docstring formats (Google, NumPy and reST)
* Implemented tests for all of the above
"Signature help" is displayed by an editor when you are typing a
function call expression. It is typically triggered when you type an
open parenthesis. The language server provides information about the
target function's signature (or multiple signatures), documentation, and
parameters.
Here is how this appears:

---------
Co-authored-by: UnboundVariable <unbound@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
Emit a diagnostic when a `Final`-qualified symbol is modified. This
first iteration only works for name targets. Tests with TODO comments
were added for attribute assignments as well.
related ticket: https://github.com/astral-sh/ty/issues/158
## Ecosystem impact
Correctly identified [modification of a `Final`
symbol](7b4164a5f2/sphinx/__init__.py (L44))
(behind a `# type: ignore`):
```diff
- warning[unused-ignore-comment] sphinx/__init__.py:44:56: Unused blanket `type: ignore` directive
```
And the same
[here](5471a37e82/src/trio/_core/_run.py (L128)):
```diff
- warning[unused-ignore-comment] src/trio/_core/_run.py:128:45: Unused blanket `type: ignore` directive
```
## Test Plan
New Markdown tests
## Summary
Setting `TY_MEMORY_REPORT=full` will generate and print a memory usage
report to the CLI after a `ty check` run:
```
=======SALSA STRUCTS=======
`Definition` metadata=7.24MB fields=17.38MB count=181062
`Expression` metadata=4.45MB fields=5.94MB count=92804
`member_lookup_with_policy_::interned_arguments` metadata=1.97MB fields=2.25MB count=35176
...
=======SALSA QUERIES=======
`File -> ty_python_semantic::semantic_index::SemanticIndex`
metadata=11.46MB fields=88.86MB count=1638
`Definition -> ty_python_semantic::types::infer::TypeInference`
metadata=24.52MB fields=86.68MB count=146018
`File -> ruff_db::parsed::ParsedModule`
metadata=0.12MB fields=69.06MB count=1642
...
=======SALSA SUMMARY=======
TOTAL MEMORY USAGE: 577.61MB
struct metadata = 29.00MB
struct fields = 35.68MB
memo metadata = 103.87MB
memo fields = 409.06MB
```
Eventually, we should integrate these numbers into CI in some form. The
one limitation currently is that heap allocations in salsa structs (e.g.
interned values) are not tracked, but memoized values should have full
coverage. We may also want a peak memory usage counter (that accounts
for non-salsa memory), but that is relatively simple to profile manually
(e.g. `time -v ty check`) and would require a compile-time option to
avoid runtime overhead.
## Summary
Consider the following example, which leads to a excessively large
runtime on `main`. The reason for this is the following. When inferring
types for `self.a`, we look up the `a` attribute on `C`. While looking
for implicit instance attributes, we go through every method and check
for `self.a = …` assignments. There are no such assignments here, but we
always have an implicit `self.a = <unbound>` binding at the beginning
over every method. This binding accumulates a complex visibility
constraint in `C.f`, due to the `isinstance` checks. While evaluating
that constraint, we need to infer the type of `self.b`. There's no
binding for `self.b` either, but there's also an implicit `self.b =
<unbound>` binding with the same complex visibility constraint
(involving `self.b` recursively). This leads to a combinatorial
explosion:
```py
class C:
def f(self: "C"):
if isinstance(self.a, str):
return
if isinstance(self.b, str):
return
if isinstance(self.b, str):
return
if isinstance(self.b, str):
return
# repeat 20 times
```
(note that the `self` parameter here is annotated explicitly because we
currently still infer `Unknown` for `self` otherwise)
The fix proposed here is rather simple: when there are no `self.name =
…` attribute assignments in a given method, we skip evaluating the
visibility constraint of the implicit `self.name = <unbound>` binding.
This should also generally help with performance, because that's a very
common case.
This is *not* a fix for cases where there *are* actual bindings in the
method. When we add `self.a = 1; self.b = 1` to that example above, we
still see that combinatorial explosion of runtime. I still think it's
worth to make this optimization, as it fixes the problems with `pandas`
and `sqlalchemy` reported by users. I will open a ticket to track that
separately.
closes https://github.com/astral-sh/ty/issues/627
closes https://github.com/astral-sh/ty/issues/641
## Test Plan
* Made sure that `ty` finishes quickly on the MREs in
https://github.com/astral-sh/ty/issues/627
* Made sure that `ty` finishes quickly on `pandas`
* Made sure that `ty` finishes quickly on `sqlalchemy`
## Summary
Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.
The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.
The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.
Part of https://github.com/astral-sh/ty/issues/214.
## Summary
https://github.com/astral-sh/ty/issues/214 will require a couple
invasive changes that I would like to get merged even before garbage
collection is fully implemented (to avoid rebasing):
- `ParsedModule` can no longer be dereferenced directly. Instead you
need to load a `ParsedModuleRef` to access the AST, which requires a
reference to the salsa database (as it may require re-parsing the AST if
it was collected).
- `AstNodeRef` can only be dereferenced with the `node` method, which
takes a reference to the `ParsedModuleRef`. This allows us to encode the
fact that ASTs do not live as long as the database and may be collected
as soon a given instance of a `ParsedModuleRef` is dropped. There are a
number of places where we currently merge the `'db` and `'ast`
lifetimes, so this requires giving some types/functions two separate
lifetime parameters.
## Summary
This PR partially solves https://github.com/astral-sh/ty/issues/164
(derived from #17643).
Currently, the definitions we manage are limited to those for simple
name (symbol) targets, but we expand this to track definitions for
attribute and subscript targets as well.
This was originally planned as part of the work in #17643, but the
changes are significant, so I made it a separate PR.
After merging this PR, I will reflect this changes in #17643.
There is still some incomplete work remaining, but the basic features
have been implemented, so I am publishing it as a draft PR.
Here is the TODO list (there may be more to come):
* [x] Complete rewrite and refactoring of documentation (removing
`Symbol` and replacing it with `Place`)
* [x] More thorough testing
* [x] Consolidation of duplicated code (maybe we can consolidate the
handling related to name, attribute, and subscript)
This PR replaces the current `Symbol` API with the `Place` API, which is
a concept that includes attributes and subscripts (the term is borrowed
from Rust).
## Test Plan
`mdtest/narrow/assignment.md` is added.
---------
Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@astral.sh>