This is somewhat inspired by a similar abstraction in
`ruff_linter`. The main idea is to create an importer once
for a module that you want to add imports to. And then call
`import` to generate an edit for each symbol you want to
add.
I haven't done any performance profiling here yet. I don't
know if it will be a bottleneck. In particular, I do expect
`Importer::import` (but not `Importer::new`) to get called
many times for a single completion request when auto-import
is enabled. Particularly in projects with a lot of unimported
symbols. Because I don't know the perf impact, I didn't do
any premature optimization here. But there are surely some
low hanging fruit if this does prove to be a problem.
New tests make up a big portion of the diff here. I tried to
think of a bunch of different cases, although I'm sure there
are more.
I think this is a better home for it. This way, `ty_ide`
more clearly owns how the "kind" of a completion is computed.
In particular, it is computed differently for things where
we know its type versus unimported symbols.
In the course of writing the "add an import" implementation,
I realized that we needed to know which symbols were in scope
and how they were defined. This was necessary to be able to
determine how to add a new import in a way that (minimally)
does not conflict with existing symbols.
I'm not sure that this is fully correct (especially for
symbol bindings) and it's unclear to me in which cases a
definition site will be missing. But this seems to work for
some of the basic cases that I tried.
The names of the submodules returned should be *complete*. This
is the contract of `Module::name`. However, we were previously
only returning the basename of the submodule.
This re-works the `all_symbols` based added previously to work across
all modules available, and not just what is directly in the workspace.
Note that we always pass an empty string as a query, which makes the
results always empty. We'll fix this in a subsequent commit.
This is to facilitate recursive traversal of all modules in an
environment. This way, we can keep asking for submodules.
This also simplifies how this is used in completions, and probably makes
it faster. Namely, since we return the `Module` itself, callers don't
need to invoke the full module resolver just to get the module type.
Note that this doesn't include namespace packages. (Which were
previously not supported in `Module::all_submodules`.) Given how they
can be spread out across multiple search paths, they will likely require
special consideration here.
This makes `import <CURSOR>` and `from <CURSOR>` completions work.
This also makes `import os.<CURSOR>` and `from os.<CURSOR>`
completions work. In this case, we are careful to only offer
submodule completions.
`Type::TypeVar` now distinguishes whether the typevar in question is
inferable or not.
A typevar is _not inferable_ inside the body of the generic class or
function that binds it:
```py
def f[T](t: T) -> T:
return t
```
The infered type of `t` in the function body is `TypeVar(T,
NotInferable)`. This represents how e.g. assignability checks need to be
valid for all possible specializations of the typevar. Most of the
existing assignability/etc logic only applies to non-inferable typevars.
Outside of the function body, the typevar is _inferable_:
```py
f(4)
```
Here, the parameter type of `f` is `TypeVar(T, Inferable)`. This
represents how e.g. assignability doesn't need to hold for _all_
specializations; instead, we need to find the constraints under which
this specific assignability check holds.
This is in support of starting to perform specialization inference _as
part of_ performing the assignability check at the call site.
In the [[POPL2015][]] paper, this concept is called _monomorphic_ /
_polymorphic_, but I thought _non-inferable_ / _inferable_ would be
clearer for us.
Depends on #19784
[POPL2015]: https://doi.org/10.1145/2676726.2676991
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Support recursive type aliases by adding a `Type::TypeAlias` type
variant, which allows referring to a type alias directly as a type
without eagerly unpacking it to its value.
We still unpack type aliases when they are added to intersections and
unions, so that we can simplify the intersection/union appropriately
based on the unpacked value of the type alias.
This introduces new possible recursive types, and so also requires
expanding our usage of recursion-detecting visitors in Type methods. The
use of these visitors is still not fully comprehensive in this PR, and
will require further expansion to support recursion in more kinds of
types (I already have further work on this locally), but I think it may
be better to do this incrementally in multiple PRs.
## Test Plan
Added some recursive type-alias tests and made them pass.
## Summary
This PR adds a new `Type::TypedDict` variant. Before this PR, we treated
`TypedDict`-based types as dynamic Todo-types, and I originally planned
to make this change a no-op. And we do in fact still treat that new
variant similar to a dynamic type when it comes to type properties such
as assignability and subtyping. But then I somehow tricked myself into
implementing some of the things correctly, so here we are. The two main
behavioral changes are: (1) we now also detect generic `TypedDict`s,
which removes a few false positives in the ecosystem, and (2) we now
support *attribute* access (not key-based indexing!) on these types,
i.e. we infer proper types for something like
`MyTypedDict.__required_keys__`. Nothing exciting yet, but gets the
infrastructure into place.
Note that with this PR, the type of (the type) `MyTypedDict` itself is
still represented as a `Type::ClassLiteral` or `Type::GenericAlias` (in
case `MyTypedDict` is generic). Only inhabitants of `MyTypedDict`
(instances of `dict` at runtime) are represented by `Type::TypedDict`.
We may want to revisit this decision in the future, if this turns out to
be too error-prone. Right now, we need to use `.is_typed_dict(db)` in
all the right places to distinguish between actual (generic) classes and
`TypedDict`s. But so far, it seemed unnecessary to add additional `Type`
variants for these as well.
part of https://github.com/astral-sh/ty/issues/154
## Ecosystem impact
The new diagnostics on `cloud-init` look like true positives to me.
## Test Plan
Updated and new Markdown tests
This change makes it so we aren't doing a directory traversal every time
we ask for completions from a module. Specifically, submodules that
aren't attributes of their parent module can only be discovered by
looking at the directory tree. But we want to avoid doing a directory
scan unless we think there are changes.
To make this work, this change does a little bit of surgery to
`FileRoot`. Previously, a `FileRoot` was only used for library search
paths. Its revision was bumped whenever a file in that tree was added,
deleted or even modified (to support the discovery of `pth` files and
changes to its contents). This generally seems fine since these are
presumably dependency paths that shouldn't change frequently.
In this change, we add a `FileRoot` for the project. But having the
`FileRoot`'s revision bumped for every change in the project makes
caching based on that `FileRoot` rather ineffective. That is, cache
invalidation will occur too aggressively. To the point that there is
little point in adding caching in the first place. To mitigate this, a
`FileRoot`'s revision is only bumped on a change to a child file's
contents when the `FileRoot` is a `LibrarySearchPath`. Otherwise, we
only bump the revision when a file is created or added.
The effect is that, at least in VS Code, when a new module is added or
removed, this change is picked up and the cache is properly invalidated.
Other LSP clients with worse support for file watching (which seems to
be the case for the CoC vim plugin that I use) don't work as well. Here,
the cache is less likely to be invalidated which might cause completions
to have stale results. Unless there's an obvious way to fix or improve
this, I propose punting on improvements here for now.
## Summary
Add a new `Type::EnumLiteral(…)` variant and infer this type for member
accesses on enums.
**Example**: No more `@Todo` types here:
```py
from enum import Enum
class Answer(Enum):
YES = 1
NO = 2
def is_yes(self) -> bool:
return self == Answer.YES
reveal_type(Answer.YES) # revealed: Literal[Answer.YES]
reveal_type(Answer.YES == Answer.NO) # revealed: Literal[False]
reveal_type(Answer.YES.is_yes()) # revealed: bool
```
## Test Plan
* Many new Markdown tests for the new type variant
* Added enum literal types to property tests, ran property tests
## Ecosystem analysis
Summary:
Lots of false positives removed. All of the new diagnostics are
either new true positives (the majority) or known problems. Click for
detailed analysis</summary>
Details:
```diff
AutoSplit (https://github.com/Toufool/AutoSplit)
+ error[call-non-callable] src/capture_method/__init__.py:137:9: Method `__getitem__` of type `bound method CaptureMethodDict.__getitem__(key: Never, /) -> type[CaptureMethodBase]` is not callable on object of type `CaptureMethodDict`
+ error[call-non-callable] src/capture_method/__init__.py:147:9: Method `__getitem__` of type `bound method CaptureMethodDict.__getitem__(key: Never, /) -> type[CaptureMethodBase]` is not callable on object of type `CaptureMethodDict`
+ error[call-non-callable] src/capture_method/__init__.py:148:1: Method `__getitem__` of type `bound method CaptureMethodDict.__getitem__(key: Never, /) -> type[CaptureMethodBase]` is not callable on object of type `CaptureMethodDict`
```
New true positives. That `__getitem__` method is apparently annotated
with `Never` to prevent developers from using it.
```diff
dd-trace-py (https://github.com/DataDog/dd-trace-py)
+ error[invalid-assignment] ddtrace/vendor/psutil/_common.py:29:5: Object of type `None` is not assignable to `Literal[AddressFamily.AF_INET6]`
+ error[invalid-assignment] ddtrace/vendor/psutil/_common.py:33:5: Object of type `None` is not assignable to `Literal[AddressFamily.AF_UNIX]`
```
Arguably true positives:
e0a772c28b/ddtrace/vendor/psutil/_common.py (L29)
```diff
ignite (https://github.com/pytorch/ignite)
+ error[invalid-argument-type] tests/ignite/engine/test_custom_events.py:190:34: Argument to bound method `__call__` is incorrect: Expected `((...) -> Unknown) | None`, found `Literal["123"]`
+ error[invalid-argument-type] tests/ignite/engine/test_custom_events.py:220:37: Argument to function `default_event_filter` is incorrect: Expected `Engine`, found `None`
+ error[invalid-argument-type] tests/ignite/engine/test_custom_events.py:220:43: Argument to function `default_event_filter` is incorrect: Expected `int`, found `None`
+ error[call-non-callable] tests/ignite/engine/test_custom_events.py:561:9: Object of type `CustomEvents` is not callable
+ error[invalid-argument-type] tests/ignite/metrics/test_frequency.py:50:38: Argument to bound method `attach` is incorrect: Expected `Events`, found `CallableEventWithFilter`
```
All true positives. Some of them are inside `pytest.raises(TypeError,
…)` blocks 🙃
```diff
meson (https://github.com/mesonbuild/meson)
+ error[invalid-argument-type] unittests/internaltests.py:243:51: Argument to bound method `__init__` is incorrect: Expected `bool`, found `Literal[MachineChoice.HOST]`
+ error[invalid-argument-type] unittests/internaltests.py:271:51: Argument to bound method `__init__` is incorrect: Expected `bool`, found `Literal[MachineChoice.HOST]`
```
New true positives. Enum literals can not be assigned to `bool`, even if
their value types are `0` and `1`.
```diff
poetry (https://github.com/python-poetry/poetry)
+ error[invalid-assignment] src/poetry/console/exceptions.py:101:5: Object of type `Literal[""]` is not assignable to `InitVar[str]`
```
New false positive, missing support for `InitVar`.
```diff
prefect (https://github.com/PrefectHQ/prefect)
+ error[invalid-argument-type] src/integrations/prefect-dask/tests/test_task_runners.py:193:17: Argument is incorrect: Expected `StateType`, found `Literal[StateType.COMPLETED]`
```
This is confusing. There are two definitions
([one](74d8cd93ee/src/prefect/client/schemas/objects.py (L89-L100)),
[two](https://github.com/PrefectHQ/prefect/blob/main/src/prefect/server/schemas/states.py#L40))
of the `StateType` enum. Here, we're trying to assign one to the other.
I don't think that should be allowed, so this is a true positive (?).
```diff
python-htmlgen (https://github.com/srittau/python-htmlgen)
+ error[invalid-assignment] test_htmlgen/form.py:51:9: Object of type `str` is not assignable to attribute `autocomplete` of type `Autocomplete | None`
+ error[invalid-assignment] test_htmlgen/video.py:38:9: Object of type `str` is not assignable to attribute `preload` of type `Preload | None`
```
True positives. [The stubs are
wrong](01e3b911ac/htmlgen/form.pyi (L8-L10)).
These should not contain type annotations, but rather just `OFF = ...`.
```diff
rotki (https://github.com/rotki/rotki)
+ error[invalid-argument-type] rotkehlchen/tests/unit/test_serialization.py:62:30: Argument to bound method `deserialize` is incorrect: Expected `str`, found `Literal[15]`
```
New true positive.
```diff
vision (https://github.com/pytorch/vision)
+ error[unresolved-attribute] test/test_extended_models.py:302:17: Type `type[WeightsEnum]` has no attribute `DEFAULT`
+ error[unresolved-attribute] test/test_extended_models.py:302:58: Type `type[WeightsEnum]` has no attribute `DEFAULT`
```
Also new true positives. No `DEFAULT` member exists on `WeightsEnum`.
While we did previously support submodule completions via our
`all_members` API, that only works when submodules are attributes of
their parent module. For example, `os.path`. But that didn't work when
the submodule was not an attribute of its parent. For example,
`http.client`. To make the latter work, we read the directory of the
parent module to discover its submodules.
This makes use of the new `Type` field on `Completion` to figure out the
"kind" of a `Completion`.
The mapping here is perhaps a little suspect for some cases.
Closesastral-sh/ty#775
Since we generally need (so far) to get the type information of each
suggestion to figure out its boundness anyway, we might as well expose
it here. Completions want to use this information to enhance the
metadata on each suggestion for a more pleasant user experience.
For the most part, this was pretty straight-forward. The most exciting
part was in computing the types for instance attributes. I'm not 100%
sure it's correct or is the best way to do it.
This commit doesn't change any behavior, but makes it so `all_members`
returns a `Vec<Member>` instead of `Vec<Name>`, where a `Member`
contains a `Name`. This gives us an expansion point to include other
data (such as the type of the `Name`).
This implements filtering of private symbols from stub files based on
type information as discussed in
https://github.com/astral-sh/ruff/pull/19102. It extends the previous
implementation to apply to all stub files, instead of just the
`builtins` module, and uses type information to retain private names
that are may be relevant at runtime.
## Summary
The motivation of `ScopedExpressionId` was that we have an expression
identifier that's local to a scope and, therefore, unlikely to change if
a user makes changes in another scope. A local identifier like this has
the advantage that query results may remain unchanged even if other
parts of the file change, which in turn allows Salsa to short-circuit
dependent queries.
However, I noticed that we aren't using `ScopedExpressionId` in a place
where it's important that the identifier is local. It's main use is
inside `infer` which we always run for the entire file. The one
exception to this is `Unpack` but unpack runs as part of `infer`.
Edit: The above isn't entirely correct. We used ScopedExpressionId in
TypeInference which is a query result. Now using ExpressionNodeKey does
mean that a change to the AST invalidates most if not all TypeInference
results of a single file. Salsa then has to run all dependent queries to
see if they're affected by this change even if the change was local to
another scope.
If this locality proves to be important I suggest that we create two
queries on top of TypeInference: one that returns the expression map
which is mainly used in the linter and type inference and a second that
returns all remaining fields. This should give us a similar optimization
at a much lower cost
I also considered remove `ScopedUseId` but I believe that one is still
useful because using `ExpressionNodeKey` for it instead would mean that
all `UseDefMap` change when a single AST node changes. Whether this is
important is something difficult to assess. I'm simply not familiar
enough with the `UseDefMap`. If the locality doesn't matter for the
`UseDefMap`, then a similar change could be made and `bindings_by_use`
could be changed to an `FxHashMap<UseId, Bindings>` where `UseId` is a
thin wrapper around `NodeKey`.
Closes https://github.com/astral-sh/ty/issues/721
Most of the work here was doing some light refactoring to facilitate
sensible testing. That is, we don't want to list every builtin included
in most tests, so we add some structure to the completion type returned.
Tests can now filter based on whether a completion is a builtin or not.
Otherwise, builtins are found using the existing infrastructure for
`object.attr` completions (where we hard-code the module name
`builtins`).
I did consider changing the sort order based on whether a completion
suggestion was a builtin or not. In particular, it seemed like it might
be a good idea to sort builtins after other scope based completions,
but before the dunder and sunder attributes. Namely, it seems likely
that there is an inverse correlation between the size of a scope and
the likelihood of an item in that scope being used at any given point.
So it *might* be a good idea to prioritize the likelier candidates in
the completions returned.
Additionally, the number of items introduced by adding builtins is quite
large. So I wondered whether mixing them in with everything else would
become too noisy.
However, it's not totally clear to me that this is the right thing to
do. Right now, I feel like there is a very obvious lexicographic
ordering that makes "finding" the right suggestion to activate
potentially easier than if the ranking mechanism is less clear.
(Technically, the dunder and sunder attributes are not sorted
lexicographically, but I'd put forward that most folks don't have an
intuitive understanding of where `_` ranks lexicographically with
respect to "regular" letters. Moreover, since dunder and sunder
attributes are all grouped together, I think the ordering here ends up
being very obvious after even a quick glance.)
There were two main challenges in this PR.
The first was mostly just figuring out how to get the symbols
corresponding to `module`. It turns out that we do this in a couple
of places in ty already, but through different means. In one approach,
we use [`exported_names`]. In another approach, we get a `Type`
corresponding to the module. We take the latter approach here, which is
consistent with how we do completions elsewhere. (I looked into
factoring this logic out into its own function, but it ended up being
pretty constrained. e.g., There's only one other place where we want to
go from `ast::StmtImportFrom` to a module `Type`, and that code also
wants the module name.)
The second challenge was recognizing the `from module import <CURSOR>`
pattern in the code. I initially started with some fixed token patterns
to get a proof of concept working. But I ended up switching to mini
state machine over tokens. I looked at the parser for `StmtImportFrom`
to determine what kinds of tokens we can expect.
[`exported_names`]:
23a3b6ef23/crates/ty_python_semantic/src/semantic_index/re_exports.rs (L47)
## Summary
https://github.com/astral-sh/ty/issues/214 will require a couple
invasive changes that I would like to get merged even before garbage
collection is fully implemented (to avoid rebasing):
- `ParsedModule` can no longer be dereferenced directly. Instead you
need to load a `ParsedModuleRef` to access the AST, which requires a
reference to the salsa database (as it may require re-parsing the AST if
it was collected).
- `AstNodeRef` can only be dereferenced with the `node` method, which
takes a reference to the `ParsedModuleRef`. This allows us to encode the
fact that ASTs do not live as long as the database and may be collected
as soon a given instance of a `ParsedModuleRef` is dropped. There are a
number of places where we currently merge the `'db` and `'ast`
lifetimes, so this requires giving some types/functions two separate
lifetime parameters.
## Summary
This PR partially solves https://github.com/astral-sh/ty/issues/164
(derived from #17643).
Currently, the definitions we manage are limited to those for simple
name (symbol) targets, but we expand this to track definitions for
attribute and subscript targets as well.
This was originally planned as part of the work in #17643, but the
changes are significant, so I made it a separate PR.
After merging this PR, I will reflect this changes in #17643.
There is still some incomplete work remaining, but the basic features
have been implemented, so I am publishing it as a draft PR.
Here is the TODO list (there may be more to come):
* [x] Complete rewrite and refactoring of documentation (removing
`Symbol` and replacing it with `Place`)
* [x] More thorough testing
* [x] Consolidation of duplicated code (maybe we can consolidate the
handling related to name, attribute, and subscript)
This PR replaces the current `Symbol` API with the `Place` API, which is
a concept that includes attributes and subscripts (the term is borrowed
from Rust).
## Test Plan
`mdtest/narrow/assignment.md` is added.
---------
Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Previously, all symbols where provided as possible completions. In an
example like the following, both `foo` and `f` were suggested as
completions, because `f` itself is a symbol.
```py
foo = 1
f<CURSOR>
```
Similarly, in the following example, `hidden_symbol` was suggested, even
though it is not statically visible:
```py
if 1 + 2 != 3:
hidden_symbol = 1
hidden_<CURSOR>
```
With the change suggested here, we only use statically visible
declarations and bindings as a source for completions.
## Test Plan
- Updated snapshot tests
- New test for statically hidden definitions
- Added test for star import
## Summary
Implement a hotfix for the playground/LSP crashes related to missing
`expression_scope_id`s.
relates to: https://github.com/astral-sh/ty/issues/572
## Test Plan
* Regression tests from https://github.com/astral-sh/ruff/pull/18441
* Ran the playground locally to check if panics occur / completions
still work.
---------
Co-authored-by: Andrew Gallant <andrew@astral.sh>
This PR implements template strings (t-strings) in the parser and
formatter for Ruff.
Minimal changes necessary to compile were made in other parts of the code (e.g. ty, the linter, etc.). These will be covered properly in follow-up PRs.
Previously, completions were based on just returning every identifier
parsed in the current Python file. In this commit, we change it to
identify an expression under the cursor and then return all symbols
available to the scope containing that expression.
This is still returning too much, and also, in some cases, not enough.
Namely, it doesn't really take the specific context into account other
than scope. But this does improve on the status quo. For example:
def foo(): ...
def bar():
def fast(): ...
def foofoo(): ...
f<CURSOR>
When asking for completions here, the LSP will no longer include `fast`
as a possible completion in this context.
Ref https://github.com/astral-sh/ty/issues/86