## Summary
Add the ability to detect instance attribute assignments in class
methods that are generic.
This does not address the code duplication mentioned in #16928. I can
open a ticket for this after this has been merged.
closes #16928
## Test Plan
Added regression test.
Summary
--
This PR extends semantic syntax error detection to red-knot. The main
changes here are:
1. Adding `SemanticSyntaxChecker` and `Vec<SemanticSyntaxError>` fields
to the `SemanticIndexBuilder`
2. Calling `SemanticSyntaxChecker::visit_stmt` and `visit_expr` in the
`SemanticIndexBuilder`'s `visit_stmt` and `visit_expr` methods
3. Implementing `SemanticSyntaxContext` for `SemanticIndexBuilder`
4. Adding new mdtests to test the context implementation and show
diagnostics
(3) is definitely the trickiest and required (I think) a minor addition
to the `SemanticIndexBuilder`. I tried to look around for existing code
performing the necessary checks, but I definitely could have missed
something or misused the existing code even when I found it.
There's still one TODO around `global` statement handling. I don't think
there's an existing way to look this up, but I'm happy to work on that
here or in a separate PR. This currently only affects detection of one
error (`LoadBeforeGlobalDeclaration` or
[PLE0118](https://docs.astral.sh/ruff/rules/load-before-global-declaration/)
in ruff), so it's not too big of a problem even if we leave the TODO.
Test Plan
--
New mdtests, as well as new errors for existing mdtests
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
This PR is a follow-up to #16852.
Instance variables bound in comprehensions are recorded, allowing type
inference to work correctly.
This required adding support for unpacking in comprehension which
resolves https://github.com/astral-sh/ruff/issues/15369.
## Test Plan
One TODO in `mdtest/attributes.md` is now resolved, and some new test
cases are added.
---------
Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>
## Summary
This PR extends version-related syntax error detection to red-knot. The
main changes here are:
1. Passing `ParseOptions` specifying a `PythonVersion` to parser calls
2. Adding a `python_version` method to the `Db` trait to make this
possible
3. Converting `UnsupportedSyntaxError`s to `Diagnostic`s
4. Updating existing mdtests to avoid unrelated syntax errors
My initial draft of (1) and (2) in #16090 instead tried passing a
`PythonVersion` down to every parser call, but @MichaReiser suggested
the `Db` approach instead
[here](https://github.com/astral-sh/ruff/pull/16090#discussion_r1969198407),
and I think it turned out much nicer.
All of the new `python_version` methods look like this:
```rust
fn python_version(&self) -> ruff_python_ast::PythonVersion {
    Program::get(self).python_version(self)
}
```
with the exception of the `TestDb` in `ruff_db`, which hard-codes
`PythonVersion::latest()`.
## Test Plan
Existing mdtests, plus a new mdtest to see at least one of the new
diagnostics.
## Summary
Similar to what we did for `unresolved-reference` and
`unresolved-attribute`, we now also silence `unresolved-import`
diagnostics if the corresponding `import` statement is unreachable.
This addresses the (already closed) issue #17049.
## Test Plan
Adapted Markdown tests.
## Summary
Basically just repeat the same thing that we did for
`unresolved-reference`, but now for attribute expressions.
We now also handle the case where the unresolved attribute (or the
unresolved reference) diagnostic originates from a stringified type
annotation.
And I made the evaluation of reachability constraints lazy (will only be
evaluated right before we are about to emit a diagnostic).
## Test Plan
New Markdown tests for stringified annotations.
## Summary
Track the reachability of nested scopes within their parent scopes. We
use this as an additional requirement for emitting
`unresolved-reference` diagnostics (and in the future,
`unresolved-attribute` and `unresolved-import`). This means that we only
emit `unresolved-reference` for a given use of a symbol if the use
itself is reachable (within its own scope), *and if the scope itself is
reachable*. For example, no diagnostic should be emitted for the use of
`x` here:
```py
if False:
    x = 1
    def f():
        print(x)  # this use of `x` is reachable inside the `f` scope,
                  # but the whole `f` scope is not reachable.
```
There are probably more fine-grained ways of solving this problem, but
they require a more sophisticated understanding of nested scopes (see
#15777, in particular
https://github.com/astral-sh/ruff/issues/15777#issuecomment-2788950267).
But it doesn't seem completely unreasonable to silence *this specific
kind of error* in unreachable scopes.
## Test Plan
Observed changes in reachability tests and ecosystem.
## Summary
From #16861, and the continuation of #16915.
This PR fixes the incorrect behavior of
`TypeInferenceBuilder::infer_name_load` in eager nested scopes.
This PR also closes #16341.
## Test Plan
New test cases are added in `annotations/deferred.md`.
## Summary
This PR adds initial support for `*` imports to red-knot. The approach
is to implement a standalone query, called from semantic indexing, that
visits the module referenced by the `*` import and collects all
global-scope public names that will be imported by the `*` import. The
`SemanticIndexBuilder` then adds separate definitions for each of these
names, all keyed to the same `ast::Alias` node that represents the `*`
import.
There are many pieces of `*`-import semantics that are still yet to be
done, even with this PR:
- This PR does not attempt to implement any of the semantics to do with
`__all__`. (If a module defines `__all__`, then only the symbols
included in `__all__` are imported, _not_ all public global-scope
symbols.)
- With the logic implemented in this PR as it currently stands, we
sometimes incorrectly consider a symbol bound even though it is defined
in a branch that is statically known to be dead code, e.g. (assuming the
target Python version is set to 3.11):
```py
# a.py
import sys
if sys.version_info < (3, 10):
    class Foo: ...
```
```py
# b.py
from a import *
print(Foo) # this is unbound at runtime on 3.11,
# but we currently consider it bound with the logic in this PR
```
Implementing these features is important, but is for now deferred to
followup PRs.
Many thanks to @ntBre, who contributed to this PR in a pairing session
on Friday!
## Test Plan
Assertions in existing mdtests are adjusted, and several new ones are
added.
## Summary
Another salsa upgrade.
The main motivation is to stay on a recent salsa version because there
are still a lot of breaking changes happening.
The most significant changes in this update:
* Salsa no longer derives `Debug` by default. It now requires
`interned(debug)` (or similar)
* This version ships the foundation for garbage collecting interned
values. However, this comes at the cost that queries now track which
interned values they created (or read). The micro benchmarks in the
salsa repo showed a significant perf regression. We will see whether this
is also visible in our benchmarks.
## Test Plan
`cargo test`
## Summary
This PR introduces a new mdtest option `system` that can either be
`in-memory` or `os`
where `in-memory` is the default.
The motivation for supporting `os` is so that we can write OS/system
specific tests
with mdtests. Specifically, I want to write mdtests for the module
resolver,
testing that module resolution is case sensitive.
## Test Plan
I tested that the case-sensitive module resolver test starts failing when
setting `system = "os"`.
In https://github.com/astral-sh/ruff/pull/16306#discussion_r1966290700,
@carljm pointed out that #16306 introduced a terminology problem, with
too many things called a "constraint". This is a follow-up PR that
renames `Constraint` to `Predicate` to hopefully clear things up a bit.
So now we have that:
- a _predicate_ is a Python expression that might influence type
inference
- a _narrowing constraint_ is a list of predicates that constrain the
type of a binding that is visible at a use
- a _visibility constraint_ is a ternary formula of predicates that
define whether a binding is visible or a statement is reachable
This is a pure renaming, with no behavioral changes.
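To make the "ternary formula" wording concrete, here is a minimal sketch of three-valued evaluation over predicates. It assumes Kleene-style logic; the `Truthiness` enum and the `not_`, `and_`, and `or_` helpers are hypothetical names for illustration only and are not the types used in the PR.

```python
from enum import Enum


class Truthiness(Enum):
    """Hypothetical result of evaluating a predicate statically."""
    ALWAYS_TRUE = "always-true"
    ALWAYS_FALSE = "always-false"
    AMBIGUOUS = "ambiguous"


def not_(a: Truthiness) -> Truthiness:
    if a is Truthiness.ALWAYS_TRUE:
        return Truthiness.ALWAYS_FALSE
    if a is Truthiness.ALWAYS_FALSE:
        return Truthiness.ALWAYS_TRUE
    return Truthiness.AMBIGUOUS


def and_(a: Truthiness, b: Truthiness) -> Truthiness:
    # One definite "false" decides the result, even if the other side is ambiguous.
    if a is Truthiness.ALWAYS_FALSE or b is Truthiness.ALWAYS_FALSE:
        return Truthiness.ALWAYS_FALSE
    if a is Truthiness.ALWAYS_TRUE and b is Truthiness.ALWAYS_TRUE:
        return Truthiness.ALWAYS_TRUE
    return Truthiness.AMBIGUOUS


def or_(a: Truthiness, b: Truthiness) -> Truthiness:
    # De Morgan's law also holds in three-valued (Kleene) logic.
    return not_(and_(not_(a), not_(b)))
```

Under this reading, a binding guarded by `and_(AMBIGUOUS, ALWAYS_FALSE)` is definitely not visible, while one guarded by `or_(AMBIGUOUS, ALWAYS_FALSE)` stays ambiguous.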
This PR adds an implementation of [association
lists](https://en.wikipedia.org/wiki/Association_list), and uses them to
replace the previous `BitSet`/`SmallVec` representation for narrowing
constraints.
An association list is a linked list of key/value pairs. We additionally
guarantee that the elements of an association list are sorted (by their
keys), and that they do not contain any entries with duplicate keys.
Association lists have fallen out of favor in recent decades, since you
often need operations that are inefficient on them. In particular,
looking up a random element by index is O(n), just like a linked list;
and looking up an element by key is also O(n), since you must do a
linear scan of the list to find the matching element. Luckily we don't
need either of those operations for narrowing constraints!
The typical implementation also suffers from poor cache locality and
high memory allocation overhead, since individual list cells are
typically allocated separately from the heap. We solve that last problem
by storing the cells of an association list in an `IndexVec` arena.
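As a rough illustration of the data structure described above, the sketch below stores the cells of sorted, duplicate-free association lists in one shared arena. It is written in Python for brevity, and the `Arena` class and its method names are hypothetical; the actual implementation is in Rust on top of `IndexVec` and may differ substantially.

```python
from typing import List, Optional, Tuple

# A cell is (key, value, index of the rest of the list, or None for the empty list).
Cell = Tuple[int, str, Optional[int]]


class Arena:
    """Hypothetical arena holding the cells of many association lists."""

    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def cons(self, key: int, value: str, rest: Optional[int]) -> int:
        self.cells.append((key, value, rest))
        return len(self.cells) - 1

    def insert(self, head: Optional[int], key: int, value: str) -> int:
        """Return a new list with `key` set to `value`; the input list is unchanged."""
        if head is None:
            return self.cons(key, value, None)
        k, v, rest = self.cells[head]
        if key < k:
            # Keys stay sorted; the whole old list is shared as the tail.
            return self.cons(key, value, head)
        if key == k:
            # Replace rather than duplicate the key.
            return self.cons(key, value, rest)
        return self.cons(k, v, self.insert(rest, key, value))

    def items(self, head: Optional[int]) -> List[Tuple[int, str]]:
        out = []
        while head is not None:
            k, v, head = self.cells[head]
            out.append((k, v))
        return out
```

Because every list is just an index into the arena, copying a list is copying an integer, and lists that share a tail share its cells.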
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
We now resolve references in "eager" scopes correctly — using the
bindings and declarations that are visible at the point where the eager
scope is created, not the "public" type of the symbol (typically the
bindings visible at the end of the scope).
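A small runtime illustration of the difference, assuming that comprehensions count as eager scopes and function bodies as lazy ones (the variable names are made up for the example):

```python
x = 1

# The comprehension body runs right here, so it sees the binding `x = 1`
# that is visible at the point where the scope is created.
eager = [x for _ in range(1)]


def lazy():
    # A function body only runs when called, so it sees whichever binding
    # of `x` is visible at that later point.
    return x


x = "changed"
```

Here `eager` holds the value from the first binding, while calling `lazy()` afterwards returns the value from the last one.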
---------
Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
## Summary
Transition to using coarse-grained tracked structs (depends on
https://github.com/salsa-rs/salsa/pull/657). For now, this PR doesn't
add any `#[tracked]` fields, meaning that any changes cause the entire
struct to be invalidated. It also changes `AstNodeRef` to be
compared/hashed by pointer address, instead of performing a deep AST
comparison.
## Test Plan
This yields a 10-15% improvement on my machine (though weirdly some runs
were 5-10% without being flagged as inconsistent by criterion, is there
some non-determinism involved?). It's possible that some of this is
unrelated, I'll try applying the patch to the current salsa version to
make sure.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
This extracts some pure refactoring noise from
https://github.com/astral-sh/ruff/pull/15861. This changes the API for
creating and evaluating visibility constraints, but does not change how
they are represented internally. There should be no behavioral or
performance changes in this PR.
Changes:
- Hide the internal representation, so that we can make
changes to it in #15861.
- Add a separate builder type for visibility constraints. (With TDDs, we
will have some additional builder state that we can throw away once
we're done constructing.)
- Remove a layer of helper methods from `UseDefMapBuilder`, making
`SemanticIndexBuilder` responsible for constructing whatever visibility
constraints it needs.
## Summary
Add support for implicitly-defined instance attributes, i.e. support
type inference for cases like this:
```py
class C:
    def __init__(self) -> None:
        self.x: int = 1
        self.y = None
reveal_type(C().x)  # int
reveal_type(C().y)  # Unknown | None
```
## Benchmarks
Codspeed reports no change in a cold-cache benchmark, and a -1%
regression in the incremental benchmark. On `black`'s `src` folder, I
don't see a statistically significant difference between the branches:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./red_knot_main check --project /home/shark/black/src` | 133.7 ± 9.5 | 126.7 | 164.7 | 1.01 ± 0.08 |
| `./red_knot_feature check --project /home/shark/black/src` | 132.2 ± 5.1 | 118.1 | 140.9 | 1.00 |
## Test Plan
Updated and new Markdown tests
## Summary
This changeset adds support for precise type-inference and
boundness-handling of definitions inside control-flow branches with
statically-known conditions, i.e. test-expressions whose truthiness we
can unambiguously infer as *always false* or *always true*.
This branch also includes:
- `sys.platform` support
- statically-known branches handling for Boolean expressions and while
loops
- new `target-version` requirements in some Markdown tests which were
now required due to the understanding of `sys.version_info` branches.
closes #12700, closes #15034
## Performance
### `tomllib`, -7%, needs to resolve one additional module (sys)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./red_knot_main --project /home/shark/tomllib` | 22.2 ± 1.3 | 19.1 | 25.6 | 1.00 |
| `./red_knot_feature --project /home/shark/tomllib` | 23.8 ± 1.6 | 20.8 | 28.6 | 1.07 ± 0.09 |
### `black`, -6%
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./red_knot_main --project /home/shark/black` | 129.3 ± 5.1 | 119.0 | 137.8 | 1.00 |
| `./red_knot_feature --project /home/shark/black` | 136.5 ± 6.8 | 123.8 | 147.5 | 1.06 ± 0.07 |
## Test Plan
- New Markdown tests for the main feature in
`statically-known-branches.md`
- New Markdown tests for `sys.platform`
- Adapted tests for `EllipsisType`, `Never`, etc
When importing a nested module, we were correctly creating a binding for
the top-most parent, but we were binding that to the nested module, not
to that parent module. Moreover, we weren't treating those submodules as
members of their containing parents. This PR addresses both issues, so
that nested imports work as expected.
As discussed in ~Slack~ whatever chat app I find myself in these days
😄, this requires keeping track of which modules have been imported
within the current file, so that when we resolve member access on a
module reference, we can see if that member has been imported as a
submodule. If so, we return the submodule reference immediately, instead
of checking whether the parent module's definition defines the symbol.
This is currently done in a flow insensitive manner. The `SemanticIndex`
now tracks all of the modules that are imported (via `import`, not via
`from...import`). The member access logic mentioned above currently only
considers module imports in the file containing the attribute
expression.
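The runtime behavior being modeled can be checked with a standard-library package; this is only an illustration of Python's import semantics, not of the red-knot implementation:

```python
import xml.etree.ElementTree

# The `import` statement binds only the top-most parent name, `xml`,
# and that name refers to the parent module, not to the nested one.
top_level_name = xml.__name__

# The submodules are reachable as members of their containing parents
# because they were imported.
submodule_name = xml.etree.__name__
nested_name = xml.etree.ElementTree.__name__
```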
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Inferred and declared types for function parameters, in the function
body scope.
Fixes #13693.
## Test Plan
Added mdtests.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
This PR adds support for unpacking tuple expression in an assignment
statement where the target expression can be a tuple or a list (the
allowed sequence targets).
The implementation introduces a new `infer_assignment_target` which can
then be used for other targets like the ones in for loops as well. This
delegates it to the `infer_definition`. The final implementation uses a
recursive function that visits the target expression in source order and
compares the variable node that corresponds to the definition. At the
same time, it keeps track of where it is on the assignment value type.
The logic also accounts for the number of elements on both sides such
that it matches even if there's a gap in between. For example, if
there's a starred expression like `(a, *b, c) = (1, 2, 3)`, then the
type of `a` will be `Literal[1]` and the type of `c` will be
`Literal[3]`.
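For reference, this is how starred unpacking behaves at runtime, which is what the position tracking has to mirror. The extra variable names are illustrative only:

```python
# Targets before the star match from the left, targets after it match
# from the right, and the starred target collects whatever is in between.
(a, *b, c) = (1, 2, 3)

(first, *middle, last) = (1, 2, 3, 4, 5)

# The starred target may also end up empty.
(head, *rest, tail) = (1, 2)
```

Note that at runtime the starred target is always bound to a list, even when it captures a single element.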
There are a couple of follow-ups that can be done:
* Use this logic for other target positions like `for` loop
* Add diagnostics for mismatched lengths between LHS and RHS
## Test Plan
Add various test cases using the new markdown test framework.
Validate that existing test cases pass.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
Use declared types in inference and checking. This means several things:
* Imports prefer declarations over inference, when declarations are
available.
* When we encounter a binding, we check that the bound value's inferred
type is assignable to the live declarations of the bound symbol, if any.
* When we encounter a declaration, we check that the declared type is
assignable from the inferred type of the symbol from previous bindings,
if any.
* When we encounter a binding+declaration, we check that the inferred
type of the bound value is assignable to the declared type.
Add support for declared types to the semantic index. This involves a
lot of renaming to clarify the distinction between bindings and
declarations. The Definition (or more specifically, the DefinitionKind)
becomes responsible for determining which definitions are bindings,
which are declarations, and which are both, and the symbol table
building is refactored a bit so that the `IS_BOUND` (renamed from
`IS_DEFINED` for consistent terminology) flag is always set when a
binding is added, rather than being set separately (and requiring us to
ensure it is set properly).
The `SymbolState` is split into two parts, `SymbolBindings` and
`SymbolDeclarations`, because we need to store live bindings for every
declaration and live declarations for every binding; the split lets us
do this without storing more than we need.
The massive doc comment in `use_def.rs` is updated to reflect bindings
vs declarations.
The `UseDefMap` gains some new APIs which are allow-unused for now,
since this PR doesn't yet update type inference to take declarations
into account.
## Summary
This PR adds support for control flow for match statement.
It also adds the necessary infrastructure required for narrowing
constraints in case blocks and implements the logic for
`PatternMatchSingleton` which is either `None` / `True` / `False`. Even
after this the inferred type doesn't get simplified completely; there's
a TODO for that in the test code.
## Test Plan
Add test cases for control flow for (a) when there's a wildcard pattern
and (b) when there isn't. There's also a test case to verify the
narrowing logic.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Part of #13085, this PR updates the comprehension definition to handle
multiple targets.
## Test Plan
Update existing semantic index test case for comprehension with multiple
targets. Running corpus tests shouldn't panic.
## Summary
This PR adds definition for match patterns.
## Test Plan
Update the existing test case for match statement symbols to verify that
the definitions are added as well.
This PR has the `SemanticIndexBuilder` visit function definition
annotations before adding the function symbol/name to the builder.
For example, the following snippet no longer causes a panic:
```python
def bool(x) -> bool:
    return True
```
Note: This fix changes the ordering of the global symbol table.
Closes #13069
## Summary
This PR adds symbols introduced by `for` loops to red-knot:
- `x` in `for x in range(10): pass`
- `x` and `y` in `for x, y in d.items(): pass`
- `a`, `b`, `c` and `d` in `for [((a,), b), (c, d)] in foo: pass`
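The three target shapes above can be exercised at runtime as follows. The `mapping` and `foo` values are made up for the example (`mapping` stands in for `d`, which would otherwise clash with the loop target `d`):

```python
mapping = {"k": "v"}
foo = [[((1,), 2), (3, 4)]]

# A plain name target: `x` stays bound after the loop finishes.
for x in range(10):
    pass
x_after = x

# A tuple target binds every name it contains.
for x, y in mapping.items():
    pass

# Nested list/tuple targets bind the names at every nesting level.
for [((a,), b), (c, d)] in foo:
    pass
```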
## Test Plan
Several tests added, and the assertion in the benchmarks has been
updated.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
This PR adds symbols and definitions introduced by `with` statements.
The symbols and definitions are introduced for each with item. The type
inference is updated to call the definition region type inference
instead.
## Test Plan
Add test case to check for symbol table and definitions.
## Summary
This PR adds symbols introduced by `match` statements.
There are three patterns that introduce new symbols:
* `as` pattern
* Sequence pattern
* Mapping pattern
The recursive nature of the visitor makes sure that all symbols are
added.
## Test Plan
Add test cases for all types of patterns that introduce a symbol.
## Summary
This PR adds definition for augmented assignment. This is similar to
annotated assignment in terms of implementation.
An augmented assignment should also record a use of the variable but
that's a TODO for now.
## Test Plan
Add test case to validate that a definition is added.
Extend the `UseDefMap` to also track which constraints (provided by e.g.
`if` tests) apply to each visible definition.
Uses a custom `BitSet` and `BitSetArray` to track which constraints
apply to which definitions, while keeping data inline as much as
possible.
## Summary
This PR adds support for adding symbols and definitions for function and
lambda parameters to the semantic index.
### Notes
* The default expression of a parameter is evaluated in the enclosing
scope (not the type parameter or function scope).
* The annotation expression of a parameter is evaluated in the type
parameter scope if type parameters are present, otherwise in the enclosing scope.
* The symbols and definitions are added in the function parameter scope.
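The first note can be observed directly at runtime. This is a minimal illustration of the Python semantics, with made-up names:

```python
n = 1


def f(a=n):
    # The default `n` was evaluated once, in the enclosing scope, when the
    # `def` statement ran. The local `n` below lives in the function scope
    # and has no effect on the default.
    n = 100
    return a


n = 2
result = f()
```

`result` is the value `n` had when `f` was defined, not the later rebinding and not the local `n`.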
### Type Inference
There are two definitions `Parameter` and `ParameterWithDefault` and
their respective `*_definition` methods on the type inference builder.
These methods are preferred and are re-used when checking from a
different region.
## Test Plan
Add test case for validating that the parameters are defined in the
function / lambda scope.
### Benchmark update
Validated the difference in diagnostics for benchmark code between
`main` and this branch. All of them are either directly or indirectly
referencing one of the function parameters. The diff is in the PR description.
## Summary
This PR adds scope and definition for comprehension nodes. This includes
the following nodes:
* List comprehension
* Dictionary comprehension
* Set comprehension
* Generator expression
### Scope
Each expression here adds its own scope with one caveat - the `iter`
expression of the first generator is part of the parent scope. For
example, in the following code snippet the `iter1` variable is evaluated
in the outer scope.
```py
[x for x in iter1]
```
> The iterable expression in the leftmost for clause is evaluated
directly in the enclosing scope and then passed as an argument to the
implicitly nested scope.
>
> Reference:
https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries
There's another special case for assignment expressions:
> There is one special case: an assignment expression occurring in a
list, set or dict comprehension or in a generator expression (below
collectively referred to as “comprehensions”) binds the target in the
containing scope, honoring a nonlocal or global declaration for the
target in that scope, if one exists.
>
> Reference: https://peps.python.org/pep-0572/#scope-of-the-target
For example, in the following code snippet, the variables `a` and `b`
are available after the comprehension while `x` isn't:
```py
[a := 1 for x in range(2) if (b := 2)]
```
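The same snippet can be checked at runtime. The wrapper function `demo` is a hypothetical helper that keeps the example self-contained:

```python
def demo():
    result = [a := 1 for x in range(2) if (b := 2)]
    try:
        x  # the iteration variable belongs to the comprehension scope only
        leaked = True
    except NameError:
        leaked = False
    # `a` and `b` were bound in the containing scope by the assignment expressions.
    return result, a, b, leaked
```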
### Definition
Each comprehension node adds a single definition, the "target" variable
(`[_ for target in iter]`). This has been accounted for and a new
variant has been added to `DefinitionKind`.
### Type Inference
Currently, type inference is limited to a single scope. It doesn't
_enter_ in another scope to infer the types of the remaining expressions
of a node. To accommodate this, the type inference for a **scope**
requires new methods that _don't_ infer the type of the `iter`
expression of the leftmost outer generator (that's defined in the
enclosing scope).
The type inference for the scope region is split into two parts:
* `infer_generator_expression` (similarly for comprehensions) infers the
type of the `iter` expression of the leftmost outer generator
* `infer_generator_expression_scope` (similarly for comprehension)
infers the type of the remaining expressions except for the one
mentioned in the previous point
The type inference for the **definition** also needs to account for this
special case of leftmost generator. This is done by defining a `first`
boolean parameter which indicates whether this comprehension definition
occurs first in the enclosing expression.
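The leftmost-generator special case is easiest to see inside a class body, where names from the enclosing (class) scope are not visible from within the comprehension scope. The class and function names below are illustrative only:

```python
class Config:
    names = ["a", "b"]
    # The leftmost iterable is evaluated in the enclosing class scope,
    # so `names` resolves here.
    upper = [n.upper() for n in names]


def second_iterable_fails() -> bool:
    try:
        class Broken:
            names = ["a", "b"]
            # The iterable of the second `for` clause is evaluated inside the
            # comprehension scope, which cannot see class-level names.
            pairs = [(i, n) for i in range(1) for n in names]
    except NameError:
        return True
    return False
```

This is the distinction the `first` flag encodes: only the first comprehension's `iter` expression belongs to the enclosing scope.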
## Test Plan
New test cases were added to validate multiple scenarios. Refer to the
documentation for each test case which explains what is being tested.
Make `cargo doc -p red_knot_python_semantic --document-private-items`
run warning-free. I'd still like to do this for all of ruff and start
enforcing it in CI (https://github.com/astral-sh/ruff/issues/12372) but
haven't gotten to it yet. But in the meantime I'm trying to maintain it
for at least `red_knot_python_semantic`, as it helps to ensure our doc
comments stay up to date.
A few of the comments I just removed or shortened, as their continued
relevance wasn't clear to me; please object in review if you think some
of them are important to keep!
Also remove a no-longer-needed `allow` attribute.
Per comments in https://github.com/astral-sh/ruff/pull/12269, "module
global" is kind of long, and arguably redundant.
I tried just using "module" but there were too many cases where I felt
this was ambiguous. I like the way "global" works out better, though it
does require an understanding that in Python "global" generally means
"module global" not "globally global" (though in a sense module globals
are also globally global since modules are singletons).