## Summary
Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.
## Test Plan
`cargo test`
## Summary
This method is almost never what you actually want, because it doesn't
respect Python's scoping semantics. For example, if you call this within
a class method, it will return class attributes, whereas Python actually
_skips_ symbols in classes unless the load occurs within the class
itself. I also want to move away from these kinds of dynamic lookups and
more towards `resolve_name`, which performs a lookup based on the stored
`BindingId` at the time of symbol resolution, and will make it much
easier for us to separate model building from linting in the near
future.
## Test Plan
`cargo test`
## Summary
Fixes some TODOs introduced in
https://github.com/astral-sh/ruff/pull/6538. In short, given an
expression like `1 if x > 0 else "Hello, world!"`, we now return a union
type that says the expression can resolve to either an `int` or a `str`.
The system remains very limited, it only works for obvious primitive
types, and there's no attempt to do inference on any more complex
variables. (If any expression yields `Unknown` or `TypeError`, we
propagate that result throughout and abort on the client's end.)
## Summary
Our `is_builtin` check did a naive walk over the parent scopes; instead,
it needs to (e.g.) skip symbols in a class scope if being called outside
of the class scope itself.
Closes https://github.com/astral-sh/ruff/issues/6466.
## Test Plan
`cargo test`
## Summary
We have some logic in the expression analyzer method to avoid
re-checking the inner `Union` in `Union[Union[...]]`, since the methods
that analyze `Union` expressions already recurse. Elsewhere, we have
logic to avoid re-checking the inner `|` in `int | (int | str)`, for the
same reason.
This PR unifies that logic into a single method _and_ ensures that, just
as we recurse over both `Union` and `|`, we also detect that we're in
_either_ kind of nested union.
Closes https://github.com/astral-sh/ruff/issues/6285.
## Test Plan
Added some new snapshots.
## Summary
This PR leverages the unified function definition node to add precise
AST node types to `MemberKind`, which is used to power our docstring
definition tracking (e.g., classes and functions, whether they're
methods or functions or nested functions and so on, whether they have a
docstring, etc.). It was painful to do this in the past because the
function variants needed to support a union anyway, but storing precise
nodes removes like a dozen panics.
No behavior changes -- purely a refactor.
## Test Plan
`cargo test`
## Summary
Per the suggestion in
https://github.com/astral-sh/ruff/discussions/6183, this PR removes
`AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an
`is_async` field on the non-async variants of those structs. Unlike an
interpreter, we _generally_ have identical handling for these nodes, so
separating them into distinct variants adds complexity from which we
don't really benefit. This can be seen below, where we get to remove a
_ton_ of code related to adding generic `Any*` wrappers, and a ton of
duplicate branches for these cases.
## Test Plan
`cargo test` is unchanged, apart from parser snapshots.
## Summary
See discussion in
https://github.com/astral-sh/ruff/pull/6351#discussion_r1284996979. We
can remove `RefEquality` entirely and instead use a text offset for
statement keys, since no two statements can start at the same text
offset.
## Test Plan
`cargo test`
## Summary
This PR fixes the performance degradation introduced in
https://github.com/astral-sh/ruff/pull/6345. Instead of using the
generic `Nodes` structs, we now use separate `Statement` and
`Expression` structs. Importantly, we can avoid tracking a bunch of
state for expressions that we need for parents: we don't need to track
reference-to-ID pointers (we just have no use-case for this -- I'd
actually like to remove this from statements too, but we need it for
branch detection right now), we don't need to track depth, etc.
In my testing, this entirely removes the regression on all-rules, and
gets us down to 2ms slower on the default rules (as a crude hyperfine
benchmark, so this is within margin of error IMO).
No behavioral changes.
## Summary
This PR attempts to draw a clearer divide between "methods that take
(e.g.) an expression or statement as input" and "methods that rely on
the _current_ expression or statement" in the semantic model, by
renaming methods like `stmt()` to `current_statement()`.
This had led to confusion in the past. For example, prior to this PR, we
had `scope()` (which returns the current scope), and `parent_scope`,
which returns the parent _of a scope that's passed in_. Now, the API is
clearer: `current_scope` returns the current scope, and `parent_scope`
takes a scope as argument and returns its parent.
Per above, I also changed `stmt` to `statement` and `expr` to
`expression`.
## Summary
When we iterate over the AST for analysis, we often process nodes in a
"deferred" manner. For example, if we're analyzing a function, we push
the function body onto a deferred stack, along with a snapshot of the
current semantic model state. Later, when we analyze the body, we
restore the semantic model state from the snapshot. This ensures that we
know the correct scope, hierarchy of statement parents, etc., when we go
to analyze the function body.
Historically, we _haven't_ included the _expression_ hierarchy in the
model snapshot -- so we track the current expression parents in the
visitor, but we never save and restore them when processing deferred
nodes. This can lead to subtle bugs, in that methods like
`expr_parent()` aren't guaranteed to be correct, if you're in a deferred
visitor.
This PR migrates expression tracking to mirror statement tracking
exactly. So we push all expressions onto an `IndexVec`, and include the
current expression on the snapshot. This ensures that `expr_parent()`
and related methods are "always correct" rather than "sometimes
correct".
There's a performance cost here, both at runtime and in terms of memory
consumption (we now store an additional pointer for every expression).
In my hyperfine testing, it's about a 1% performance decrease for
all-rules on CPython (up to 533.8ms, from 528.3ms) and a 4% performance
decrease for default-rules on CPython (up to 212ms, from 204ms).
However... I think this is worth it given the incorrectness of our
current approach. In the future, we may want to reconsider how we do
these upward traversals (e.g., with something like a red-green tree).
(**Note**: in https://github.com/astral-sh/ruff/pull/6351, the slowdown
seems to be entirely removed.)
## Summary
Historically, we've stored "qualified names" on our
`BindingKind::Import`, `BindingKind::SubmoduleImport`, and
`BindingKind::ImportFrom` structs. In Ruff, a "qualified name" is a
dot-separated path to a symbol. For example, given `import foo.bar`, the
"qualified name" would be `"foo.bar"`; and given `from foo.bar import
baz`, the "qualified name" would be `foo.bar.baz`.
This PR modifies the `BindingKind` structs to instead store _call paths_
rather than qualified names. So in the examples above, we'd store
`["foo", "bar"]` and `["foo", "bar", "baz"]`. It turns out that this
more efficient given our data access patterns. Namely, we frequently
need to convert the qualified name to a call path (whenever we call
`resolve_call_path`), and it turns out that we do this operation enough
that those conversations show up on benchmarks.
There are a few other advantages to using call paths, rather than
qualified names:
1. The size of `BindingKind` is reduced from 32 to 24 bytes, since we no
longer need to store a `String` (only a boxed slice).
2. All three import types are more consistent, since they now all store
a boxed slice, rather than some storing an `&str` and some storing a
`String` (for `BindingKind::ImportFrom`, we needed to allocate a
`String` to create the qualified name, but the call path is a slice of
static elements that don't require that allocation).
3. A lot of code gets simpler, in part because we now do call path
resolution "earlier". Most notably, for relative imports (`from .foo
import bar`), we store the _resolved_ call path rather than the relative
call path, so the semantic model doesn't have to deal with that
resolution. (See that `resolve_call_path` is simpler, fewer branches,
etc.)
In my testing, this change improves the all-rules benchmark by another
4-5% on top of the improvements mentioned in #6047.
## Summary
This PR removes a now-unnecessary abstraction from `helper.rs`
(`CallArguments`), in favor of adding methods to `Arguments` directly,
which helps with discoverability.
## Summary
This PR adds a new `Arguments` AST node, which we can use for function
calls and class definitions.
The `Arguments` node spans from the left (open) to right (close)
parentheses inclusive.
In the case of classes, the `Arguments` is an option, to differentiate
between:
```python
# None
class C: ...
# Some, with empty vectors
class C(): ...
```
In this PR, we don't really leverage this change (except that a few
rules get much simpler, since we don't need to lex to find the start and
end ranges of the parentheses, e.g.,
`crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`,
`crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`).
In future PRs, this will be especially helpful for the formatter, since
we can track comments enclosed on the node itself.
## Test Plan
`cargo test`
## Summary
We have some code to ensure that if an aliased import is used, any
submodules should be marked as used too. This comment says it best:
```rust
// If the name of a submodule import is the same as an alias of another import, and the
// alias is used, then the submodule import should be marked as used too.
//
// For example, mark `pyarrow.csv` as used in:
//
// ```python
// import pyarrow as pa
// import pyarrow.csv
// print(pa.csv.read_csv("test.csv"))
// ```
```
However, it looks like when we go to look up `pyarrow` (of `import
pyarrow as pa`), we aren't checking to ensure the resolved binding is
_actually_ an import. This was causing us to attribute `print(rm.ANY)`
to `def requests_mock` here:
```python
import requests_mock as rm
def requests_mock(requests_mock: rm.Mocker):
print(rm.ANY)
```
Closes https://github.com/astral-sh/ruff/issues/6180.
Requires https://github.com/astral-sh/RustPython-Parser/pull/42
Related https://github.com/PyCQA/pyflakes/pull/778
[PEP-695](https://peps.python.org/pep-0695)
Part of #5062
<!--
Thank you for contributing to Ruff! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
<!-- What's the purpose of the change? What does it do, and why? -->
Adds a scope for type parameters, a type parameter binding kind, and
checker visitation of type parameters in type alias statements, function
definitions, and class definitions.
A few changes were necessary to ensure correctness following the
insertion of a new scope between function and class scopes and their
parent.
## Test Plan
<!-- How was it tested? -->
Undefined name snapshots.
Unused type parameter rule will be added as follow-up.
## Summary
This PR stores the mapping from `ExprName` node to resolved `BindingId`,
which lets us skip scope lookups in `resolve_call_path`. It's enabled by
#6045, since that PR ensures that when we analyze a node (and thus call
`resolve_call_path`), we'll have already visited its `ExprName`
elements.
In more detail: imagine that we're traversing over `foo.bar()`. When we
read `foo`, it will be an `ExprName`, which we'll then resolve to a
binding via `handle_node_load`. With this change, we then store that
binding in a map. Later, if we call `collect_call_path` on `foo.bar`,
we'll identify `foo` (the "head" of the attribute) and grab the resolved
binding in that map. _Almost_ all names are now resolved in advance,
though it's not a strict requirement, and some rules break that pattern
(e.g., if we're analyzing arguments, and they need to inspect their
annotations, which are visited in a deferred manner).
This improves performance by 4-6% on the all-rules benchmark. It looks
like it hurts performance (1-2% drop) in the default-rules benchmark,
presumedly because those rules don't call `resolve_call_path` nearly as
much, and so we're paying for these extra writes.
Here's the benchmark data:
```
linter/default-rules/numpy/globals.py
time: [67.270 µs 67.380 µs 67.489 µs]
thrpt: [43.720 MiB/s 43.792 MiB/s 43.863 MiB/s]
change:
time: [+0.4747% +0.7752% +1.0626%] (p = 0.00 < 0.05)
thrpt: [-1.0514% -0.7693% -0.4724%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
linter/default-rules/pydantic/types.py
time: [1.4067 ms 1.4105 ms 1.4146 ms]
thrpt: [18.028 MiB/s 18.081 MiB/s 18.129 MiB/s]
change:
time: [+1.3152% +1.6953% +2.0414%] (p = 0.00 < 0.05)
thrpt: [-2.0006% -1.6671% -1.2981%]
Performance has regressed.
linter/default-rules/numpy/ctypeslib.py
time: [637.67 µs 638.96 µs 640.28 µs]
thrpt: [26.006 MiB/s 26.060 MiB/s 26.113 MiB/s]
change:
time: [+1.5859% +1.8109% +2.0353%] (p = 0.00 < 0.05)
thrpt: [-1.9947% -1.7787% -1.5611%]
Performance has regressed.
linter/default-rules/large/dataset.py
time: [3.2289 ms 3.2336 ms 3.2383 ms]
thrpt: [12.563 MiB/s 12.581 MiB/s 12.599 MiB/s]
change:
time: [+0.8029% +0.9898% +1.1740%] (p = 0.00 < 0.05)
thrpt: [-1.1604% -0.9801% -0.7965%]
Change within noise threshold.
linter/all-rules/numpy/globals.py
time: [134.05 µs 134.15 µs 134.26 µs]
thrpt: [21.977 MiB/s 21.995 MiB/s 22.012 MiB/s]
change:
time: [-4.4571% -4.1175% -3.8268%] (p = 0.00 < 0.05)
thrpt: [+3.9791% +4.2943% +4.6651%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
linter/all-rules/pydantic/types.py
time: [2.5627 ms 2.5669 ms 2.5720 ms]
thrpt: [9.9158 MiB/s 9.9354 MiB/s 9.9516 MiB/s]
change:
time: [-5.8304% -5.6374% -5.4452%] (p = 0.00 < 0.05)
thrpt: [+5.7587% +5.9742% +6.1914%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
linter/all-rules/numpy/ctypeslib.py
time: [1.3949 ms 1.3956 ms 1.3964 ms]
thrpt: [11.925 MiB/s 11.931 MiB/s 11.937 MiB/s]
change:
time: [-6.2496% -6.0856% -5.9293%] (p = 0.00 < 0.05)
thrpt: [+6.3030% +6.4799% +6.6662%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
linter/all-rules/large/dataset.py
time: [5.5951 ms 5.6019 ms 5.6093 ms]
thrpt: [7.2527 MiB/s 7.2623 MiB/s 7.2711 MiB/s]
change:
time: [-5.1781% -4.9783% -4.8070%] (p = 0.00 < 0.05)
thrpt: [+5.0497% +5.2391% +5.4608%]
Performance has improved.
```
Still playing with this (the concepts need better names, documentation,
etc.), but opening up for feedback.
## Summary
Check for unused private `TypeVar`. See [original
implementation](2a86db8271/pyi.py (L1958)).
```
$ flake8 --select Y018 crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:4:1: Y018 TypeVar "_T" is not used
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:5:1: Y018 TypeVar "_P" is not used
```
```
$ ./target/debug/ruff --select PYI018 crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi --no-cache
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:4:1: PYI018 TypeVar `_T` is never used
crates/ruff/resources/test/fixtures/flake8_pyi/PYI018.pyi:5:1: PYI018 TypeVar `_P` is never used
Found 2 errors.
```
In the file `unused_private_type_declaration.rs`, I'm planning to add
other rules that are similar to `PYI018` like the `PYI046`, `PYI047` and
`PYI049`.
ref #848
## Test Plan
Snapshots and manual runs of flake8.
## Summary
This PR adds a `logger-objects` setting that allows users to mark
specific symbols a `logging.Logger` objects. Currently, if a `logger` is
imported, we only flagged it as a `logging.Logger` if it comes exactly
from the `logging` module or is `flask.current_app.logger`.
This PR allows users to mark specific loggers, like
`logging_setup.logger`, to ensure that they're covered by the
`flake8-logging-format` rules and others.
For example, if you have a module `logging_setup.py` with the following
contents:
```python
import logging
logger = logging.getLogger(__name__)
```
Adding `"logging_setup.logger"` to `logger-objects` will ensure that
`logging_setup.logger` is treated as a `logging.Logger` object when
imported from other modules (e.g., `from logging_setup import logger`).
Closes https://github.com/astral-sh/ruff/issues/5694.
## Summary
This is equivalent for a single flag, but I think it's more likely to be
correct when the bitflags are modified -- the primary reason being that
we sometimes define flags as the union of other flags, e.g.:
```rust
const ANNOTATION = Self::TYPING_ONLY_ANNOTATION.bits() | Self::RUNTIME_ANNOTATION.bits();
```
In this case, `flags.contains(Flag::ANNOTATION)` requires that _both_
flags in the union are set, whereas `flags.intersects(Flag::ANNOTATION)`
requires that _at least one_ flag is set.
## Summary
As part of my continued quest to separate semantic model-building from
diagnostic emission, this PR moves our unresolved-reference rules to a
deferred pass. So, rather than emitting diagnostics as we encounter
unresolved references, we now track those unresolved references on the
semantic model (just like resolved references), and after traversal,
emit the relevant rules for any unresolved references.
## Summary
This PR moves two rules (`invalid-all-format` and `invalid-all-object`)
out of the name-binding phase, and into the dedicated pass over all
bindings that occurs at the end of the `Checker`. This is part of my
continued quest to separate the semantic model-building logic from the
actual rule enforcement.
## Summary
The vector of names here is immutable -- we never push to it after
initialization. Boxing reduces the size of the variant from 32 bytes to
24 bytes. (See:
https://nnethercote.github.io/perf-book/type-sizes.html#boxed-slices.)
It doesn't make a difference here, since it's not the largest variant,
but it still seems like a prudent change (and I was considering adding
another field to this variant, though I may no longer do so).
## Summary
No behavior change, but this is in theory more efficient, since we can
just iterate over the flat `Binding` vector rather than having to
iterate over binding chains via the `Scope`.
## Summary
This PR moves the "unused exception" rule out of the visitor and into a
deferred check. When we can base rules solely on the semantic model, we
probably should, as it greatly simplifies the `Checker` itself.