Commit graph

19 commits

Author SHA1 Message Date
Charlie Marsh
d930052de8
Move required import parsing out of lint rule (#12536)
## Summary

Instead, make it part of the serialization and deserialization itself.
This makes it _much_ easier to reuse when solving
https://github.com/astral-sh/ruff/issues/12458.
2024-07-26 13:35:45 -04:00
Micha Reiser
2dfbf118d7
[red-knot] Extract red_knot_python_semantic crate (#11926) 2024-06-20 13:24:24 +02:00
Micha Reiser
f666d79cd7
red-knot: Symbol table (#11860) 2024-06-18 13:10:45 +00:00
Micha Reiser
26ac805e6d
red-knot: Port module resolver to salsa (#11835) 2024-06-18 12:11:58 +00:00
Micha Reiser
efbf7b14b5
red-knot[salsa part 2]: Setup semantic DB and Jar (#11837)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-06-13 08:00:51 +01:00
Alex Waygood
dfe4291c0b
Improve ruff_python_semantic::all::extract_all_names() (#11335) 2024-05-08 17:09:31 +01:00
Micha Reiser
22639c5a2a
Move all module from the AST to the semantic crate (#11330) 2024-05-08 08:56:50 +00:00
Charlie Marsh
424b8d4ad2
Use a single node hierarchy to track statements and expressions (#6709)
## Summary

This PR is a follow-up to the suggestion in
https://github.com/astral-sh/ruff/pull/6345#discussion_r1285470953 to
use a single stack to store all statements and expressions, rather than
using separate vectors for each, which gives us something closer to a
full-fidelity chain. (We can then generalize this concept to include all
other AST nodes too.)

This is in part made possible by the removal of the hash map from
`&Stmt` to `StatementId` (#6694), which makes it much cheaper to store
these using a single interface (since doing so no longer introduces the
requirement that we hash all expressions).

I'll follow-up with some profiling, but a few notes on how the data
requirements have changed:

- We now store a `BranchId` for every expression, not just every
statement, so that's an extra `u32`.
- We now store a single `NodeId` on every snapshot, rather than separate
`StatementId` and `ExpressionId` IDs, so that's one fewer `u32` for each
snapshot.
- We're probably doing a few more lookups in general, since any calls to
`current_statement()` etc. now have to iterate up the node hierarchy
until they identify the first statement.

## Test Plan

`cargo test`
2023-08-21 21:32:57 -04:00
Charlie Marsh
17af12e57c
Add branch detection to the semantic model (#6694)
## Summary

We have a few rules that rely on detecting whether two statements are in
different branches -- for example, different arms of an `if`-`else`.
Historically, the way this was implemented is that, given two statement
IDs, we'd find the common parent (by traversing upwards via our
`Statements` abstraction); then identify branches "manually" by matching
the parents against `try`, `if`, and `match`, and returning iterators
over the arms; then check if there's an arm for which one of the
statements is a child, and the other is not.

This has a few drawbacks:

1. First, the code is generally a bit hard to follow (Konsti mentioned
this too when working on the `ElifElseClause` refactor).

2. Second, this is the only place in the codebase where we need to go
from `&Stmt` to `StatementID` -- _everywhere_ else, we only need to go
in the _other_ direction. Supporting these lookups means we need to
maintain a mapping from `&Stmt` to `StatementID` that includes every
`&Stmt` in the program. (We _also_ end up maintaining a `depth` level
for every statement.) I'd like to get rid of these requirements to
improve efficiency, reduce complexity, and enable us to treat AST modes
more generically in the future. (When I looked at adding the `&Expr` to
our existing statement-tracking infrastructure, maintaining a hash map
with all the statements noticeably hurt performance.)

The solution implemented here instead makes branches a first-class
concept in the semantic model. Like with `Statements`, we now have a
`Branches` abstraction, where each branch points to its optional parent.
When we store statements, we store the `BranchID` alongside each
statement. When we need to detect whether two statements are in the same
branch, we just realize each statement's branch path and compare the
two. (Assuming that the two statements are in the same scope, then
they're on the same branch IFF one branch path is a subset of the other,
starting from the top.) We then add some calls to the visitor to push
and pop branches in the appropriate places, for `if`, `try`, and `match`
statements.

Note that a branch is not 1:1 with a statement; instead, each branch is
closer to a suite, but not _every_ suite is a branch. For example, each
arm in an `if`-`elif`-`else` is a branch, but the `else` in a `for` loop
is not considered a branch.

In addition to being much simpler, this should also be more efficient,
since we've shed the entire `&Stmt` hash map, plus the `depth` that we
track on `StatementWithParent` in favor of a single `Option<BranchID>`
on `StatementWithParent` plus a single vector for all branches. The
lookups should be faster too, since instead of doing a bunch of jumps
around with the hash map + repeated recursive calls to find the common
parents, we instead just do a few simple lookups in the `Branches`
vector to realize and compare the branch paths.

## Test Plan

`cargo test` -- we have a lot of coverage for this, which we inherited
from PyFlakes
2023-08-19 21:28:17 +00:00
Charlie Marsh
b21abe0a57
Use separate structs for expression and statement tracking (#6351)
## Summary

This PR fixes the performance degradation introduced in
https://github.com/astral-sh/ruff/pull/6345. Instead of using the
generic `Nodes` structs, we now use separate `Statement` and
`Expression` structs. Importantly, we can avoid tracking a bunch of
state for expressions that we need for parents: we don't need to track
reference-to-ID pointers (we just have no use-case for this -- I'd
actually like to remove this from statements too, but we need it for
branch detection right now), we don't need to track depth, etc.

In my testing, this entirely removes the regression on all-rules, and
gets us down to 2ms slower on the default rules (as a crude hyperfine
benchmark, so this is within margin of error IMO).

No behavioral changes.
2023-08-07 15:27:42 +00:00
Charlie Marsh
310abc769d
Move StarImport to its own module (#5186) 2023-06-20 13:12:46 -04:00
Charlie Marsh
86ff1febea
Re-export ruff_python_semantic members (#5094)
## Summary

This PR adds a more unified public API to `ruff_python_semantic`, so
that we don't need to do deeply nested imports all over the place.
2023-06-14 18:23:38 +00:00
Charlie Marsh
9741f788c7
Remove globals table from Scope (#4686) 2023-05-27 22:35:20 -04:00
Charlie Marsh
fcdc7bdd33
Remove separate ReferenceContext enum (#4631) 2023-05-24 15:12:38 +00:00
Charlie Marsh
8961d8eb6f
Track all read references in semantic model (#4610) 2023-05-24 14:14:27 +00:00
Charlie Marsh
19c4b7bee6
Rename ruff_python_semantic's Context struct to SemanticModel (#4565) 2023-05-22 02:35:03 +00:00
Charlie Marsh
72e0ffc1ac
Delay computation of Definition visibility (#4339) 2023-05-11 17:14:29 +00:00
Charlie Marsh
c1f0661225
Replace parents statement stack with a Nodes abstraction (#4233) 2023-05-06 16:12:41 +00:00
Charlie Marsh
d919adc13c
Introduce a ruff_python_semantic crate (#3865) 2023-04-04 16:50:47 +00:00