mirrors/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-07-19 19:15:30 +00:00

Author	SHA1	Message	Date
Charlie Marsh	d685107638	Move {AnyNodeRef, AstNode} to ruff_python_ast crate root (#8030 ) This is a do-over of https://github.com/astral-sh/ruff/pull/8011, which I accidentally merged into a non-`main` branch. Sorry!	2023-10-18 00:01:18 +00:00
Tom Kuson	62f1ee08e7	[`refurb`] Implement `single-item-membership-test` (`FURB171`) (#7815 ) ## Summary Implement [`no-single-item-in`](https://github.com/dosisod/refurb/blob/master/refurb/checks/iterable/no_single_item_in.py) as `single-item-membership-test` (`FURB171`). Uses the helper function `generate_comparison` from the `pycodestyle` implementations; this function should probably be moved, but I am not sure where at the moment. Update: moved it to `ruff_python_ast::helpers`. Related to #1348. ## Test Plan `cargo test`	2023-10-08 14:08:47 +00:00
Charlie Marsh	93b5d8a0fb	Implement our own small-integer optimization (#7584 ) ## Summary This is a follow-up to #7469 that attempts to achieve similar gains, but without introducing malachite. Instead, this PR removes the `BigInt` type altogether, instead opting for a simple enum that allows us to store small integers directly and only allocate for values greater than `i64`: ```rust /// A Python integer literal. Represents both small (fits in an `i64`) and large integers. #[derive(Clone, PartialEq, Eq, Hash)] pub struct Int(Number); #[derive(Debug, Clone, PartialEq, Eq, Hash)] pub enum Number { /// A "small" number that can be represented as an `i64`. Small(i64), /// A "large" number that cannot be represented as an `i64`. Big(Box<str>), } impl std::fmt::Display for Number { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Number::Small(value) => write!(f, "{value}"), Number::Big(value) => write!(f, "{value}"), } } } ``` We typically don't care about numbers greater than `isize` -- our only uses are comparisons against small constants (like `1`, `2`, `3`, etc.), so there's no real loss of information, except in one or two rules where we're now a little more conservative (with the worst-case being that we don't flag, e.g., an `itertools.pairwise` that uses an extremely large value for the slice start constant). For simplicity, a few diagnostics now show a dedicated message when they see integers that are out of the supported range (e.g., `outdated-version-block`). An additional benefit here is that we get to remove a few dependencies, especially `num-bigint`. ## Test Plan `cargo test`	2023-09-25 15:13:21 +00:00
konsti	56440ad835	Introduce `ArgOrKeyword` to keep call parameter order (#7302 ) ## Motivation The `ast::Arguments` for call argument are split into positional arguments (args) and keywords arguments (keywords). We currently assume that call consists of first args and then keywords, which is generally the case, but not always: ```python f(args, a=2, args2, *kwargs) class A(args, a=2, args2, *kwargs): pass ``` The consequence is accidentally reordering arguments (https://github.com/astral-sh/ruff/pull/7268). ## Summary `Arguments::args_and_keywords` returns an iterator of an `ArgOrKeyword` enum that yields args and keywords in the correct order. I've fixed the obvious `args` and `keywords` usages, but there might be some cases with wrong assumptions remaining. ## Test Plan The generator got new test cases, otherwise the stacked PR (https://github.com/astral-sh/ruff/pull/7268) which uncovered this.	2023-09-13 08:45:46 +00:00
Dhruv Manilawala	04f2842e4f	Move `ExprConstant::kind` to `StringConstant::unicode` (#7180 )	2023-09-06 07:39:25 +00:00
Charlie Marsh	b0d171ac19	Supported starred exceptions in length-one tuple detection (#7080 )	2023-09-03 13:31:13 +00:00
Charlie Marsh	fc89976c24	Move `Ranged` into `ruff_text_size` (#6919 ) ## Summary The motivation here is that this enables us to implement `Ranged` in crates that don't depend on `ruff_python_ast`. Largely a mechanical refactor with a lot of regex, Clippy help, and manual fixups. ## Test Plan `cargo test`	2023-08-27 14:12:51 -04:00
Micha Reiser	7c480236e0	Use dyn dispatch for `any_over_*` (#6912 )	2023-08-27 15:54:01 +02:00
Charlie Marsh	15b73bdb8a	Introduce AST nodes for `PatternMatchClass` arguments (#6881 ) ## Summary This PR introduces two new AST nodes to improve the representation of `PatternMatchClass`. As a reminder, `PatternMatchClass` looks like this: ```python case Point2D(0, 0, x=1, y=2): ... ``` Historically, this was represented as a vector of patterns (for the `0, 0` portion) and parallel vectors of keyword names (for `x` and `y`) and values (for `1` and `2`). This introduces a bunch of challenges for the formatter, but importantly, it's also really different from how we represent similar nodes, like arguments (`func(0, 0, x=1, y=2)`) or parameters (`def func(x, y)`). So, firstly, we now use a single node (`PatternArguments`) for the entire parenthesized region, making it much more consistent with our other nodes. So, above, `PatternArguments` would be `(0, 0, x=1, y=2)`. Secondly, we now have a `PatternKeyword` node for `x=1` and `y=2`. This is much more similar to the how `Keyword` is represented within `Arguments` for call expressions. Closes https://github.com/astral-sh/ruff/issues/6866. Closes https://github.com/astral-sh/ruff/issues/6880.	2023-08-26 14:45:44 +00:00
Charlie Marsh	96d310fbab	Remove `Stmt::TryStar` (#6566 ) ## Summary Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the pattern we've migrated towards for `Stmt::For` (removing `Stmt::AsyncFor`) and friends. While these are significant differences for an interpreter, we tend to handle these cases identically or nearly identically. ## Test Plan `cargo test`	2023-08-14 13:39:44 -04:00
Charlie Marsh	f16e780e0a	Add an implicit concatenation flag to string and bytes constants (#6512 ) ## Summary Per the discussion in https://github.com/astral-sh/ruff/discussions/6183, this PR adds an `implicit_concatenated` flag to the string and bytes constant variants. It's not actually _used_ anywhere as of this PR, but it is covered by the tests. Specifically, we now use a struct for the string and bytes cases, along with the `Expr::FString` node. That struct holds the value, plus the flag: ```rust #[derive(Clone, Debug, PartialEq, is_macro::Is)] pub enum Constant { Str(StringConstant), Bytes(BytesConstant), ... } #[derive(Clone, Debug, PartialEq, Eq)] pub struct StringConstant { /// The string value as resolved by the parser (i.e., without quotes, or escape sequences, or /// implicit concatenations). pub value: String, /// Whether the string contains multiple string tokens that were implicitly concatenated. pub implicit_concatenated: bool, } impl Deref for StringConstant { type Target = str; fn deref(&self) -> &Self::Target { self.value.as_str() } } #[derive(Clone, Debug, PartialEq, Eq)] pub struct BytesConstant { /// The bytes value as resolved by the parser (i.e., without quotes, or escape sequences, or /// implicit concatenations). pub value: Vec<u8>, /// Whether the string contains multiple string tokens that were implicitly concatenated. pub implicit_concatenated: bool, } impl Deref for BytesConstant { type Target = [u8]; fn deref(&self) -> &Self::Target { self.value.as_slice() } } ``` ## Test Plan `cargo test`	2023-08-14 13:46:54 +00:00
Dhruv Manilawala	6a64f2289b	Rename `Magic` to `IpyEscape` (#6395 ) ## Summary This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and `MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the token and type. Similarly, it renames the AST nodes from `LineMagic` to `IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary. It also makes renames from using `jupyter_magic` to `ipython_escape_commands` in various function names. The mode value is still `Mode::Jupyter` because the escape commands are part of the IPython syntax but the lexing/parsing is done for a Jupyter notebook. ### Motivation behind the rename: * IPython codebase defines it as "EscapeCommand" / "Escape Sequences": * Escape Sequences: `292e3a2345/IPython/core/inputtransformer2.py (L329-L333)` * Escape command: `292e3a2345/IPython/core/inputtransformer2.py (L410-L411)` * The word "magic" is used mainly for the actual magic commands i.e., the ones starting with `%`/`%%` (https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system). So, this avoids any confusion between the Magic token (`%`, `%%`) and the escape command itself. ## Test Plan * `cargo test` to make sure all renames are done correctly. * `grep` for `jupyter_escape`/`magic` to make sure all renames are done correctly.	2023-08-09 13:28:18 +00:00
Charlie Marsh	3f0eea6d87	Rename `JoinedStr` to `FString` in the AST (#6379 ) ## Summary Per the proposal in https://github.com/astral-sh/ruff/discussions/6183, this PR renames the `JoinedStr` node to `FString`.	2023-08-07 17:33:17 +00:00
Charlie Marsh	daefa74e9a	Remove async AST node variants for `with`, `for`, and `def` (#6369 ) ## Summary Per the suggestion in https://github.com/astral-sh/ruff/discussions/6183, this PR removes `AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an `is_async` field on the non-async variants of those structs. Unlike an interpreter, we _generally_ have identical handling for these nodes, so separating them into distinct variants adds complexity from which we don't really benefit. This can be seen below, where we get to remove a _ton_ of code related to adding generic `Any*` wrappers, and a ton of duplicate branches for these cases. ## Test Plan `cargo test` is unchanged, apart from parser snapshots.	2023-08-07 16:36:02 +00:00
Charlie Marsh	76148ddb76	Store call paths rather than stringified names (#6102 ) ## Summary Historically, we've stored "qualified names" on our `BindingKind::Import`, `BindingKind::SubmoduleImport`, and `BindingKind::ImportFrom` structs. In Ruff, a "qualified name" is a dot-separated path to a symbol. For example, given `import foo.bar`, the "qualified name" would be `"foo.bar"`; and given `from foo.bar import baz`, the "qualified name" would be `foo.bar.baz`. This PR modifies the `BindingKind` structs to instead store _call paths_ rather than qualified names. So in the examples above, we'd store `["foo", "bar"]` and `["foo", "bar", "baz"]`. It turns out that this more efficient given our data access patterns. Namely, we frequently need to convert the qualified name to a call path (whenever we call `resolve_call_path`), and it turns out that we do this operation enough that those conversations show up on benchmarks. There are a few other advantages to using call paths, rather than qualified names: 1. The size of `BindingKind` is reduced from 32 to 24 bytes, since we no longer need to store a `String` (only a boxed slice). 2. All three import types are more consistent, since they now all store a boxed slice, rather than some storing an `&str` and some storing a `String` (for `BindingKind::ImportFrom`, we needed to allocate a `String` to create the qualified name, but the call path is a slice of static elements that don't require that allocation). 3. A lot of code gets simpler, in part because we now do call path resolution "earlier". Most notably, for relative imports (`from .foo import bar`), we store the _resolved_ call path rather than the relative call path, so the semantic model doesn't have to deal with that resolution. (See that `resolve_call_path` is simpler, fewer branches, etc.) In my testing, this change improves the all-rules benchmark by another 4-5% on top of the improvements mentioned in #6047.	2023-08-05 15:21:50 +00:00
Dhruv Manilawala	1ac2699b5e	Update `F841` autofix to not remove line magic expr (#6141 ) ## Summary Update `F841` autofix to not remove line magic expr ## Test Plan Added test case for assignment statement with and without type annotation fixes: #6116	2023-08-05 00:45:01 +00:00
Charlie Marsh	9f3567dea6	Use `range: _` in lieu of `range: _range` (#6296 ) ## Summary `range: _range` is slightly inconvenient because you can't use it multiple times within a single match, unlike `_`.	2023-08-02 22:11:13 -04:00
Charlie Marsh	23b8fc4366	Move `includes_arg_name` onto `Parameters` (#6282 ) ## Summary Like #6279, no reason for this to be a standalone method.	2023-08-02 18:05:26 +00:00
Charlie Marsh	fd40864924	Move `find_keyword` helpers onto `Arguments` struct (#6280 ) ## Summary Similar to #6279, moving some helpers onto the struct in the name of reducing the number of random undiscoverable utilities we have in `helpers.rs`. Most of the churn is migrating rules to take `ast::ExprCall` instead of the spread call arguments. ## Test Plan `cargo test`	2023-08-02 13:54:48 -04:00
Charlie Marsh	041946fb64	Remove `CallArguments` abstraction (#6279 ) ## Summary This PR removes a now-unnecessary abstraction from `helper.rs` (`CallArguments`), in favor of adding methods to `Arguments` directly, which helps with discoverability.	2023-08-02 13:25:43 -04:00
Charlie Marsh	8a0f844642	Box type params and arguments fields on the class definition node (#6275 ) ## Summary This PR boxes the `TypeParams` and `Arguments` fields on the class definition node. These fields are optional and often emitted, and given that class definition is our largest enum variant, we pay the cost of including them for every statement in the AST. Boxing these types reduces the statement size by 40 bytes, which seems like a good tradeoff given how infrequently these are accessed. ## Test Plan Need to benchmark, but no behavior changes.	2023-08-02 16:47:06 +00:00
Charlie Marsh	b095b7204b	Add a `TypeParams` node to the AST (#6261 ) ## Summary Similar to #6259, this PR adds a `TypeParams` node to the AST, to capture the list of type parameters with their surrounding brackets. If a statement lacks type parameters, the `type_params` field will be `None`.	2023-08-02 14:12:45 +00:00
Charlie Marsh	981e64f82b	Introduce an `Arguments` AST node for function calls and class definitions (#6259 ) ## Summary This PR adds a new `Arguments` AST node, which we can use for function calls and class definitions. The `Arguments` node spans from the left (open) to right (close) parentheses inclusive. In the case of classes, the `Arguments` is an option, to differentiate between: ```python # None class C: ... # Some, with empty vectors class C(): ... ``` In this PR, we don't really leverage this change (except that a few rules get much simpler, since we don't need to lex to find the start and end ranges of the parentheses, e.g., `crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`, `crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`). In future PRs, this will be especially helpful for the formatter, since we can track comments enclosed on the node itself. ## Test Plan `cargo test`	2023-08-02 10:01:13 -04:00
Charlie Marsh	9c708d8fc1	Rename `Parameter#arg` and `ParameterWithDefault#def` fields (#6255 ) ## Summary This PR renames... - `Parameter#arg` to `Parameter#name` - `ParameterWithDefault#def` to `ParameterWithDefault#parameter` (such that `ParameterWithDefault` has a `default` and a `parameter`) ## Test Plan `cargo test`	2023-08-01 14:28:34 -04:00
Charlie Marsh	adc8bb7821	Rename `Arguments` to `Parameters` in the AST (#6253 ) ## Summary This PR renames a few AST nodes for clarity: - `Arguments` is now `Parameters` - `Arg` is now `Parameter` - `ArgWithDefault` is now `ParameterWithDefault` For now, the attribute names that reference `Parameters` directly are changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters` itself are not (e.g., `vararg`). We may revisit that decision in the future. For context, the AST node formerly known as `Arguments` is used in function definitions. Formally (outside of the Python context), "arguments" typically refers to "the values passed to a function", while "parameters" typically refers to "the variables used in a function definition". E.g., if you Google "arguments vs parameters", you'll get some explanation like: > A parameter is a variable in a function definition. It is a placeholder and hence does not have a concrete value. An argument is a value passed during function invocation. We're thus deviating from Python's nomenclature in favor of a scheme that we find to be more precise.	2023-08-01 13:53:28 -04:00
konsti	1df7e9831b	Replace `.map_or(false, $closure)` with `.is_some_and(closure)` (#6244 ) Summary [Option::is_some_and](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.is_some_and) and [Result::is_ok_and](https://doc.rust-lang.org/std/result/enum.Result.html#method.is_ok_and) are new methods is rust 1.70. I find them way more readable than `.map_or(false, ...)`. The changes are `s/.map_or(false,/.is_some_and(/g`, then manually switching to `is_ok_and` where the value is a Result rather than an Option. Test Plan n/a^	2023-08-01 19:29:42 +02:00
Micha Reiser	40f54375cb	Pull in RustPython parser (#6099 )	2023-07-27 09:29:11 +00:00
Micha Reiser	2cf00fee96	Remove parser dependency from ruff-python-ast (#6096 )	2023-07-26 17:47:22 +02:00
Dhruv Manilawala	025fa4eba8	Integrate the new Jupyter AST nodes in Ruff (#6086 ) ## Summary This PR adds the implementation for the new Jupyter AST nodes i.e., `ExprLineMagic` and `StmtLineMagic`. ## Test Plan Add test cases for `unparse` containing magic commands resolves: #6087	2023-07-26 08:20:30 +00:00
Harutaka Kawamura	62f821daaa	Avoid raising PT012 for simple `with` statements (#6081 )	2023-07-26 01:43:31 +00:00
Charlie Marsh	0d94337b96	Avoid allocations in `SimpleCallArgs` (#6021 ) ## Summary My intuition is that it's faster to do these checks as-needed rather than allocation new hash maps and vectors for the arguments. (We typically only query once anyway.)	2023-07-24 04:55:37 +00:00
Zanie Blue	b27f0fa433	Implement `any_over_expr` for type alias and type params (#5866 ) Part of https://github.com/astral-sh/ruff/issues/5062	2023-07-19 16:17:06 -05:00
Charlie Marsh	5f3da9955a	Rename `ruff_python_whitespace` to `ruff_python_trivia` (#5886 ) ## Summary This crate now contains utilities for dealing with trivia more broadly: whitespace, newlines, "simple" trivia lexing, etc. So renaming it to reflect its increased responsibilities. To avoid conflicts, I've also renamed `Token` and `TokenKind` to `SimpleToken` and `SimpleTokenKind`.	2023-07-19 11:48:27 -04:00
konsti	730e6b2b4c	Refactor `StmtIf`: Formatter and Linter (#5459 ) ## Summary Previously, `StmtIf` was defined recursively as ```rust pub struct StmtIf { pub range: TextRange, pub test: Box<Expr>, pub body: Vec<Stmt>, pub orelse: Vec<Stmt>, } ``` Every `elif` was represented as an `orelse` with a single `StmtIf`. This means that this representation couldn't differentiate between ```python if cond1: x = 1 else: if cond2: x = 2 ``` and ```python if cond1: x = 1 elif cond2: x = 2 ``` It also makes many checks harder than they need to be because we have to recurse just to iterate over an entire if-elif-else and because we're lacking nodes and ranges on the `elif` and `else` branches. We change the representation to a flat ```rust pub struct StmtIf { pub range: TextRange, pub test: Box<Expr>, pub body: Vec<Stmt>, pub elif_else_clauses: Vec<ElifElseClause>, } pub struct ElifElseClause { pub range: TextRange, pub test: Option<Expr>, pub body: Vec<Stmt>, } ``` where `test: Some(_)` represents an `elif` and `test: None` an else. This representation is different tradeoff, e.g. we need to allocate the `Vec<ElifElseClause>`, the `elif`s are now different than the `if`s (which matters in rules where want to check both `if`s and `elif`s) and the type system doesn't guarantee that the `test: None` else is actually last. We're also now a bit more inconsistent since all other `else`, those from `for`, `while` and `try`, still don't have nodes. With the new representation some things became easier, e.g. finding the `elif` token (we can use the start of the `ElifElseClause`) and formatting comments for if-elif-else (no more dangling comments splitting, we only have to insert the dangling comment after the colon manually and set `leading_alternate_branch_comments`, everything else is taken of by having nodes for each branch and the usual placement.rs fixups). ## Merge Plan This PR requires coordination between the parser repo and the main ruff repo. I've split the ruff part, into two stacked PRs which have to be merged together (only the second one fixes all tests), the first for the formatter to be reviewed by @michareiser and the second for the linter to be reviewed by @charliermarsh. * MH: Review and merge https://github.com/astral-sh/RustPython-Parser/pull/20 * MH: Review and merge or move later in stack https://github.com/astral-sh/RustPython-Parser/pull/21 * MH: Review and approve https://github.com/astral-sh/RustPython-Parser/pull/22 * MH: Review and approve formatter PR https://github.com/astral-sh/ruff/pull/5459 * CM: Review and approve linter PR https://github.com/astral-sh/ruff/pull/5460 * Merge linter PR in formatter PR, fix ecosystem checks (ecosystem checks can't run on the formatter PR and won't run on the linter PR, so we need to merge them first) * Merge https://github.com/astral-sh/RustPython-Parser/pull/22 * Create tag in the parser, update linter+formatter PR * Merge linter+formatter PR https://github.com/astral-sh/ruff/pull/5459 --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2023-07-18 13:40:15 +02:00
David Szotten	52aa2fc875	upgrade rustpython to remove tuple-constants (#5840 ) c.f. https://github.com/astral-sh/RustPython-Parser/pull/28 Tests: No snapshots changed --------- Co-authored-by: Zanie <contact@zanie.dev>	2023-07-17 22:50:31 +00:00
Charlie Marsh	4782675bf9	Remove lexer-based comment range detection (#5785 ) ## Summary I'm doing some unrelated profiling, and I noticed that this method is actually measurable on the CPython benchmark -- it's > 1% of execution time. We don't need to lex here, we already know the ranges of all comments, so we can just do a simple binary search for overlap, which brings the method down to 0%. ## Test Plan `cargo test`	2023-07-16 01:03:27 +00:00
Charlie Marsh	5a4516b812	Misc. stylistic changes from flipping through rules late at night (#5757 ) ## Summary This is really bad PR hygiene, but a mix of: using `Locator`-based fixes in a few places (in lieu of `Generator`-based fixes), using match syntax to avoid `.len() == 1` checks, using common helpers in more places, etc. ## Test Plan `cargo test`	2023-07-14 05:23:47 +00:00
Charlie Marsh	dadad0e9ed	Remove some allocations in argument detection (#5481 ) ## Summary Drive-by PR to remove some allocations around argument name matching.	2023-07-03 12:21:26 -04:00
Anders Kaseorg	df13e69c3c	Format let-else with rustfmt nightly (#5461 ) Support for `let…else` formatting was just merged to nightly (rust-lang/rust#113225). Rerun `cargo fmt` with Rust nightly 2023-07-02 to pick this up. Followup to #939. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2023-07-03 02:13:35 +00:00
Charlie Marsh	8e06140d1d	Remove continuations when deleting statements (#5198 ) ## Summary This PR modifies our statement deletion logic to delete any preceding continuation lines. For example, given: ```py x = 1; \ import os ``` We'll now rewrite to: ```py x = 1; ``` In addition, the logic can now handle multiple preceding continuations (which is unlikely, but valid).	2023-06-19 22:04:28 -04:00
Charlie Marsh	36e01ad6eb	Upgrade RustPython (#5192 ) ## Summary This PR upgrade RustPython to pull in the changes to `Arguments` (zip defaults with their identifiers) and all the renames to `CmpOp` and friends.	2023-06-19 21:09:53 +00:00
Charlie Marsh	fab2a4adf7	Use `matches!` for insecure hash rule (#5141 )	2023-06-16 04:18:32 +00:00
Charlie Marsh	5ea3e42513	Always use identifier ranges to store bindings (#5110 ) ## Summary At present, when we store a binding, we include a `TextRange` alongside it. The `TextRange` _sometimes_ matches the exact range of the identifier to which the `Binding` is linked, but... not always. For example, given: ```python x = 1 ``` The binding we create _will_ use the range of `x`, because the left-hand side is an `Expr::Name`, which has a valid range on it. However, given: ```python try: pass except ValueError as e: pass ``` When we create a binding for `e`, we don't have a `TextRange`... The AST doesn't give us one. So we end up extracting it via lexing. This PR extends that pattern to the rest of the binding kinds, to ensure that whenever we create a binding, we always use the range of the bound name. This leads to better diagnostics in cases like pattern matching, whereby the diagnostic for "unused variable `x`" here used to include `x`, instead of just `x`: ```python def f(provided: int) -> int: match provided: case [_, x]: pass ``` This is _also_ required for symbol renames, since we track writes as bindings -- so we need to know the ranges of the bound symbols. By storing these bindings precisely, we can also remove the `binding.trimmed_range` abstraction -- since bindings already use the "trimmed range". To implement this behavior, I took some of our existing utilities (like the code we had for `except ValueError as e` above), migrated them from a full lexer to a zero-allocation lexer that _only_ identifies "identifiers", and moved the behavior into a trait, so we can now do `stmt.identifier(locator)` to get the range for the identifier. Honestly, we might end up discarding much of this if we decide to put ranges on all identifiers (https://github.com/astral-sh/RustPython-Parser/pull/8). But even if we do, this will _still_ be a good change, because the lexer introduced here is useful beyond names (e.g., we use it find the `except` keyword in an exception handler, to find the `else` after a `for` loop, and so on). So, I'm fine committing this even if we end up changing our minds about the right approach. Closes #5090. ## Benchmarks No significant change, with one statistically significant improvement (-2.1654% on `linter/all-rules/large/dataset.py`): ``` linter/default-rules/numpy/globals.py time: [73.922 µs 73.955 µs 73.986 µs] thrpt: [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s] change: time: [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05) thrpt: [+0.3996% +0.4755% +0.5611%] Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild linter/default-rules/pydantic/types.py time: [1.4909 ms 1.4917 ms 1.4926 ms] thrpt: [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s] change: time: [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05) thrpt: [-0.3380% -0.2734% -0.2136%] Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe linter/default-rules/numpy/ctypeslib.py time: [688.97 µs 691.34 µs 694.15 µs] thrpt: [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s] change: time: [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05) thrpt: [+0.1468% +0.7351% +1.3461%] Change within noise threshold. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) low mild 2 (2.00%) high mild 12 (12.00%) high severe linter/default-rules/large/dataset.py time: [3.3872 ms 3.4032 ms 3.4191 ms] thrpt: [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s] change: time: [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05) thrpt: [-0.0905% +0.2642% +0.6469%] No change in performance detected. Found 20 outliers among 100 measurements (20.00%) 1 (1.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 13 (13.00%) high severe linter/all-rules/numpy/globals.py time: [148.99 µs 149.21 µs 149.42 µs] thrpt: [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s] change: time: [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05) thrpt: [+0.2785% +0.5094% +0.7395%] Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe linter/all-rules/pydantic/types.py time: [3.0362 ms 3.0396 ms 3.0441 ms] thrpt: [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s] change: time: [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05) thrpt: [-0.2121% -0.0618% +0.0958%] No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low severe 3 (3.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe linter/all-rules/numpy/ctypeslib.py time: [1.6879 ms 1.6894 ms 1.6909 ms] thrpt: [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s] change: time: [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05) thrpt: [-0.0435% +0.0889% +0.2284%] No change in performance detected. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) low mild 1 (1.00%) high severe linter/all-rules/large/dataset.py time: [7.1520 ms 7.1586 ms 7.1654 ms] thrpt: [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s] change: time: [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05) thrpt: [+1.8102% +2.2133% +2.6300%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild ```	2023-06-15 18:43:19 +00:00
Charlie Marsh	aa41ffcfde	Add `BindingKind` variants to represent deleted bindings (#5071 ) ## Summary Our current mechanism for handling deletions (e.g., `del x`) is to remove the symbol from the scope's `bindings` table. This "does the right thing", in that if we then reference a deleted symbol, we're able to determine that it's unbound -- but it causes a variety of problems, mostly in that it makes certain bindings and references unreachable after-the-fact. Consider: ```python x = 1 print(x) del x ``` If we analyze this code _after_ running the semantic model over the AST, we'll have no way of knowing that `x` was ever introduced in the scope, much less that it was bound to a value, read, and then deleted -- because we effectively erased `x` from the model entirely when we hit the deletion. In practice, this will make it impossible for us to support local symbol renames. It also means that certain rules that we want to move out of the model-building phase and into the "check dead scopes" phase wouldn't work today, since we'll have lost important information about the source code. This PR introduces two new `BindingKind` variants to model deletions: - `BindingKind::Deletion`, which represents `x = 1; del x`. - `BindingKind::UnboundException`, which represents: ```python try: 1 / 0 except Exception as e: pass ``` In the latter case, `e` gets unbound after the exception handler (assuming it's triggered), so we want to handle it similarly to a deletion. The main challenge here is auditing all of our existing `Binding` and `Scope` usages to understand whether they need to accommodate deletions or otherwise behave differently. If you look one commit back on this branch, you'll see that the code is littered with `NOTE(charlie)` comments that describe the reasoning behind changing (or not) each of those call sites. I've also augmented our test suite in preparation for this change over a few prior PRs. ### Alternatives As an alternative, I considered introducing a flag to `BindingFlags`, like `BindingFlags::UNBOUND`, and setting that at the appropriate time. This turned out to be a much more difficult change, because we tend to match on `BindingKind` all over the place (e.g., we have a bunch of code blocks that only run when a `BindingKind` is `BindingKind::Importation`). As a result, introducing these new `BindingKind` variants requires only a few changes at the client sites. Adding a flag would've required a much wider-reaching change.	2023-06-14 09:27:24 -04:00
Charlie Marsh	fc6580592d	Use Expr::is_* methods at more call sites (#5075 )	2023-06-14 04:02:39 +00:00
Charlie Marsh	7e37d8916c	Remove lexer dependency from identifier_range (#5036 ) ## Summary We run this quite a bit -- the new version is zero-allocation, though it's not quite as nice as the lexer we have in the formatter.	2023-06-12 22:06:03 +00:00
Charlie Marsh	ab11dd08df	Improve `TypedDict` conversion logic for shadowed builtins and dunder methods (#5038 ) ## Summary This PR (1) avoids flagging `TypedDict` and `NamedTuple` conversions when attributes are dunder methods, like `__dict__`, and (2) avoids flagging the `A003` shadowed-attribute rule for `TypedDict` classes at all, where it doesn't really apply (since those attributes are only accessed via subscripting anyway). Closes #5027.	2023-06-12 21:23:39 +00:00
Charlie Marsh	445e1723ab	Use `Stmt::parse` in lieu of `Suite` unwraps (#5002 )	2023-06-10 04:55:31 +00:00
Charlie Marsh	2d597bc1fb	Parenthesize expressions prior to lexing in F632 (#5001 )	2023-06-10 04:23:43 +00:00
Charlie Marsh	02b8ce82af	Refactor `RET504` to only enforce assignment-then-return pattern (#4997 ) ## Summary The `RET504` rule, which looks for unnecessary assignments before return statements, is a frequent source of issues (#4173, #4236, #4242, #1606, #2950). Over time, we've tried to refine the logic to handle more cases. For example, we now avoid analyzing any functions that contain any function calls or attribute assignments, since those operations can contain side effects (and so we mark them as a "read" on all variables in the function -- we could do a better job with code graph analysis to handle this limitation, but that'd be a more involved change.) We also avoid flagging any variables that are the target of multiple assignments. Ultimately, though, I'm not happy with the implementation -- we just can't do sufficiently reliable analysis of arbitrary code flow given the limited logic herein, and the existing logic is very hard to reason about and maintain. This PR refocuses the rule to only catch cases of the form: ```py def f(): x = 1 return x ``` That is, we now only flag returns that are immediately preceded by an assignment to the returned variable. While this is more limiting, in some ways, it lets us flag more cases vis-a-vis the previous implementation, since we no longer "fully eject" when functions contain function calls and other effect-ful operations. Closes #4173. Closes #4236. Closes #4242.	2023-06-10 00:05:01 -04:00

1 2

91 commits