Commit graph

104 commits

Author SHA1 Message Date
Charlie Marsh
dadad0e9ed
Remove some allocations in argument detection (#5481)
## Summary

Drive-by PR to remove some allocations around argument name matching.
2023-07-03 12:21:26 -04:00
Anders Kaseorg
df13e69c3c
Format let-else with rustfmt nightly (#5461)
Support for `let…else` formatting was just merged to nightly
(rust-lang/rust#113225). Rerun `cargo fmt` with Rust nightly 2023-07-02
to pick this up. Followup to #939.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
2023-07-03 02:13:35 +00:00
Charlie Marsh
8e06140d1d
Remove continuations when deleting statements (#5198)
## Summary

This PR modifies our statement deletion logic to delete any preceding
continuation lines.

For example, given:

```py
x = 1; \
  import os
```

We'll now rewrite to:

```py
x = 1;
```

In addition, the logic can now handle multiple preceding continuations
(which is unlikely, but valid).
2023-06-19 22:04:28 -04:00
Charlie Marsh
36e01ad6eb
Upgrade RustPython (#5192)
## Summary

This PR upgrade RustPython to pull in the changes to `Arguments` (zip
defaults with their identifiers) and all the renames to `CmpOp` and
friends.
2023-06-19 21:09:53 +00:00
Charlie Marsh
fab2a4adf7
Use matches! for insecure hash rule (#5141) 2023-06-16 04:18:32 +00:00
Charlie Marsh
5ea3e42513
Always use identifier ranges to store bindings (#5110)
## Summary

At present, when we store a binding, we include a `TextRange` alongside
it. The `TextRange` _sometimes_ matches the exact range of the
identifier to which the `Binding` is linked, but... not always.

For example, given:

```python
x = 1
```

The binding we create _will_ use the range of `x`, because the left-hand
side is an `Expr::Name`, which has a valid range on it.

However, given:

```python
try:
  pass
except ValueError as e:
  pass
```

When we create a binding for `e`, we don't have a `TextRange`... The AST
doesn't give us one. So we end up extracting it via lexing.

This PR extends that pattern to the rest of the binding kinds, to ensure
that whenever we create a binding, we always use the range of the bound
name. This leads to better diagnostics in cases like pattern matching,
whereby the diagnostic for "unused variable `x`" here used to include
`*x`, instead of just `x`:

```python
def f(provided: int) -> int:
    match provided:
        case [_, *x]:
            pass
```

This is _also_ required for symbol renames, since we track writes as
bindings -- so we need to know the ranges of the bound symbols.

By storing these bindings precisely, we can also remove the
`binding.trimmed_range` abstraction -- since bindings already use the
"trimmed range".

To implement this behavior, I took some of our existing utilities (like
the code we had for `except ValueError as e` above), migrated them from
a full lexer to a zero-allocation lexer that _only_ identifies
"identifiers", and moved the behavior into a trait, so we can now do
`stmt.identifier(locator)` to get the range for the identifier.

Honestly, we might end up discarding much of this if we decide to put
ranges on all identifiers
(https://github.com/astral-sh/RustPython-Parser/pull/8). But even if we
do, this will _still_ be a good change, because the lexer introduced
here is useful beyond names (e.g., we use it find the `except` keyword
in an exception handler, to find the `else` after a `for` loop, and so
on). So, I'm fine committing this even if we end up changing our minds
about the right approach.

Closes #5090.

## Benchmarks

No significant change, with one statistically significant improvement
(-2.1654% on `linter/all-rules/large/dataset.py`):

```
linter/default-rules/numpy/globals.py
                        time:   [73.922 µs 73.955 µs 73.986 µs]
                        thrpt:  [39.882 MiB/s 39.898 MiB/s 39.916 MiB/s]
                 change:
                        time:   [-0.5579% -0.4732% -0.3980%] (p = 0.00 < 0.05)
                        thrpt:  [+0.3996% +0.4755% +0.5611%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
linter/default-rules/pydantic/types.py
                        time:   [1.4909 ms 1.4917 ms 1.4926 ms]
                        thrpt:  [17.087 MiB/s 17.096 MiB/s 17.106 MiB/s]
                 change:
                        time:   [+0.2140% +0.2741% +0.3392%] (p = 0.00 < 0.05)
                        thrpt:  [-0.3380% -0.2734% -0.2136%]
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
linter/default-rules/numpy/ctypeslib.py
                        time:   [688.97 µs 691.34 µs 694.15 µs]
                        thrpt:  [23.988 MiB/s 24.085 MiB/s 24.168 MiB/s]
                 change:
                        time:   [-1.3282% -0.7298% -0.1466%] (p = 0.02 < 0.05)
                        thrpt:  [+0.1468% +0.7351% +1.3461%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe
linter/default-rules/large/dataset.py
                        time:   [3.3872 ms 3.4032 ms 3.4191 ms]
                        thrpt:  [11.899 MiB/s 11.954 MiB/s 12.011 MiB/s]
                 change:
                        time:   [-0.6427% -0.2635% +0.0906%] (p = 0.17 > 0.05)
                        thrpt:  [-0.0905% +0.2642% +0.6469%]
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  13 (13.00%) high severe

linter/all-rules/numpy/globals.py
                        time:   [148.99 µs 149.21 µs 149.42 µs]
                        thrpt:  [19.748 MiB/s 19.776 MiB/s 19.805 MiB/s]
                 change:
                        time:   [-0.7340% -0.5068% -0.2778%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2785% +0.5094% +0.7395%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
linter/all-rules/pydantic/types.py
                        time:   [3.0362 ms 3.0396 ms 3.0441 ms]
                        thrpt:  [8.3779 MiB/s 8.3903 MiB/s 8.3997 MiB/s]
                 change:
                        time:   [-0.0957% +0.0618% +0.2125%] (p = 0.45 > 0.05)
                        thrpt:  [-0.2121% -0.0618% +0.0958%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
linter/all-rules/numpy/ctypeslib.py
                        time:   [1.6879 ms 1.6894 ms 1.6909 ms]
                        thrpt:  [9.8478 MiB/s 9.8562 MiB/s 9.8652 MiB/s]
                 change:
                        time:   [-0.2279% -0.0888% +0.0436%] (p = 0.18 > 0.05)
                        thrpt:  [-0.0435% +0.0889% +0.2284%]
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high severe
linter/all-rules/large/dataset.py
                        time:   [7.1520 ms 7.1586 ms 7.1654 ms]
                        thrpt:  [5.6777 MiB/s 5.6831 MiB/s 5.6883 MiB/s]
                 change:
                        time:   [-2.5626% -2.1654% -1.7780%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8102% +2.2133% +2.6300%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
```
2023-06-15 18:43:19 +00:00
Charlie Marsh
aa41ffcfde
Add BindingKind variants to represent deleted bindings (#5071)
## Summary

Our current mechanism for handling deletions (e.g., `del x`) is to
remove the symbol from the scope's `bindings` table. This "does the
right thing", in that if we then reference a deleted symbol, we're able
to determine that it's unbound -- but it causes a variety of problems,
mostly in that it makes certain bindings and references unreachable
after-the-fact.

Consider:

```python
x = 1
print(x)
del x
```

If we analyze this code _after_ running the semantic model over the AST,
we'll have no way of knowing that `x` was ever introduced in the scope,
much less that it was bound to a value, read, and then deleted --
because we effectively erased `x` from the model entirely when we hit
the deletion.

In practice, this will make it impossible for us to support local symbol
renames. It also means that certain rules that we want to move out of
the model-building phase and into the "check dead scopes" phase wouldn't
work today, since we'll have lost important information about the source
code.

This PR introduces two new `BindingKind` variants to model deletions:

- `BindingKind::Deletion`, which represents `x = 1; del x`.
- `BindingKind::UnboundException`, which represents:

```python
try:
  1 / 0
except Exception as e:
  pass
```

In the latter case, `e` gets unbound after the exception handler
(assuming it's triggered), so we want to handle it similarly to a
deletion.

The main challenge here is auditing all of our existing `Binding` and
`Scope` usages to understand whether they need to accommodate deletions
or otherwise behave differently. If you look one commit back on this
branch, you'll see that the code is littered with `NOTE(charlie)`
comments that describe the reasoning behind changing (or not) each of
those call sites. I've also augmented our test suite in preparation for
this change over a few prior PRs.

### Alternatives

As an alternative, I considered introducing a flag to `BindingFlags`,
like `BindingFlags::UNBOUND`, and setting that at the appropriate time.

This turned out to be a much more difficult change, because we tend to
match on `BindingKind` all over the place (e.g., we have a bunch of code
blocks that only run when a `BindingKind` is
`BindingKind::Importation`). As a result, introducing these new
`BindingKind` variants requires only a few changes at the client sites.
Adding a flag would've required a much wider-reaching change.
2023-06-14 09:27:24 -04:00
Charlie Marsh
fc6580592d
Use Expr::is_* methods at more call sites (#5075) 2023-06-14 04:02:39 +00:00
Charlie Marsh
7e37d8916c
Remove lexer dependency from identifier_range (#5036)
## Summary

We run this quite a bit -- the new version is zero-allocation, though
it's not quite as nice as the lexer we have in the formatter.
2023-06-12 22:06:03 +00:00
Charlie Marsh
ab11dd08df
Improve TypedDict conversion logic for shadowed builtins and dunder methods (#5038)
## Summary

This PR (1) avoids flagging `TypedDict` and `NamedTuple` conversions
when attributes are dunder methods, like `__dict__`, and (2) avoids
flagging the `A003` shadowed-attribute rule for `TypedDict` classes at
all, where it doesn't really apply (since those attributes are only
accessed via subscripting anyway).

Closes #5027.
2023-06-12 21:23:39 +00:00
Charlie Marsh
445e1723ab
Use Stmt::parse in lieu of Suite unwraps (#5002) 2023-06-10 04:55:31 +00:00
Charlie Marsh
2d597bc1fb
Parenthesize expressions prior to lexing in F632 (#5001) 2023-06-10 04:23:43 +00:00
Charlie Marsh
02b8ce82af
Refactor RET504 to only enforce assignment-then-return pattern (#4997)
## Summary

The `RET504` rule, which looks for unnecessary assignments before return
statements, is a frequent source of issues (#4173, #4236, #4242, #1606,
#2950). Over time, we've tried to refine the logic to handle more cases.
For example, we now avoid analyzing any functions that contain any
function calls or attribute assignments, since those operations can
contain side effects (and so we mark them as a "read" on all variables
in the function -- we could do a better job with code graph analysis to
handle this limitation, but that'd be a more involved change.) We also
avoid flagging any variables that are the target of multiple
assignments. Ultimately, though, I'm not happy with the implementation
-- we just can't do sufficiently reliable analysis of arbitrary code
flow given the limited logic herein, and the existing logic is very hard
to reason about and maintain.

This PR refocuses the rule to only catch cases of the form:

```py
def f():
    x = 1
    return x
```

That is, we now only flag returns that are immediately preceded by an
assignment to the returned variable. While this is more limiting, in
some ways, it lets us flag more cases vis-a-vis the previous
implementation, since we no longer "fully eject" when functions contain
function calls and other effect-ful operations.

Closes #4173.

Closes #4236.

Closes #4242.
2023-06-10 00:05:01 -04:00
Charlie Marsh
f401050878
Introduce PythonWhitespace to confine trim operations to Python whitespace (#4994)
## Summary

We use `.trim()` and friends in a bunch of places, to strip whitespace
from source code. However, not all Unicode whitespace characters are
considered "whitespace" in Python, which only supports the standard
space, tab, and form-feed characters.

This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`,
and `char::is_whitespace`, and replaces them as appropriate with a new
`.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.

In general, the only place that should continue to use `.trim()` is
content within docstrings, which don't need to adhere to Python's
semantic definitions of whitespace.

Closes #4991.
2023-06-09 21:44:50 -04:00
Charlie Marsh
1d756dc3a7
Move Python whitespace utilities into new ruff_python_whitespace crate (#4993)
## Summary

`ruff_newlines` becomes `ruff_python_whitespace`, and includes the
existing "universal newline" handlers alongside the Python
whitespace-specific utilities.
2023-06-10 00:59:57 +00:00
Charlie Marsh
16d1e63a5e
Respect 'is not' operators split across newlines (#4977) 2023-06-09 05:07:45 +00:00
Micha Reiser
39a1f3980f
Upgrade RustPython (#4900) 2023-06-08 05:53:14 +00:00
Charlie Marsh
8938b2d555
Use qualified_name terminology in more structs for consistency (#4873) 2023-06-05 19:06:48 +00:00
Ryan Yang
72245960a1
implement E307 for pylint invalid str return type (#4854) 2023-06-05 17:54:15 +00:00
qdegraaf
fcbf5c3fae
Add PYI034 for flake8-pyi plugin (#4764) 2023-06-02 02:15:57 +00:00
Charlie Marsh
ab26f2dc9d
Use saturating_sub in more token-walking methods (#4773) 2023-06-01 17:16:32 -04:00
Charlie Marsh
9d0ffd33ca
Move universal newline handling into its own crate (#4729) 2023-05-31 12:00:47 -04:00
Micha Reiser
6c1ff6a85f
Upgrade RustPython (#4747) 2023-05-31 08:26:35 +00:00
Charlie Marsh
9741f788c7
Remove globals table from Scope (#4686) 2023-05-27 22:35:20 -04:00
Charlie Marsh
0f610f2cf7
Remove dedicated ScopeKind structs in favor of AST nodes (#4648) 2023-05-25 19:31:02 +00:00
Micha Reiser
daadd24bde
Include decorators in Function and Class definition ranges (#4467) 2023-05-22 17:50:42 +02:00
Ville Skyttä
2e2ba2cb16
Avoid some false positives in dunder variable assigments (#4508) 2023-05-19 02:11:20 +00:00
Charlie Marsh
e9c6f16c56
Move unparse utility methods onto Generator (#4497) 2023-05-18 15:00:46 +00:00
Charlie Marsh
73efbeb581
Invert quote-style when generating code within f-strings (#4487) 2023-05-18 14:33:33 +00:00
Micha Reiser
fa26860296
Refactor range from Attributed to Nodes (#4422) 2023-05-16 06:36:32 +00:00
Jeong, YunWon
be6e00ef6e
Re-integrate RustPython parser repository (#4359)
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-05-11 07:47:17 +00:00
Charlie Marsh
fd34797d0f
Add a specialized StatementVisitor (#4349) 2023-05-10 12:42:20 -04:00
Micha Reiser
cab65b25da
Replace row/column based Location with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Jonathan Plasse
df77595426
Move Truthiness into ruff_python_ast (#4071) 2023-04-24 04:54:31 +00:00
Charlie Marsh
7fa1da20fb
Support relative imports in banned-api enforcement (#4025) 2023-04-19 14:30:13 -04:00
Micha Reiser
381203c084
Store source code on message (#3897) 2023-04-11 07:57:36 +00:00
Charlie Marsh
d919adc13c
Introduce a ruff_python_semantic crate (#3865) 2023-04-04 16:50:47 +00:00
Charlie Marsh
5625410936
Remove uses_magic_variable_access dependence on Checker (#3864) 2023-04-03 12:22:06 -04:00
Charlie Marsh
3744e9ab3f
Remove contains_effect's dependency on Context (#3855) 2023-04-03 12:08:13 -04:00
Charlie Marsh
d822e08111
Move CallPath into its own module (#3847) 2023-04-01 11:25:04 -04:00
Charlie Marsh
2f90157ce2
Move logging resolver into logging.rs (#3843) 2023-04-01 03:50:44 +00:00
Charlie Marsh
6d80c79bac
Combine operations.rs and helpers.rs (#3841) 2023-04-01 03:40:34 +00:00
Charlie Marsh
27e40e9b31
Remove helpers.rs dependency on Binding (#3839) 2023-04-01 03:19:45 +00:00
Charlie Marsh
b6276e2d95
Move f-string identification into rule module (#3838) 2023-03-31 23:10:11 -04:00
Charlie Marsh
cf7e1ddd08
Remove some usize references (#3819) 2023-03-30 17:35:42 -04:00
Charlie Marsh
22d5b0071d
Rename end_of_statement to end_of_last_statement (#3775) 2023-03-28 12:31:06 -04:00
Dhruv Manilawala
63adf9f5e8
Allow aliased logging module as a logger candidate (#3718) 2023-03-24 17:19:09 -04:00
Charlie Marsh
e8d17d23cb
Expand the scope of useless-expression (B018) (#3455) 2023-03-23 18:33:58 -04:00
Charlie Marsh
1b3e54231c
Flag, but don't fix, unused imports in ModuleNotFoundError blocks (#3658) 2023-03-22 13:03:30 -04:00
Micha Reiser
dedf4cbdeb
refactor: Move scope and binding types to scope.rs (#3573) 2023-03-17 17:31:33 +01:00