## Summary
This is a simple idea to avoid unnecessary work in the linter,
especially for rules that run on all name and/or all attribute nodes.
Imagine a rule like the NumPy deprecation check. If the user never
imported `numpy`, we should be able to skip that rule entirely --
whereas today, we do a `resolve_call_path` check on _every_ name in the
file. It turns out that there's basically a finite set of modules that
we care about, so we now track imports on those modules as explicit
flags on the semantic model. In rules that can _only_ ever trigger if
those modules were imported, we add a dedicated and extremely cheap
check to the top of the rule.
We could consider generalizing this to all modules, but I would expect
that not to be much faster than `resolve_call_path`, which is just a
hash map lookup on `TextSize` anyway.
It would also be nice to make this declarative, such that rules could
declare the modules they care about, the analyzers could call the rules
as appropriate. But, I don't think such a design should block merging
this.
Implements SIM113 from #998
Added tests
Limitations
- No fix yet
- Only flag cases where index variable immediately precede `for` loop
@charliermarsh please review and let me know any improvements
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
## Summary
Given:
```python
from somewhere import get_cfg
def lookup_cfg(cfg_description):
cfg = get_cfg(cfg_description)
if cfg is not None:
return cfg
raise AttributeError(f"No cfg found matching {cfg_description}")
```
We were analyzing the method from last-to-first statement. So we saw the
`raise`, then assumed the method _always_ raised. In reality, though, it
_might_ return. This PR improves the branch analysis to respect these
mixed cases.
Closes https://github.com/astral-sh/ruff/issues/9269.
Closes https://github.com/astral-sh/ruff/issues/9304.
## Summary
Adds a rule to detect unions that include `typing.NoReturn` or
`typing.Never`. In such cases, the use of the bottom type is redundant.
Closes https://github.com/astral-sh/ruff/issues/9113.
## Test Plan
`cargo test`
## Summary
Adds `find_assigned_value` a function which gets the `&Expr` assigned to
a given `id` if one exists in the semantic model.
Open TODOs:
- [ ] Handle `binding.kind.is_unpacked_assignment()`: I am bit confused
by this one. The snippet from its documentation does not appear to be
counted as an unpacked assignment and the only ones I could find for
which that was true were invalid Python like:
```python
x, y = 1
```
- [ ] How to handle AugAssign. Can we combine statements like:
```python
(a, b) = [(1, 2, 3), (4,)]
a += (6, 7)
```
to get the full value for a? Code currently just returns `None` for
these assign types
- [ ] Multi target assigns
```python
m_c = (m_d, m_e) = (0, 0)
trio.sleep(m_c) # OK
trio.sleep(m_d) # TRIO115
trio.sleep(m_e) # TRIO115
```
## Test Plan
Used the function in two rules:
- `TRIO115`
- `PERF101`
Expanded both their fixtures for explicit multi target check
This PR allows `matplotlib.use` calls to intersperse imports without
triggering `E402`. This is a pragmatic choice as it's common to require
`matplotlib.use` calls prior to importing from within `matplotlib`
itself.
Closes https://github.com/astral-sh/ruff/issues/9091.
## Summary
It's common to interleave a `sys.path` modification between imports at
the top of a file. This is a frequent cause of `# noqa: E402` false
positives, as seen in the ecosystem checks. This PR modifies E402 to
omit such modifications when determining the "import boundary".
(We could consider linting against `sys.path` modifications, but that
should be a separate rule.)
Closes: https://github.com/astral-sh/ruff/issues/5557.
Rebase of #6365 authored by @davidszotten.
## Summary
This PR updates the AST structure for an f-string elements.
The main **motivation** behind this change is to have a dedicated node
for the string part of an f-string. Previously, the existing
`ExprStringLiteral` node was used for this purpose which isn't exactly
correct. The `ExprStringLiteral` node should include the quotes as well
in the range but the f-string literal element doesn't include the quote
as it's a specific part within an f-string. For example,
```python
f"foo {x}"
# ^^^^
# This is the literal part of an f-string
```
The introduction of `FStringElement` enum is helpful which represent
either the literal part or the expression part of an f-string.
### Rule Updates
This means that there'll be two nodes representing a string depending on
the context. One for a normal string literal while the other is a string
literal within an f-string. The AST checker is updated to accommodate
this change. The rules which work on string literal are updated to check
on the literal part of f-string as well.
#### Notes
1. The `Expr::is_literal_expr` method would check for
`ExprStringLiteral` and return true if so. But now that we don't
represent the literal part of an f-string using that node, this improves
the method's behavior and confines to the actual expression. We do have
the `FStringElement::is_literal` method.
2. We avoid checking if we're in a f-string context before adding to
`string_type_definitions` because the f-string literal is now a
dedicated node and not part of `Expr`.
3. Annotations cannot use f-string so we avoid changing any rules which
work on annotation and checks for `ExprStringLiteral`.
## Test Plan
- All references of `Expr::StringLiteral` were checked to see if any of
the rules require updating to account for the f-string literal element
node.
- New test cases are added for rules which check against the literal
part of an f-string.
- Check the ecosystem results and ensure it remains unchanged.
## Performance
There's a performance penalty in the parser. The reason for this remains
unknown as it seems that the generated assembly code is now different
for the `__reduce154` function. The reduce function body is just popping
the `ParenthesizedExpr` on top of the stack and pushing it with the new
location.
- The size of `FStringElement` enum is the same as `Expr` which is what
it replaces in `FString::format_spec`
- The size of `FStringExpressionElement` is the same as
`ExprFormattedValue` which is what it replaces
I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly
resulted in any performance gain. The difference can be seen here:
- Original profile: https://share.firefox.dev/3Taa7ES
- Profile after boxing some node fields:
https://share.firefox.dev/3GsNXpD
### Backtracking
I tried backtracking the changes to see if any of the isolated change
produced this regression. The problem here is that the overall change is
so small that there's only a single checkpoint where I can backtrack and
that checkpoint results in the same regression. This checkpoint is to
revert using `Expr` to the `FString::format_spec` field. After this
point, the change would revert back to the original implementation.
## Review process
The review process is similar to #7927. The first set of commits update
the node structure, parser, and related AST files. Then, further commits
update the linter and formatter part to account for the AST change.
---------
Co-authored-by: David Szotten <davidszotten@gmail.com>
## Summary
This PR adds (unsafe) fixes to the flake8-annotations rules that enforce
missing return types, offering to automatically insert type annotations
for functions with literal return values. The logic is smart enough to
generate simplified unions (e.g., `float` instead of `int | float`) and
deal with implicit returns (`return` without a value).
Closes https://github.com/astral-sh/ruff/issues/1640 (though we could
open a separate issue for referring parameter types).
Closes https://github.com/astral-sh/ruff/issues/8213.
## Test Plan
`cargo test`
## Summary
This brings ruff's behavior in line with what `pep8-naming` already does
and thus closes#8397.
I had initially implemented this to look at the last segment of a dotted
path only when the entry in the `*-decorators` setting started with a
`.`, but in the end I thought it's better to remain consistent w/
`pep8-naming` and doing a match against the last segment of the
decorator name in any case.
If you prefer to diverge from this in favor of less ambiguity in the
configuration let me know and I'll change it so you would need to put
e.g. `.expression` in the `classmethod-decorators` list.
## Test Plan
Tested against the file in the issue linked below, plus the new testcase
added in this PR.
~Improves detection of types imported from `typing_extensions`. Removes
the hard-coded list of supported types in `typing_extensions`; instead
assuming all types could be imported from `typing`, `_typeshed`, or
`typing_extensions`.~
~The typing extensions package appears to re-export types even if they
do not need modification.~
Adds detection of `if typing_extensions.TYPE_CHECKING` blocks. Avoids
inserting a new `if TYPE_CHECKING` block and `from typing import
TYPE_CHECKING` if `typing_extensions.TYPE_CHECKING` is used (closes
https://github.com/astral-sh/ruff/issues/8427)
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`
The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.
This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.
This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.
This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.
## Test Plan
1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter
### Formatter ecosystem check
#### `main`
| project | similarity index | total files | changed files |
|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
#### `dhruv/constant-to-literal`
| project | similarity index | total files | changed files |
|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
## Summary
Check that the sequence type is a list, set, dict, or tuple before
recommending replacing the `enumerate(...)` call with `range(len(...))`.
Document behaviour so users are aware of the type inference limitation
leading to false negatives.
Closes#7656.
## Summary
This is a follow-up to #7469 that attempts to achieve similar gains, but
without introducing malachite. Instead, this PR removes the `BigInt`
type altogether, instead opting for a simple enum that allows us to
store small integers directly and only allocate for values greater than
`i64`:
```rust
/// A Python integer literal. Represents both small (fits in an `i64`) and large integers.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct Int(Number);
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Number {
/// A "small" number that can be represented as an `i64`.
Small(i64),
/// A "large" number that cannot be represented as an `i64`.
Big(Box<str>),
}
impl std::fmt::Display for Number {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Number::Small(value) => write!(f, "{value}"),
Number::Big(value) => write!(f, "{value}"),
}
}
}
```
We typically don't care about numbers greater than `isize` -- our only
uses are comparisons against small constants (like `1`, `2`, `3`, etc.),
so there's no real loss of information, except in one or two rules where
we're now a little more conservative (with the worst-case being that we
don't flag, e.g., an `itertools.pairwise` that uses an extremely large
value for the slice start constant). For simplicity, a few diagnostics
now show a dedicated message when they see integers that are out of the
supported range (e.g., `outdated-version-block`).
An additional benefit here is that we get to remove a few dependencies,
especially `num-bigint`.
## Test Plan
`cargo test`
## Summary
We have a few rules that rely on detecting whether two statements are in
different branches -- for example, different arms of an `if`-`else`.
Historically, the way this was implemented is that, given two statement
IDs, we'd find the common parent (by traversing upwards via our
`Statements` abstraction); then identify branches "manually" by matching
the parents against `try`, `if`, and `match`, and returning iterators
over the arms; then check if there's an arm for which one of the
statements is a child, and the other is not.
This has a few drawbacks:
1. First, the code is generally a bit hard to follow (Konsti mentioned
this too when working on the `ElifElseClause` refactor).
2. Second, this is the only place in the codebase where we need to go
from `&Stmt` to `StatementID` -- _everywhere_ else, we only need to go
in the _other_ direction. Supporting these lookups means we need to
maintain a mapping from `&Stmt` to `StatementID` that includes every
`&Stmt` in the program. (We _also_ end up maintaining a `depth` level
for every statement.) I'd like to get rid of these requirements to
improve efficiency, reduce complexity, and enable us to treat AST modes
more generically in the future. (When I looked at adding the `&Expr` to
our existing statement-tracking infrastructure, maintaining a hash map
with all the statements noticeably hurt performance.)
The solution implemented here instead makes branches a first-class
concept in the semantic model. Like with `Statements`, we now have a
`Branches` abstraction, where each branch points to its optional parent.
When we store statements, we store the `BranchID` alongside each
statement. When we need to detect whether two statements are in the same
branch, we just realize each statement's branch path and compare the
two. (Assuming that the two statements are in the same scope, then
they're on the same branch IFF one branch path is a subset of the other,
starting from the top.) We then add some calls to the visitor to push
and pop branches in the appropriate places, for `if`, `try`, and `match`
statements.
Note that a branch is not 1:1 with a statement; instead, each branch is
closer to a suite, but not _every_ suite is a branch. For example, each
arm in an `if`-`elif`-`else` is a branch, but the `else` in a `for` loop
is not considered a branch.
In addition to being much simpler, this should also be more efficient,
since we've shed the entire `&Stmt` hash map, plus the `depth` that we
track on `StatementWithParent` in favor of a single `Option<BranchID>`
on `StatementWithParent` plus a single vector for all branches. The
lookups should be faster too, since instead of doing a bunch of jumps
around with the hash map + repeated recursive calls to find the common
parents, we instead just do a few simple lookups in the `Branches`
vector to realize and compare the branch paths.
## Test Plan
`cargo test` -- we have a lot of coverage for this, which we inherited
from PyFlakes
## Summary
Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.
## Test Plan
`cargo test`
## Summary
Fixes some TODOs introduced in
https://github.com/astral-sh/ruff/pull/6538. In short, given an
expression like `1 if x > 0 else "Hello, world!"`, we now return a union
type that says the expression can resolve to either an `int` or a `str`.
The system remains very limited, it only works for obvious primitive
types, and there's no attempt to do inference on any more complex
variables. (If any expression yields `Unknown` or `TypeError`, we
propagate that result throughout and abort on the client's end.)
## Summary
This PR leverages the unified function definition node to add precise
AST node types to `MemberKind`, which is used to power our docstring
definition tracking (e.g., classes and functions, whether they're
methods or functions or nested functions and so on, whether they have a
docstring, etc.). It was painful to do this in the past because the
function variants needed to support a union anyway, but storing precise
nodes removes like a dozen panics.
No behavior changes -- purely a refactor.
## Test Plan
`cargo test`
## Summary
Per the suggestion in
https://github.com/astral-sh/ruff/discussions/6183, this PR removes
`AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an
`is_async` field on the non-async variants of those structs. Unlike an
interpreter, we _generally_ have identical handling for these nodes, so
separating them into distinct variants adds complexity from which we
don't really benefit. This can be seen below, where we get to remove a
_ton_ of code related to adding generic `Any*` wrappers, and a ton of
duplicate branches for these cases.
## Test Plan
`cargo test` is unchanged, apart from parser snapshots.
## Summary
This PR fixes the performance degradation introduced in
https://github.com/astral-sh/ruff/pull/6345. Instead of using the
generic `Nodes` structs, we now use separate `Statement` and
`Expression` structs. Importantly, we can avoid tracking a bunch of
state for expressions that we need for parents: we don't need to track
reference-to-ID pointers (we just have no use-case for this -- I'd
actually like to remove this from statements too, but we need it for
branch detection right now), we don't need to track depth, etc.
In my testing, this entirely removes the regression on all-rules, and
gets us down to 2ms slower on the default rules (as a crude hyperfine
benchmark, so this is within margin of error IMO).
No behavioral changes.
## Summary
This PR removes a now-unnecessary abstraction from `helper.rs`
(`CallArguments`), in favor of adding methods to `Arguments` directly,
which helps with discoverability.
## Summary
This PR adds a new `Arguments` AST node, which we can use for function
calls and class definitions.
The `Arguments` node spans from the left (open) to right (close)
parentheses inclusive.
In the case of classes, the `Arguments` is an option, to differentiate
between:
```python
# None
class C: ...
# Some, with empty vectors
class C(): ...
```
In this PR, we don't really leverage this change (except that a few
rules get much simpler, since we don't need to lex to find the start and
end ranges of the parentheses, e.g.,
`crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`,
`crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`).
In future PRs, this will be especially helpful for the formatter, since
we can track comments enclosed on the node itself.
## Test Plan
`cargo test`
## Summary
This PR adds a `logger-objects` setting that allows users to mark
specific symbols a `logging.Logger` objects. Currently, if a `logger` is
imported, we only flagged it as a `logging.Logger` if it comes exactly
from the `logging` module or is `flask.current_app.logger`.
This PR allows users to mark specific loggers, like
`logging_setup.logger`, to ensure that they're covered by the
`flake8-logging-format` rules and others.
For example, if you have a module `logging_setup.py` with the following
contents:
```python
import logging
logger = logging.getLogger(__name__)
```
Adding `"logging_setup.logger"` to `logger-objects` will ensure that
`logging_setup.logger` is treated as a `logging.Logger` object when
imported from other modules (e.g., `from logging_setup import logger`).
Closes https://github.com/astral-sh/ruff/issues/5694.