## Summary
This character counts as whitespace per `is_python_whitespace`, yet right now
it tends to lead to panics in the formatter. It seems reasonable to treat it
as whitespace in the `SimpleTokenizer` too.
Closes https://github.com/astral-sh/ruff/issues/7624.
## Summary
Given:
```python
if True:
    if True:
        pass
    else:
        pass
        # a
        # b
        # c

else:
    pass
```
We want to preserve the newline after the `# c` (before the `else`).
However, the `last_node` ends at the `pass`, and the comments are trailing
comments on the `pass`, not on the `last_node` (the inner `if`). As such,
when counting the trailing newlines on the outer `if`, we abort as soon as
we see the first comment (`# a`).
This PR changes the logic to skip _all_ comments (even those with
newlines between them). This is safe as we know that there are no
"leading" comments on the `else`, so there's no risk of skipping those
accidentally.
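A minimal sketch of the new counting logic (my own illustration in Python, not
the formatter's actual Rust code):
```python
# Hypothetical model: count the newlines that follow the last node,
# skipping over *all* trailing comments instead of aborting at the first.
def trailing_newlines(trivia):
    """`trivia` is the token stream after the last node, as ("comment", text)
    and ("newline",) tuples; returns the newline count after the last comment.
    """
    count = 0
    for token in trivia:
        if token[0] == "comment":
            count = 0  # the old logic aborted here; now we reset and continue
        else:  # ("newline",)
            count += 1
    return count

# For `pass`, then `# a` / `# b` / `# c`, then a blank line before `else`,
# the two newlines after `# c` are what preserve the blank line:
trivia = [
    ("newline",),
    ("comment", "# a"), ("newline",),
    ("comment", "# b"), ("newline",),
    ("comment", "# c"), ("newline",), ("newline",),
]
assert trailing_newlines(trivia) == 2
```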
Closes https://github.com/astral-sh/ruff/issues/7602.
## Test Plan
No change in compatibility.
Before:
| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1631 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 319 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99979 | 3496 | 22 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |
After:
| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1631 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 319 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |
**Summary** Instead of emitting a bogus token per character, we now emit only
a single trailing bogus token. This leads to much more concise output.
**Test Plan** Updated fixtures.
## Summary
The tokenizer was split into a forward and a backwards tokenizer. The
backwards tokenizer uses the same method names as the forward one (e.g.
`next_token`), and it takes the comment ranges we've already built so that it
can skip comments.
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
The motivation here is that this enables us to implement `Ranged` in
crates that don't depend on `ruff_python_ast`.
Largely a mechanical refactor with a lot of regex, Clippy help, and
manual fixups.
## Test Plan
`cargo test`
## Summary
This PR modifies our formatting of comments around the `.` in an
attribute. Specifically, the goal here is to avoid _reordering_
comments, and the net effect is that we generally leave comments
where they are when dealing with comments around the dot (which
you can also think of as comments between attributes).
All comments around the dot are now treated as dangling and formatted
manually, with the exception of end-of-line or parenthesized comments on
the value, like those marked as trailing here, which remain trailing:
```python
(
    (
        a  # trailing end-of-line
        # trailing own-line
    )  # dangling before dot end-of-line
    .b  # trailing end-of-line
)
```
Closes https://github.com/astral-sh/ruff/issues/6823.
## Test Plan
`cargo test`
Before:
| project | similarity index |
|--------------|------------------|
| cpython | 0.76050 |
| django | 0.99820 |
| transformers | 0.99800 |
| twine | 0.99876 |
| typeshed | 0.99953 |
| warehouse | 0.99615 |
| zulip | 0.99729 |
After:
| project | similarity index |
|--------------|------------------|
| cpython | 0.76050 |
| django | 0.99820 |
| transformers | 0.99800 |
| twine | 0.99876 |
| typeshed | 0.99953 |
| warehouse | 0.99615 |
| zulip | 0.99729 |
## Summary
Allows for proper lexing of tokens like `->`.
The main challenge is to ensure that our forward and backwards
representations are the same for cases like `===`. Specifically, we want
that to lex as `==` followed by `=` regardless of whether it's a
forwards or backwards lex. To do so, we identify the range of the
sequential characters (the full span of `===`), lex it forwards, then
return the last token.
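A minimal sketch of that idea (an illustrative Python model, not the actual
Rust implementation):
```python
# Hypothetical model: to lex backwards over a run of operator characters
# consistently with the forward lexer, find the full run, lex it
# forwards, and return the last token.
OPERATOR_CHARS = set("=<>!+-*/%&|^~")
TWO_CHAR_TOKENS = {"==", "!=", "<=", ">=", "->"}

def lex_forward(run):
    """Greedily lex a run of operator characters, longest match first."""
    tokens, i = [], 0
    while i < len(run):
        if run[i : i + 2] in TWO_CHAR_TOKENS:
            tokens.append(run[i : i + 2])
            i += 2
        else:
            tokens.append(run[i])
            i += 1
    return tokens

def last_token_backwards(source, end):
    """The token a backwards lexer should emit when it reaches `end`."""
    start = end
    while start > 0 and source[start - 1] in OPERATOR_CHARS:
        start -= 1
    return lex_forward(source[start:end])[-1]

# `===` lexes forwards as `==` followed by `=` ...
assert lex_forward("===") == ["==", "="]
# ... and the backwards lexer agrees: the token ending at index 4 is `=`.
assert last_token_backwards("a===b", 4) == "="
```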
## Test Plan
`cargo test`
## Summary
This PR adds support for parenthesized comments. A parenthesized comment
is a comment that appears within a parenthesis, but not within the range
of the expression enclosed by the parenthesis. For example, the comment
here is a parenthesized comment:
```python
if (
    # comment
    True
):
    ...
The parentheses enclose the `True`, but the range of `True` doesn’t
include the `# comment`.
There are at least two problems associated with parenthesized comments:
(1) associating the comment with the correct (i.e., enclosed) node; and
(2) formatting the comment correctly, once it has been associated with
the enclosed node.
The solution proposed here for (1) is to search for parentheses between the
preceding and following nodes, and to use open and close parentheses to break
ties, rather than always assigning the comment to the preceding node.
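For example (an illustrative snippet of mine; the implementation's placement
rules are more involved), the parentheses tell us which node a comment between
two nodes belongs to:
```python
x = (
    # Sits between `x` (preceding) and `1` (following), but the open
    # parenthesis before it marks it as parenthesized: it attaches to
    # the enclosed `1` as a leading comment.
    1
    # Precedes the close parenthesis, so it attaches to the enclosed
    # `1` as a trailing comment rather than to whatever follows.
)
```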
For (2), we handle these special parenthesized comments in `FormatExpr`.
The biggest risk with this approach is that we forget some codepath that
force-disables parenthesization (by passing in `Parentheses::Never`).
I've audited all usages of that enum and added additional handling +
test coverage for such cases.
Closes https://github.com/astral-sh/ruff/issues/6390.
## Test Plan
`cargo test` with new cases.
Before:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74233 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
After:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74237 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
## Summary
For #6485, I need to be able to use the `SimpleTokenizer` to lex the
space between any two adjacent expressions (i.e., the space between a
preceding and following node). This requires that we support a wider
range of keywords (like `and`, to connect the pieces of `x and y`), and
some additional single-character tokens (like `-` and `>`, to support
`->`). Note that the `SimpleTokenizer` does not support multi-character
tokens, so the `->` in a function signature is lexed as a `-` followed
by a `>` -- but this is fine for our purposes.
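A rough model of that behavior (an illustrative Python sketch with made-up
token names, not the crate's actual API):
```python
# Hypothetical model of single-character lexing plus keyword support.
SINGLE_CHAR_TOKENS = {"-": "Minus", ">": "Greater", "(": "LParen", ")": "RParen"}
KEYWORDS = {"and", "or", "not", "in", "is", "if", "else"}

def simple_lex(text):
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1  # skip whitespace between tokens
        elif c.isalpha() or c == "_":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            tokens.append(("Keyword", word) if word in KEYWORDS else ("Name", word))
            i = j
        else:
            tokens.append((SINGLE_CHAR_TOKENS.get(c, "Other"), c))
            i += 1
    return tokens

# Keywords like `and` are recognized, so `x and y` lexes in full ...
assert simple_lex("x and y") == [("Name", "x"), ("Keyword", "and"), ("Name", "y")]
# ... and `->` comes out as `-` followed by `>` (no multi-character tokens).
assert [kind for kind, _ in simple_lex("->")] == ["Minus", "Greater"]
```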
## Summary
This PR protects against code like:
```python
from typing import Optional

import bar  # ruff: noqa
import baz


class Foo:
    x: Optional[str] = None
```
In which the user wrote `# ruff: noqa` to ignore a specific error, not
realizing that it was a file-level exemption that thus turned off all
lint rules.
Specifically, if a `# ruff: noqa` directive is not at the start of a
line, we now ignore it and warn, since this is almost certainly a
mistake.
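For reference (per ruff's documented `noqa` handling), the two forms look like
this:
```python
# A file-level exemption must start its own line:
# ruff: noqa

# Per-line suppression uses plain `# noqa`, optionally with rule codes:
import bar  # noqa: F401
```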
## Summary
This crate now contains utilities for dealing with trivia more broadly:
whitespace, newlines, "simple" trivia lexing, etc. So we're renaming it to
reflect its increased responsibilities.
To avoid conflicts, I've also renamed `Token` and `TokenKind` to
`SimpleToken` and `SimpleTokenKind`.