## Summary
This PR removes the cyclic dev dependencies that some of the crates had on the parser crate.
The cyclic dependencies are:
* `ruff_python_ast` has a **dev dependency** on `ruff_python_parser`, while `ruff_python_parser` directly depends on `ruff_python_ast`
* `ruff_python_trivia` has a **dev dependency** on `ruff_python_parser`, while `ruff_python_parser` depends indirectly on `ruff_python_trivia` (`ruff_python_parser` → `ruff_python_ast` → `ruff_python_trivia`)
Specifically, this PR does the following:
* Introduces two new crates:
  * `ruff_python_ast_integration_tests`, which now hosts the tests from the `ruff_python_ast` crate that use the parser
  * `ruff_python_trivia_integration_tests`, which now hosts the tests from the `ruff_python_trivia` crate that use the parser
### Motivation
The main motivation for this PR is to improve the development experience. Before this PR, `rust-analyzer` wouldn't provide any IntelliSense in the `ruff_python_parser` crate for symbols from the `ruff_python_ast` crate:
```
[ERROR][2024-05-03 13:47:06] .../vim/lsp/rpc.lua:770 "rpc" "/Users/dhruv/.cargo/bin/rust-analyzer" "stderr" "[ERROR project_model::workspace] cyclic deps: ruff_python_parser(Idx::<CrateData>(50)) -> ruff_python_ast(Idx::<CrateData>(37)), alternative path: ruff_python_ast(Idx::<CrateData>(37)) -> ruff_python_parser(Idx::<CrateData>(50))\n"
```
## Test Plan
Checked the `rust-analyzer` logs and confirmed that they no longer show any cyclic dependency errors.
## Summary
When we fall through to parsing, the comment-detection rule accounts for a significant portion of lint time. This PR adds an additional fast heuristic whereby we abort if a comment contains two consecutive name tokens (via the zero-allocation lexer). For `ctypeslib.py`, which has a few cases that are now caught by this heuristic, it's a 2.5x speedup for the rule (and a 20% speedup for token-based rules).
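As a rough illustration of the idea (a hypothetical sketch, not the rule's actual implementation; the real check runs over the zero-allocation lexer's tokens rather than whitespace-split words): two consecutive name tokens read like prose rather than commented-out code, so the expensive check can bail out early.
```rust
/// Hypothetical helper: `true` if the comment text contains two consecutive
/// identifier-like words, in which case it reads like natural-language prose
/// and the commented-out-code check can skip the parse entirely.
fn looks_like_prose(comment: &str) -> bool {
    let mut previous_was_name = false;
    for word in comment.split_whitespace() {
        // Stand-in for a real name token: an identifier-like word.
        let is_name = word
            .chars()
            .next()
            .is_some_and(|c| c.is_ascii_alphabetic() || c == '_')
            && word.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
        if is_name && previous_was_name {
            return true;
        }
        previous_was_name = is_name;
    }
    false
}
```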
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.
The update itself introduced a new Clippy lint about superfluous hashes in raw strings; the offending hashes got removed.
I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to use `unsafe_code = "warn"` instead of `"forbid"`, because the latter now actually forbids our local `#[allow(unsafe_code)]` overrides:
```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
  --> crates/ruff_source_file/src/newlines.rs:62:17
   |
62 |         #[allow(unsafe_code)]
   |                 ^^^^^^^^^^^ overruled by previous forbid
   |
   = note: `forbid` lint level was set on command line
```
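A minimal in-source reproduction of the conflict (assuming a crate that needs a local escape hatch; `forbid` rejects any downstream `allow`, whereas `warn` and `deny` permit it):
```rust
// With the lint level at `forbid` (here via a crate attribute), the local
// `allow` below is itself a compile error (E0453). Lowering the level to
// `warn` or `deny` lets the override through.
#![forbid(unsafe_code)]

#[allow(unsafe_code)] // error[E0453]: incompatible with previous forbid
fn first_byte(bytes: &[u8]) -> u8 {
    // SAFETY: for illustration only; assumes `bytes` is non-empty.
    unsafe { *bytes.get_unchecked(0) }
}
```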
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This is whitespace as per `is_python_whitespace`, and right now it tends
to lead to panics in the formatter. Seems reasonable to treat it as
whitespace in the `SimpleTokenizer` too.
Closes https://github.com/astral-sh/ruff/issues/7624.
## Summary
Given:
```python
if True:
    if True:
        pass
    else:
        pass
        # a
        # b
        # c

else:
    pass
```
We want to preserve the newline after the `# c` (before the `else`).
However, the `last_node` ends at the `pass`, and the comments are
trailing comments on the `pass`, not trailing comments on the
`last_node` (the `if`). As such, when counting the trailing newlines on
the outer `if`, we abort as soon as we see the comment (`# a`).
This PR changes the logic to skip _all_ comments (even those with
newlines between them). This is safe as we know that there are no
"leading" comments on the `else`, so there's no risk of skipping those
accidentally.
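A self-contained sketch of the adjusted scan, with a hypothetical `Trivia` token standing in for the real lexer's output:
```rust
#[derive(Clone, Copy, PartialEq)]
enum Trivia {
    Newline,
    Comment,
}

/// Count the newlines between the last node and the `else`, skipping *all*
/// comments, even those with newlines between them.
fn newlines_before_else(trivia: &[Trivia]) -> usize {
    let mut newlines = 0;
    for token in trivia {
        match token {
            // Old behavior: `Trivia::Comment => break`, which aborted the
            // count as soon as it reached `# a`.
            Trivia::Comment => continue,
            Trivia::Newline => newlines += 1,
        }
    }
    newlines
}
```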
Closes https://github.com/astral-sh/ruff/issues/7602.
## Test Plan
No change in compatibility.
Before:
| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1631 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 319 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99979 | 3496 | 22 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |
After:
| project | similarity index | total files | changed files |
|--------------|------------------:|------------------:|------------------:|
| cpython | 0.76083 | 1789 | 1631 |
| django | 0.99983 | 2760 | 36 |
| transformers | 0.99963 | 2587 | 319 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99983 | 3496 | 18 |
| warehouse | 0.99967 | 648 | 15 |
| zulip | 0.99972 | 1437 | 21 |
**Summary** Instead of emitting a bogus token per character, we now emit only a single, final bogus token. This leads to much more concise output.
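A toy, self-contained illustration of the change (not the crate's actual lexer, which produces ranged `SimpleToken`s):
```rust
#[derive(Debug, PartialEq)]
enum TokenKind {
    Word,
    Bogus,
}

#[derive(Debug, PartialEq)]
struct Token {
    kind: TokenKind,
    start: usize,
    end: usize,
}

/// Toy lexer over ASCII words: on the first character it cannot handle,
/// emit a single `Bogus` token covering the rest of the input, then stop.
/// (The old behavior emitted one bogus token per character instead.)
fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let bytes = source.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_alphabetic() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_alphabetic() {
                i += 1;
            }
            tokens.push(Token { kind: TokenKind::Word, start, end: i });
        } else {
            tokens.push(Token { kind: TokenKind::Bogus, start: i, end: bytes.len() });
            break;
        }
    }
    tokens
}
```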
**Test Plan** Updated fixtures
## Summary
The tokenizer was split into a forward and a backwards tokenizer. The backwards tokenizer uses the same method names as the forward one (e.g. `next_token`). The backwards tokenizer receives the comment ranges that we already built, and uses them to skip over comments.
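A self-contained sketch of the core trick (hypothetical types, not the crate's API): comments are hard to lex in reverse because `# ...` is only recognizable from its start, so the backwards pass consults the ranges the forward pass already collected and jumps over them.
```rust
use std::ops::Range;

/// If the current backwards-scan position falls inside a known comment,
/// jump to the comment's start; otherwise stay put.
fn skip_comment_backwards(offset: usize, comment_ranges: &[Range<usize>]) -> usize {
    comment_ranges
        .iter()
        .find(|range| range.start < offset && offset <= range.end)
        .map_or(offset, |range| range.start)
}
```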
---------
Co-authored-by: Micha Reiser <micha@reiser.io>
## Summary
This PR modifies our formatting of comments around the `.` in an attribute. Specifically, the goal here is to avoid _reordering_ comments, and the net effect is that we generally leave comments where they are when dealing with comments around the dot (which you can also think of as comments between attributes).
All comments around the dot are now treated as dangling and formatted
manually, with the exception of end-of-line or parenthesized comments on
the value, like those marked as trailing here, which remain trailing:
```python
(
    (
        a # trailing end-of-line
        # trailing own-line
    ) # dangling before dot end-of-line
    .b # trailing end-of-line
)
```
Closes https://github.com/astral-sh/ruff/issues/6823.
## Test Plan
`cargo test`
Before:
| project | similarity index |
|--------------|------------------|
| cpython | 0.76050 |
| django | 0.99820 |
| transformers | 0.99800 |
| twine | 0.99876 |
| typeshed | 0.99953 |
| warehouse | 0.99615 |
| zulip | 0.99729 |
After:
| project | similarity index |
|--------------|------------------|
| cpython | 0.76050 |
| django | 0.99820 |
| transformers | 0.99800 |
| twine | 0.99876 |
| typeshed | 0.99953 |
| warehouse | 0.99615 |
| zulip | 0.99729 |
## Summary
Allows for proper lexing of tokens like `->`.
The main challenge is to ensure that our forward and backwards
representations are the same for cases like `===`. Specifically, we want
that to lex as `==` followed by `=` regardless of whether it's a
forwards or backwards lex. To do so, we identify the range of the
sequential characters (the full span of `===`), lex it forwards, then
return the last token.
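A self-contained toy version of that strategy, restricted to runs of `=` (the real tokenizer handles the full operator set):
```rust
/// Backwards lex of the operator run ending at byte offset `end`: find the
/// start of the run, lex it forwards greedily (`===` -> `==`, `=`), and
/// return the last token.
fn last_token_of_run(source: &str, end: usize) -> &str {
    // Walk left to the start of the contiguous run of `=` characters
    // (assumes the preceding character is ASCII, for simplicity).
    let start = source[..end]
        .rfind(|c: char| c != '=')
        .map_or(0, |i| i + 1);

    // Forward lex: greedily take `==`, otherwise a single `=`.
    let mut last = "";
    let mut i = start;
    while i < end {
        let len = if end - i >= 2 { 2 } else { 1 };
        last = &source[i..i + len];
        i += len;
    }
    last
}

// `===` lexes as `==` then `=` whichever direction we started from:
// last_token_of_run("a ===", 5) == "="
```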
## Test Plan
`cargo test`
## Summary
This PR adds support for parenthesized comments. A parenthesized comment is a comment that appears within parentheses, but not within the range of the expression enclosed by those parentheses. For example, the comment here is a parenthesized comment:
```python
if (
    # comment
    True
):
    ...
```
The parentheses enclose the `True`, but the range of `True` doesn’t
include the `# comment`.
There are at least two problems associated with parenthesized comments:
(1) associating the comment with the correct (i.e., enclosed) node; and
(2) formatting the comment correctly, once it has been associated with
the enclosed node.
The solution proposed here for (1) is to search for parentheses between the preceding and following nodes, and to use open and close parentheses to break ties, rather than always assigning the comment to the preceding node.
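As a conceptual sketch of that tie-break (a hypothetical helper, not the PR's actual code): if the source between the preceding node and the comment leaves a `(` unclosed, the comment sits inside the parentheses and should attach to the enclosed (following) node.
```rust
/// `between` is the source text from the end of the preceding node to the
/// start of the comment. An unmatched `(` in it means the comment is
/// parenthesized.
fn comment_is_parenthesized(between: &str) -> bool {
    let mut depth: i32 = 0;
    for c in between.chars() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
    }
    depth > 0
}
```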
For (2), we handle these special parenthesized comments in `FormatExpr`.
The biggest risk with this approach is that we forget some codepath that
force-disables parenthesization (by passing in `Parentheses::Never`).
I've audited all usages of that enum and added additional handling +
test coverage for such cases.
Closes https://github.com/astral-sh/ruff/issues/6390.
## Test Plan
`cargo test` with new cases.
Before:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74233 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
After:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74237 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
## Summary
For #6485, I need to be able to use the `SimpleTokenizer` to lex the
space between any two adjacent expressions (i.e., the space between a
preceding and following node). This requires that we support a wider
range of keywords (like `and`, to connect the pieces of `x and y`), and
some additional single-character tokens (like `-` and `>`, to support
`->`). Note that the `SimpleTokenizer` does not support multi-character
tokens, so the `->` in a function signature is lexed as a `-` followed
by a `>` -- but this is fine for our purposes.
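A toy model of that single-character scheme (hypothetical token names, not the crate's real kinds):
```rust
#[derive(Debug, PartialEq)]
enum Simple {
    Minus,
    Greater,
    Other(char),
}

/// Every operator character becomes its own token, so `->` is `Minus`
/// followed by `Greater`.
fn simple_tokenize(source: &str) -> Vec<Simple> {
    source
        .chars()
        .map(|c| match c {
            '-' => Simple::Minus,
            '>' => Simple::Greater,
            other => Simple::Other(other),
        })
        .collect()
}

fn main() {
    assert_eq!(simple_tokenize("->"), vec![Simple::Minus, Simple::Greater]);
}
```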
## Summary
This crate now contains utilities for dealing with trivia more broadly:
whitespace, newlines, "simple" trivia lexing, etc. This PR renames it to reflect those increased responsibilities.
To avoid conflicts, I've also renamed `Token` and `TokenKind` to
`SimpleToken` and `SimpleTokenKind`.