## Summary
Closes https://github.com/astral-sh/ruff/issues/6384, although I think
the issue was fixed already on main, for the most part.
The linked issue is around formatting expressions like:
```python
def test():
(
yield
#comment 1
* # comment 2
# comment 3
test # comment 4
)
```
On main, prior to this PR, we now format like:
```python
def test():
(
yield (
# comment 1
# comment 2
# comment 3
*test
) # comment 4
)
```
Which strikes me as reasonable. (We can't test this, since it's a syntax
error after for our parser, despite being a syntax error in both cases
from CPython's perspective.)
Meanwhile, Black does:
```python
def test():
(
yield
# comment 1
* # comment 2
# comment 3
test # comment 4
)
```
So our formatting differs in that we move comments between the star and
the expression above the star.
As of this PR, we also support formatting this input, which is valid:
```python
def test():
(
yield
#comment 1
* # comment 2
# comment 3
test, # comment 4
1
)
```
Like:
```python
def test():
(
yield (
# comment 1
(
# comment 2
# comment 3
*test, # comment 4
1,
)
)
)
```
There were two fixes here: (1) marking starred comments as dangling and
formatting them properly; and (2) supporting parenthesized comments for
tuples that don't contain their own parentheses, as is often the case
for yielded tuples (previously, we hit a debug assert).
Note that this diff
## Test Plan
cargo test
## Summary
Allows for proper lexing of tokens like `->`.
The main challenge is to ensure that our forward and backwards
representations are the same for cases like `===`. Specifically, we want
that to lex as `==` followed by `=` regardless of whether it's a
forwards or backwards lex. To do so, we identify the range of the
sequential characters (the full span of `===`), lex it forwards, then
return the last token.
## Test Plan
`cargo test`
## Summary
This PR adds support for parenthesized comments. A parenthesized comment
is a comment that appears within a parenthesis, but not within the range
of the expression enclosed by the parenthesis. For example, the comment
here is a parenthesized comment:
```python
if (
# comment
True
):
...
```
The parentheses enclose the `True`, but the range of `True` doesn’t
include the `# comment`.
There are at least two problems associated with parenthesized comments:
(1) associating the comment with the correct (i.e., enclosed) node; and
(2) formatting the comment correctly, once it has been associated with
the enclosed node.
The solution proposed here for (1) is to search for parentheses between
preceding and following node, and use open and close parentheses to
break ties, rather than always assigning to the preceding node.
For (2), we handle these special parenthesized comments in `FormatExpr`.
The biggest risk with this approach is that we forget some codepath that
force-disables parenthesization (by passing in `Parentheses::Never`).
I've audited all usages of that enum and added additional handling +
test coverage for such cases.
Closes https://github.com/astral-sh/ruff/issues/6390.
## Test Plan
`cargo test` with new cases.
Before:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74233 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
After:
| project | similarity index |
|--------------|------------------|
| build | 0.75623 |
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| typeshed | 0.74237 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
## Summary
Unlike other statements, Black always adds a trailing comma if an
import-from statement breaks with a single import member. I believe this
is for compatibility with isort -- see
09f5ee3a19,
https://github.com/psf/black/issues/127, or
66648c528a/src/black/linegen.py (L1452)
for the current version.
## Test Plan
`cargo test`, notice that a big chunk of the compatibility suite is
removed.
Before:
| project | similarity index |
|--------------|------------------|
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| twine | 0.99876 |
| typeshed | 0.74233 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
After:
| project | similarity index |
|--------------|------------------|
| cpython | 0.75472 |
| django | 0.99804 |
| transformers | 0.99618 |
| twine | 0.99876 |
| typeshed | 0.74260 |
| warehouse | 0.99601 |
| zulip | 0.99727 |
## Summary
Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.
## Test Plan
`cargo test`
## Summary
This PR adds handling for comments on open parentheses in parenthesized
context managers. For example, given:
```python
with ( # comment
CtxManager1() as example1,
CtxManager2() as example2,
CtxManager3() as example3,
):
...
```
We want to preserve that formatting. (Black does the same.) On `main`,
we format as:
```python
with (
# comment
CtxManager1() as example1,
CtxManager2() as example2,
CtxManager3() as example3,
):
...
```
It's very similar to how `StmtImportFrom` is handled.
Note that this case _isn't_ covered by the "parenthesized comment"
proposal, since this is a common on the statement that would typically
be attached to the first `WithItem`, and the `WithItem` _itself_ can
have parenthesized comments, like:
```python
with ( # comment
(
CtxManager1() # comment
) as example1,
CtxManager2() as example2,
CtxManager3() as example3,
):
...
```
## Test Plan
`cargo test`
Confirmed no change in similarity score.
## Summary
For #6485, I need to be able to use the `SimpleTokenizer` to lex the
space between any two adjacent expressions (i.e., the space between a
preceding and following node). This requires that we support a wider
range of keywords (like `and`, to connect the pieces of `x and y`), and
some additional single-character tokens (like `-` and `>`, to support
`->`). Note that the `SimpleTokenizer` does not support multi-character
tokens, so the `->` in a function signature is lexed as a `-` followed
by a `>` -- but this is fine for our purposes.
## Summary
In https://github.com/astral-sh/ruff/pull/6512, we added a flag to the
AST to mark implicitly-concatenated string expressions. This PR makes
use of that flag to remove the `is_implicit_concatenation` method.
## Test Plan
`cargo test`
## Summary
The bracketed-end-of-line comment rule is meant to assign comments like
this as "immediately following the bracket":
```python
f( # comment
1
)
```
However, the logic was such that we treated this equivalently:
```python
f(
( # comment
1
)
)
```
This PR modifies the placement logic to ensure that we only skip the
opening bracket, and not any nested brackets. The above is now formatted
as:
```python
f(
(
# comment
1
)
)
```
(But will be corrected once we handle parenthesized comments properly.)
## Test Plan
`cargo test`
Confirmed no change in similarity score.
## Summary
Per the discussion in
https://github.com/astral-sh/ruff/discussions/6183, this PR adds an
`implicit_concatenated` flag to the string and bytes constant variants.
It's not actually _used_ anywhere as of this PR, but it is covered by
the tests.
Specifically, we now use a struct for the string and bytes cases, along
with the `Expr::FString` node. That struct holds the value, plus the
flag:
```rust
#[derive(Clone, Debug, PartialEq, is_macro::Is)]
pub enum Constant {
Str(StringConstant),
Bytes(BytesConstant),
...
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StringConstant {
/// The string value as resolved by the parser (i.e., without quotes, or escape sequences, or
/// implicit concatenations).
pub value: String,
/// Whether the string contains multiple string tokens that were implicitly concatenated.
pub implicit_concatenated: bool,
}
impl Deref for StringConstant {
type Target = str;
fn deref(&self) -> &Self::Target {
self.value.as_str()
}
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BytesConstant {
/// The bytes value as resolved by the parser (i.e., without quotes, or escape sequences, or
/// implicit concatenations).
pub value: Vec<u8>,
/// Whether the string contains multiple string tokens that were implicitly concatenated.
pub implicit_concatenated: bool,
}
impl Deref for BytesConstant {
type Target = [u8];
fn deref(&self) -> &Self::Target {
self.value.as_slice()
}
}
```
## Test Plan
`cargo test`
**Summary** Implement docstring formatting
**Test Plan** Matches black's `docstring.py` fixture exactly, added some
new cases for what is hard to debug with black and with what black
doesn't cover.
similarity index:
main:
zulip: 0.99702
django: 0.99784
warehouse: 0.99585
build: 0.75623
transformers: 0.99469
cpython: 0.75989
typeshed: 0.74853
this branch:
zulip: 0.99702
django: 0.99784
warehouse: 0.99585
build: 0.75623
transformers: 0.99464
cpython: 0.75517
typeshed: 0.74853
The regression in transformers is actually an improvement in a file they
don't format with black (they run `black examples tests src utils
setup.py conftest.py`, the difference is in hubconf.py). cpython doesn't
use black.
Closes#6196
## Summary
This method is almost never what you actually want, because it doesn't
respect Python's scoping semantics. For example, if you call this within
a class method, it will return class attributes, whereas Python actually
_skips_ symbols in classes unless the load occurs within the class
itself. I also want to move away from these kinds of dynamic lookups and
more towards `resolve_name`, which performs a lookup based on the stored
`BindingId` at the time of symbol resolution, and will make it much
easier for us to separate model building from linting in the near
future.
## Test Plan
`cargo test`
## Summary
Fixes some TODOs introduced in
https://github.com/astral-sh/ruff/pull/6538. In short, given an
expression like `1 if x > 0 else "Hello, world!"`, we now return a union
type that says the expression can resolve to either an `int` or a `str`.
The system remains very limited, it only works for obvious primitive
types, and there's no attempt to do inference on any more complex
variables. (If any expression yields `Unknown` or `TypeError`, we
propagate that result throughout and abort on the client's end.)