Commit graph

29 commits

Author SHA1 Message Date
Dhruv Manilawala
017e829115
Update string nodes for implicit concatenation (#7927)
## Summary

This PR updates the string nodes (`ExprStringLiteral`,
`ExprBytesLiteral`, and `ExprFString`) to account for implicit string
concatenation.

### Motivation

In Python, implicit string concatenation are joined while parsing
because the interpreter doesn't require the information for each part.
While that's feasible for an interpreter, it falls short for a static
analysis tool where having such information is more useful. Currently,
various parts of the code uses the lexer to get the individual string
parts.

One of the main challenge this solves is that of string formatting.
Currently, the formatter relies on the lexer to get the individual
string parts, and formats them including the comments accordingly. But,
with PEP 701, f-string can also contain comments. Without this change,
it becomes very difficult to add support for f-string formatting.

### Implementation

The initial proposal was made in this discussion:
https://github.com/astral-sh/ruff/discussions/6183#discussioncomment-6591993.
There were various AST designs which were explored for this task which
are available in the linked internal document[^1].

The selected variant was the one where the nodes were kept as it is
except that the `implicit_concatenated` field was removed and instead a
new struct was added to the `Expr*` struct. This would be a private
struct would contain the actual implementation of how the AST is
designed for both single and implicitly concatenated strings.

This implementation is achieved through an enum with two variants:
`Single` and `Concatenated` to avoid allocating a vector even for single
strings. There are various public methods available on the value struct
to query certain information regarding the node.

The nodes are structured in the following way:

```
ExprStringLiteral - "foo" "bar"
|- StringLiteral - "foo"
|- StringLiteral - "bar"

ExprBytesLiteral - b"foo" b"bar"
|- BytesLiteral - b"foo"
|- BytesLiteral - b"bar"

ExprFString - "foo" f"bar {x}"
|- FStringPart::Literal - "foo"
|- FStringPart::FString - f"bar {x}"
  |- StringLiteral - "bar "
  |- FormattedValue - "x"
```

[^1]: Internal document:
https://www.notion.so/astral-sh/Implicit-String-Concatenation-e036345dc48943f89e416c087bf6f6d9?pvs=4

#### Visitor

The way the nodes are structured is that the entire string, including
all the parts that are implicitly concatenation, is a single node
containing individual nodes for the parts. The previous section has a
representation of that tree for all the string nodes. This means that
new visitor methods are added to visit the individual parts of string,
bytes, and f-strings for `Visitor`, `PreorderVisitor`, and
`Transformer`.

## Test Plan

- `cargo insta test --workspace --all-features --unreferenced reject`
- Verify that the ecosystem results are unchanged
2023-11-24 17:55:41 -06:00
konsti
14e65afdc6
Update to Rust 1.74 and use new clippy lints table (#8722)
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.

The update itself introduced a new clippy lint about superfluous hashes
in raw strings, which got removed.

I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to `unsafe_code = "warn"` instead of
"forbid" because the latter now actually bans unsafe code:

```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
  --> crates/ruff_source_file/src/newlines.rs:62:17
   |
62 |         #[allow(unsafe_code)]
   |                 ^^^^^^^^^^^ overruled by previous forbid
   |
   = note: `forbid` lint level was set on command line
```

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-11-16 18:12:46 -05:00
Dhruv Manilawala
230c9ce236
Split Constant to individual literal nodes (#8064)
## Summary

This PR splits the `Constant` enum as individual literal nodes. It
introduces the following new nodes for each variant:
* `ExprStringLiteral`
* `ExprBytesLiteral`
* `ExprNumberLiteral`
* `ExprBooleanLiteral`
* `ExprNoneLiteral`
* `ExprEllipsisLiteral`

The main motivation behind this refactor is to introduce the new AST
node for implicit string concatenation in the coming PR. The elements of
that node will be either a string literal, bytes literal or a f-string
which can be implemented using an enum. This means that a string or
bytes literal cannot be represented by `Constant::Str` /
`Constant::Bytes` which creates an inconsistency.

This PR avoids that inconsistency by splitting the constant nodes into
it's own literal nodes, literal being the more appropriate naming
convention from a static analysis tool perspective.

This also makes working with literals in the linter and formatter much
more ergonomic like, for example, if one would want to check if this is
a string literal, it can be done easily using
`Expr::is_string_literal_expr` or matching against `Expr::StringLiteral`
as oppose to matching against the `ExprConstant` and enum `Constant`. A
few AST helper methods can be simplified as well which will be done in a
follow-up PR.

This introduces a new `Expr::is_literal_expr` method which is the same
as `Expr::is_constant_expr`. There are also intermediary changes related
to implicit string concatenation which are quiet less. This is done so
as to avoid having a huge PR which this already is.

## Test Plan

1. Verify and update all of the existing snapshots (parser, visitor)
2. Verify that the ecosystem check output remains **unchanged** for both
the linter and formatter

### Formatter ecosystem check

#### `main`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |

#### `dhruv/constant-to-literal`

| project | similarity index | total files | changed files |

|----------------|------------------:|------------------:|------------------:|
| cpython | 0.75803 | 1799 | 1647 |
| django | 0.99983 | 2772 | 34 |
| home-assistant | 0.99953 | 10596 | 186 |
| poetry | 0.99891 | 317 | 17 |
| transformers | 0.99966 | 2657 | 330 |
| twine | 1.00000 | 33 | 0 |
| typeshed | 0.99978 | 3669 | 20 |
| warehouse | 0.99977 | 654 | 13 |
| zulip | 0.99970 | 1459 | 22 |
2023-10-30 12:13:23 +05:30
Dhruv Manilawala
78bbf6d403
New Singleton enum for PatternMatchSingleton node (#8063)
## Summary

This PR adds a new `Singleton` enum for the `PatternMatchSingleton`
node.

Earlier the node was using the `Constant` enum but the value for this
pattern can only be either `None`, `True` or `False`. With the coming PR
to remove the `Constant`, this node required a new type to fill in.

This also has the benefit of narrowing the type down to only the
possible values for the node as evident by the removal of `unreachable`.

## Test Plan

Update the AST snapshots and run `cargo test`.
2023-10-30 05:48:53 +00:00
Dhruv Manilawala
e62e245c61
Add support for PEP 701 (#7376)
## Summary

This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the
other individual PRs. The separate PRs were created for logic separation
and code reviews. Refer to each pull request for a detail description on
the change.

Refer to the PR description for the list of pull requests within this PR.

## Test Plan

### Formatter ecosystem checks

Explanation for the change in ecosystem check:
https://github.com/astral-sh/ruff/pull/7597#issue-1908878183

#### `main`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1631 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```

#### `dhruv/pep-701`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76051 |              1789 |              1632 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```
2023-09-29 02:55:39 +00:00
Charlie Marsh
4d6f5ff0a7
Remove Int wrapper type from parser (#7577)
## Summary

This is only used for the `level` field in relative imports (e.g., `from
..foo import bar`). It seems unnecessary to use a wrapper here, so this
PR changes to a `u32` directly.
2023-09-21 17:01:44 +00:00
konsti
94b68f201b
Fix stylist indentation with a formfeed (#7489)
**Summary** In python, a formfeed is technically undefined behaviour
(https://docs.python.org/3/reference/lexical_analysis.html#indentation):
> A formfeed character may be present at the start of the line; it will
be ignored for
> the indentation calculations above. Formfeed characters occurring
elsewhere in the
> leading whitespace have an undefined effect (for instance, they may
reset the space
> count to zero).

In practice, they just reset the indentation:


df8b3a46a7/Parser/tokenizer.c (L1819-L1821)
a41bb2733f/crates/ruff_python_parser/src/lexer.rs (L664-L667)

The stylist didn't handle formfeeds previously and would produce invalid
indents. The remedy is to cut everything before a form feed.

Checks box for
https://github.com/astral-sh/ruff/issues/7455#issuecomment-1722458825

**Test Plan** Unit test for the stylist and a regression test for the
rule
2023-09-19 12:01:16 +02:00
konsti
56440ad835
Introduce ArgOrKeyword to keep call parameter order (#7302)
## Motivation

The `ast::Arguments` for call argument are split into positional
arguments (args) and keywords arguments (keywords). We currently assume
that call consists of first args and then keywords, which is generally
the case, but not always:

```python
f(*args, a=2, *args2, **kwargs)

class A(*args, a=2, *args2, **kwargs):
    pass
```

The consequence is accidentally reordering arguments
(https://github.com/astral-sh/ruff/pull/7268).

## Summary

`Arguments::args_and_keywords` returns an iterator of an `ArgOrKeyword`
enum that yields args and keywords in the correct order. I've fixed the
obvious `args` and `keywords` usages, but there might be some cases with
wrong assumptions remaining.

## Test Plan

The generator got new test cases, otherwise the stacked PR
(https://github.com/astral-sh/ruff/pull/7268) which uncovered this.
2023-09-13 08:45:46 +00:00
Dhruv Manilawala
04f2842e4f
Move ExprConstant::kind to StringConstant::unicode (#7180) 2023-09-06 07:39:25 +00:00
Charlie Marsh
f8e4e1d562
Fix named expression precedence in generator (#7170) 2023-09-05 17:06:57 +00:00
Dhruv Manilawala
1adde24133
Rename parser mode from Jupyter to Ipython (#7153) 2023-09-05 14:12:26 +00:00
Charlie Marsh
a56121672c
Add parentheses when simplifying conditions in SIM222 (#7117) 2023-09-03 22:35:59 +00:00
Charlie Marsh
32f4a96c64
Fix precedence of annotated assignments in generator (#7115) 2023-09-03 21:41:48 +00:00
Charlie Marsh
6a5acde226
Make Parameters an optional field on ExprLambda (#6669)
## Summary

If a lambda doesn't contain any parameters, or any parameter _tokens_
(like `*`), we can use `None` for the parameters. This feels like a
better representation to me, since, e.g., what should the `TextRange` be
for a non-existent set of parameters? It also allows us to remove
several sites where we check if the `Parameters` is empty by seeing if
it contains any arguments, so semantically, we're already trying to
detect and model around this elsewhere.

Changing this also fixes a number of issues with dangling comments in
parameter-less lambdas, since those comments are now automatically
marked as dangling on the lambda. (As-is, we were also doing something
not-great whereby the lambda was responsible for formatting dangling
comments on the parameters, which has been removed.)

Closes https://github.com/astral-sh/ruff/issues/6646.

Closes https://github.com/astral-sh/ruff/issues/6647.

## Test Plan

`cargo test`
2023-08-18 15:34:54 +00:00
Charlie Marsh
96d310fbab
Remove Stmt::TryStar (#6566)
## Summary

Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the
pattern we've migrated towards for `Stmt::For` (removing
`Stmt::AsyncFor`) and friends. While these are significant differences
for an interpreter, we tend to handle these cases identically or nearly
identically.

## Test Plan

`cargo test`
2023-08-14 13:39:44 -04:00
Charlie Marsh
f16e780e0a
Add an implicit concatenation flag to string and bytes constants (#6512)
## Summary

Per the discussion in
https://github.com/astral-sh/ruff/discussions/6183, this PR adds an
`implicit_concatenated` flag to the string and bytes constant variants.
It's not actually _used_ anywhere as of this PR, but it is covered by
the tests.

Specifically, we now use a struct for the string and bytes cases, along
with the `Expr::FString` node. That struct holds the value, plus the
flag:

```rust
#[derive(Clone, Debug, PartialEq, is_macro::Is)]
pub enum Constant {
    Str(StringConstant),
    Bytes(BytesConstant),
    ...
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StringConstant {
    /// The string value as resolved by the parser (i.e., without quotes, or escape sequences, or
    /// implicit concatenations).
    pub value: String,
    /// Whether the string contains multiple string tokens that were implicitly concatenated.
    pub implicit_concatenated: bool,
}

impl Deref for StringConstant {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        self.value.as_str()
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BytesConstant {
    /// The bytes value as resolved by the parser (i.e., without quotes, or escape sequences, or
    /// implicit concatenations).
    pub value: Vec<u8>,
    /// Whether the string contains multiple string tokens that were implicitly concatenated.
    pub implicit_concatenated: bool,
}

impl Deref for BytesConstant {
    type Target = [u8];
    fn deref(&self) -> &Self::Target {
        self.value.as_slice()
    }
}
```

## Test Plan

`cargo test`
2023-08-14 13:46:54 +00:00
Dhruv Manilawala
6a64f2289b
Rename Magic* to IpyEscape* (#6395)
## Summary

This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and
`MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the
token and type. Similarly, it renames the AST nodes from `LineMagic` to
`IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary.

It also makes renames from using `jupyter_magic` to
`ipython_escape_commands` in various function names.

The mode value is still `Mode::Jupyter` because the escape commands are
part of the IPython syntax but the lexing/parsing is done for a Jupyter
notebook.

### Motivation behind the rename:
* IPython codebase defines it as "EscapeCommand" / "Escape Sequences":
* Escape Sequences:
292e3a2345/IPython/core/inputtransformer2.py (L329-L333)
* Escape command:
292e3a2345/IPython/core/inputtransformer2.py (L410-L411)
* The word "magic" is used mainly for the actual magic commands i.e.,
the ones starting with `%`/`%%`
(https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system).
So, this avoids any confusion between the Magic token (`%`, `%%`) and
the escape command itself.
## Test Plan

* `cargo test` to make sure all renames are done correctly.
* `grep` for `jupyter_escape`/`magic` to make sure all renames are done
correctly.
2023-08-09 13:28:18 +00:00
Charlie Marsh
3f0eea6d87
Rename JoinedStr to FString in the AST (#6379)
## Summary

Per the proposal in https://github.com/astral-sh/ruff/discussions/6183,
this PR renames the `JoinedStr` node to `FString`.
2023-08-07 17:33:17 +00:00
Charlie Marsh
daefa74e9a
Remove async AST node variants for with, for, and def (#6369)
## Summary

Per the suggestion in
https://github.com/astral-sh/ruff/discussions/6183, this PR removes
`AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an
`is_async` field on the non-async variants of those structs. Unlike an
interpreter, we _generally_ have identical handling for these nodes, so
separating them into distinct variants adds complexity from which we
don't really benefit. This can be seen below, where we get to remove a
_ton_ of code related to adding generic `Any*` wrappers, and a ton of
duplicate branches for these cases.

## Test Plan

`cargo test` is unchanged, apart from parser snapshots.
2023-08-07 16:36:02 +00:00
Charlie Marsh
9f3567dea6
Use range: _ in lieu of range: _range (#6296)
## Summary

`range: _range` is slightly inconvenient because you can't use it
multiple times within a single match, unlike `_`.
2023-08-02 22:11:13 -04:00
Charlie Marsh
b095b7204b
Add a TypeParams node to the AST (#6261)
## Summary

Similar to #6259, this PR adds a `TypeParams` node to the AST, to
capture the list of type parameters with their surrounding brackets.

If a statement lacks type parameters, the `type_params` field will be
`None`.
2023-08-02 14:12:45 +00:00
Charlie Marsh
981e64f82b
Introduce an Arguments AST node for function calls and class definitions (#6259)
## Summary

This PR adds a new `Arguments` AST node, which we can use for function
calls and class definitions.

The `Arguments` node spans from the left (open) to right (close)
parentheses inclusive.

In the case of classes, the `Arguments` is an option, to differentiate
between:

```python
# None
class C: ...

# Some, with empty vectors
class C(): ...
```

In this PR, we don't really leverage this change (except that a few
rules get much simpler, since we don't need to lex to find the start and
end ranges of the parentheses, e.g.,
`crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`,
`crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`).

In future PRs, this will be especially helpful for the formatter, since
we can track comments enclosed on the node itself.

## Test Plan

`cargo test`
2023-08-02 10:01:13 -04:00
Charlie Marsh
9c708d8fc1
Rename Parameter#arg and ParameterWithDefault#def fields (#6255)
## Summary

This PR renames...

- `Parameter#arg` to `Parameter#name`
- `ParameterWithDefault#def` to `ParameterWithDefault#parameter` (such
that `ParameterWithDefault` has a `default` and a `parameter`)

## Test Plan

`cargo test`
2023-08-01 14:28:34 -04:00
Charlie Marsh
adc8bb7821
Rename Arguments to Parameters in the AST (#6253)
## Summary

This PR renames a few AST nodes for clarity:

- `Arguments` is now `Parameters`
- `Arg` is now `Parameter`
- `ArgWithDefault` is now `ParameterWithDefault`

For now, the attribute names that reference `Parameters` directly are
changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters`
itself are not (e.g., `vararg`). We may revisit that decision in the
future.

For context, the AST node formerly known as `Arguments` is used in
function definitions. Formally (outside of the Python context),
"arguments" typically refers to "the values passed to a function", while
"parameters" typically refers to "the variables used in a function
definition". E.g., if you Google "arguments vs parameters", you'll get
some explanation like:

> A parameter is a variable in a function definition. It is a
placeholder and hence does not have a concrete value. An argument is a
value passed during function invocation.

We're thus deviating from Python's nomenclature in favor of a scheme
that we find to be more precise.
2023-08-01 13:53:28 -04:00
Micha Reiser
debfca3a11
Remove Parse trait (#6235) 2023-08-01 18:35:03 +02:00
David Szotten
ba990b676f
add DebugText for self-documenting f-strings (#6167) 2023-08-01 07:55:03 +02:00
Dhruv Manilawala
cb34e6d322
Avoid parenthesizing comprehension element (#6198)
## Summary

This PR adds a new precedence level for the comprehension element. This fixes
the generator to not add parentheses around the comprehension element every
time.

The new precedence level is `COMPREHENSION_ELEMENT` and it should occur after
the `NAMED_EXPR` precedence level because named expressions are always parenthesized.

This matches the behavior of Python `ast.unparse` and tested with the
following snippet:

```python
import ast

code = ""
ast.unparse(ast.parse(code))
```

## Test Plan

Add a bunch of test cases for all the valid nodes at that position.

fixes: #5777
2023-07-31 20:56:42 +05:30
Micha Reiser
40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00
Micha Reiser
2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00