## Summary
I always found it odd that we had to pass the source path in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR instead passes the path directly in those cases.
## Summary
This helps a bit with (but does not close) the issues described in
https://github.com/astral-sh/ruff/issues/9311. E.g., now, we at least
see: `error: Failed to format main.py: source contains syntax errors:
invalid syntax. Got unexpected token '=' at byte offset 20`.
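Concretely, the path is now attached where the error is re-displayed rather than stored on the error itself. A minimal sketch of that shape, with a hypothetical `FormatError` type and `report` helper (not the actual ruff code):
```rust
use std::fmt;
use std::path::Path;

// Hypothetical stand-in for the formatter's error type, which no longer
// needs to carry a source path of its own.
struct FormatError(String);

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// The caller attaches the path only at the point where the error is
// (re-)displayed.
fn report(path: &Path, err: &FormatError) {
    eprintln!("error: Failed to format {}: {}", path.display(), err);
}
```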
## Summary
Our `SoftKeywordTokenizer` only respected soft keywords in compound
statement positions -- for example, at the start of a logical line:
```python
type X = int
```
However, type aliases can also appear in simple statement positions,
like:
```python
class Class: type X = int
```
(Note that `match` and `case` are _not_ valid keywords in such
positions.)
This PR upgrades the tokenizer to track both kinds of valid positions.
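As a rough sketch of what tracking both kinds of positions could look like (the enum and helper below are illustrative, not ruff's actual implementation):
```rust
#[derive(Clone, Copy, PartialEq)]
enum Position {
    /// Start of a logical line, where compound statements may begin.
    CompoundStatement,
    /// After a `:` (or `;`) on the same line, i.e. a simple statement
    /// position such as `class Class: type X = int`.
    SimpleStatement,
    /// Anywhere else; soft keywords are plain identifiers here.
    Other,
}

/// Whether `keyword` may be treated as a soft keyword at `position`.
fn is_soft_keyword_position(position: Position, keyword: &str) -> bool {
    match keyword {
        // `type` aliases are simple statements, so both positions apply.
        "type" => position != Position::Other,
        // `match` and `case` introduce compound statements only.
        "match" | "case" => position == Position::CompoundStatement,
        _ => false,
    }
}
```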
Closes https://github.com/astral-sh/ruff/issues/8900.
Closes https://github.com/astral-sh/ruff/issues/8899.
## Test Plan
`cargo test`
## Summary
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.
The update itself introduced a new clippy lint about superfluous hashes
in raw strings; those hashes have been removed.
I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to use `unsafe_code = "warn"` instead of
`"forbid"`, because the latter now actually bans unsafe code:
```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
--> crates/ruff_source_file/src/newlines.rs:62:17
|
62 | #[allow(unsafe_code)]
| ^^^^^^^^^^^ overruled by previous forbid
|
= note: `forbid` lint level was set on command line
```
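For reference, the pattern that trips this error looks like the following; `forbid` cannot be overridden by a narrower `allow`, which is exactly what such escape hatches rely on (illustrative body, not the real `newlines.rs` code):
```rust
// With `unsafe_code = "forbid"` in effect for the whole workspace, this
// local opt-out is rejected with E0453; with `"warn"` it is permitted.
#[allow(unsafe_code)]
fn first_byte(haystack: &[u8]) -> Option<u8> {
    if haystack.is_empty() {
        return None;
    }
    // SAFETY: we just checked that index 0 is in bounds.
    unsafe { Some(*haystack.get_unchecked(0)) }
}
```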
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This was just a bug in the parser's ranges, likely present since it was
initially implemented. Given `match n % 3, n % 5: ...`, the "subject"
(i.e., the tuple of two binary operators) was using the entire range of
the `match` statement.
Closes https://github.com/astral-sh/ruff/issues/8091.
## Test Plan
`cargo test`
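For illustration, the fix can be pinned down with a range assertion along these lines (a sketch: `parse_match_statement` is a hypothetical helper, `TextRange` is ruff's text-size type, and the offsets are hand-computed):
```rust
#[test]
fn match_subject_range() {
    let source = "match n % 3, n % 5: ...";
    // Hypothetical helper returning the parsed `match` statement.
    let stmt = parse_match_statement(source);
    // The subject tuple `n % 3, n % 5` should span bytes 6..18,
    // not the whole statement's 0..23.
    assert_eq!(stmt.subject.range(), TextRange::new(6.into(), 18.into()));
}
```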
## Summary
This PR fixes a bug by disallowing f-strings in match literal patterns.
```
literal_pattern ::=  signed_number
                     | signed_number "+" NUMBER
                     | signed_number "-" NUMBER
                     | strings
                     | "None"
                     | "True"
                     | "False"
signed_number   ::=  NUMBER | "-" NUMBER
```
Source:
https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-literal_pattern
Also,
```console
$ python /tmp/t.py
File "/tmp/t.py", line 4
case "hello " f"{name}":
^^^^^^^^^^^^^^^^^^
SyntaxError: patterns may only match literals and attribute lookups
```
## Test Plan
Updated the existing test cases and snapshots accordingly. Also added a
new test case to verify that the parser raises an error.
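A sketch of what such a test might look like, assuming a `parse_suite`-style entry point that returns a `Result`:
```rust
#[test]
fn fstring_in_literal_pattern_is_rejected() {
    let source = r#"
match subject:
    case "hello " f"{name}":
        pass
"#;
    // Implicit concatenation with an f-string is not a valid
    // `literal_pattern`, so parsing must fail.
    assert!(parse_suite(source).is_err());
}
```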
## Summary
I got confused and refactored a bit; now the naming should be more
consistent. This is the basis for the range formatting work.
Changes:
* `format_module` -> `format_module_source` (format a string)
* `format_node` -> `format_module_ast` (format a program parsed into an
AST)
* Added `parse_ok_tokens` that takes `Token` instead of `Result<Token>`
* Call the source code `source` consistently
* Added a `tokens_and_ranges` helper
* `python_ast` -> `module` (because that's the type)
## Summary
This PR attempts to address a problem in the parser related to the
ranges of `WithItem` nodes in certain contexts -- specifically,
`WithItem` nodes in parentheses that do not have an `as` token after
them.
For example,
[here](https://play.ruff.rs/71be2d0b-2a04-4c7e-9082-e72bff152679):
```python
with (a, b):
pass
```
The range of the `WithItem` `a` is set to the range of `(a, b)`, as is
the range of the `WithItem` `b`. In other words, when we have this kind
of sequence, we use the range of the entire parenthesized context,
rather than the ranges of the items themselves.
Note that this also applies to cases
[like](https://play.ruff.rs/c551e8e9-c3db-4b74-8cc6-7c4e3bf3713a):
```python
with (a, b, c as d):
pass
```
You can see the issue in the parser here:
```rust
#[inline]
WithItemsNoAs: Vec<ast::WithItem> = {
<location:@L> <all:OneOrMore<Test<"all">>> <end_location:@R> => {
all.into_iter().map(|context_expr| ast::WithItem { context_expr, optional_vars: None, range: (location..end_location).into() }).collect()
},
}
```
Fixing this issue is... very tricky. The naive approach is to use the
range of the `context_expr` as the range for the `WithItem`, but that
range will be incorrect when the `context_expr` is itself parenthesized.
For example, _that_ solution would fail here, since the range of the
first `WithItem` would be that of `a`, rather than `(a)`:
```python
with ((a), b):
pass
```
The `with` parsing in general is highly precarious due to ambiguities in
the grammar. Changing it in _any_ way seems to lead to an ambiguous
grammar that LALRPOP fails to translate. Consensus seems to be that we
don't really understand _why_ the current grammar works (i.e., _how_ it
avoids these ambiguities as-is).
The solution implemented here is to avoid changing the grammar itself,
and instead change the shape of the nodes returned by various rules in
the grammar. Specifically, everywhere that we return `Expr`, we instead
return `ParenthesizedExpr`, which includes a parenthesized range and the
underlying `Expr` itself. (If an `Expr` isn't parenthesized, the ranges
will be equivalent.) In `WithItemsNoAs`, we can then use the
parenthesized range as the range for the `WithItem`.
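Sketched with stand-in types, the wrapper looks something like this:
```rust
/// Stand-in for `ruff_text_size::TextRange` (byte offsets).
struct TextRange {
    start: u32,
    end: u32,
}

/// Stand-in for the AST expression type (`ast::Expr`).
enum Expr {
    // ...variants elided...
}

/// An expression plus the range of its enclosing parentheses.
struct ParenthesizedExpr {
    /// Covers the outermost parentheses when present; otherwise it is
    /// identical to the range of `expr` itself.
    range: TextRange,
    /// The underlying expression.
    expr: Expr,
}
```
`WithItemsNoAs` can then build each `WithItem` from its own `context_expr.range`, so `(a)` in `with ((a), b):` keeps its parenthesized range while the `WithItem` no longer swallows the whole list.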
## Summary
Given:
```python
if (
x
:=
( # 4
y # 5
) # 6
):
pass
```
It turns out the parser ended the range of the `NamedExpr` at the end of
`y`, rather than the end of the parenthesis that encloses `y`. This just
seems like a bug -- the range should be from the start of the name on
the left, to the end of the parenthesized node on the right.
## Test Plan
`cargo test`
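As with the `match` subject fix above, the expectation can be sketched as a range assertion (hypothetical helper, hand-computed offsets for a condensed example):
```rust
#[test]
fn named_expr_range_covers_parenthesized_value() {
    let source = "(x := (y))";
    // Hypothetical helper returning the inner `NamedExpr` node.
    let named = parse_named_expr(source);
    // `x := (y)` spans bytes 1..9: from the start of `x` through the
    // closing parenthesis that encloses `y`.
    assert_eq!(named.range(), TextRange::new(1.into(), 9.into()));
}
```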
## Summary
This PR renames the `MagicCommand` token to `IpyEscapeCommand` and the
`MagicKind` type to `IpyEscapeKind`, to better reflect the purpose of the
token and type. Similarly, it renames the AST nodes from `LineMagic` to
`IpyEscapeCommand`, prefixed with `Stmt`/`Expr` wherever necessary.
It also renames `jupyter_magic` to `ipython_escape_commands` in various
function names.
The mode value is still `Mode::Jupyter` because the escape commands are
part of the IPython syntax but the lexing/parsing is done for a Jupyter
notebook.
### Motivation behind the rename:
* IPython codebase defines it as "EscapeCommand" / "Escape Sequences":
* Escape Sequences:
292e3a2345/IPython/core/inputtransformer2.py (L329-L333)
* Escape command:
292e3a2345/IPython/core/inputtransformer2.py (L410-L411)
* The word "magic" is used mainly for the actual magic commands, i.e.,
the ones starting with `%`/`%%`
(https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system).
So, this avoids any confusion between the Magic token (`%`, `%%`) and
the escape command itself.
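For context, these are the escape prefixes IPython recognizes in `inputtransformer2.py`, sketched as an enum (variant names are illustrative, not ruff's actual `IpyEscapeKind`):
```rust
/// IPython's escape prefixes, per `inputtransformer2.py`.
enum EscapePrefix {
    Shell,    // `!`  -- run a shell command
    ShellCap, // `!!` -- shell command, capturing output
    Help,     // `?`  -- help
    Help2,    // `??` -- detailed help
    Magic,    // `%`  -- line magic
    Magic2,   // `%%` -- cell magic
    Quote,    // `,`  -- autocall, quoting each argument
    Quote2,   // `;`  -- autocall, quoting the whole string
    Paren,    // `/`  -- autocall
}
```
Only `%` and `%%` are magics proper, which is what makes "escape command" the more accurate umbrella term.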
## Test Plan
* `cargo test` to make sure all renames are done correctly.
* `grep` for `jupyter_escape`/`magic` to make sure all renames are done
correctly.
## Summary
This PR adds support for a stricter version of help end escape
commands[^1] in the parser. By stricter, I mean that the escape tokens
are only at the end of the command and there are no tokens at the start.
This makes it difficult to implement in the lexer without doing a lot
of lookahead or keeping track of previous tokens.
Now, as we're adding this in the parser, the lexer needs to recognize
and emit a new token for `?`. So, a `Question` token is added, which is
recognized only in `Jupyter` mode.
The conditions applied are the same as the ones in the original
implementation in the IPython codebase (which is a regex):
* There can be either one or two question marks at the end
* The node before the question mark can be a `Name`, `Attribute`,
`Subscript` (only with integer constants in slice position), or any
combination of the three (see the sketch below).
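A sketch of that check over the parsed node, with minimal stand-ins for the AST shapes (not ruff's actual types):
```rust
// Minimal stand-ins for the relevant AST shapes.
enum Expr {
    Name(String),
    Attribute { value: Box<Expr> },
    Subscript { value: Box<Expr>, slice: Box<Expr> },
    IntLiteral(i64),
    Other,
}

/// Whether `expr` may precede the trailing `?`/`??`: a `Name`, an
/// `Attribute`, or a `Subscript` whose slice is an integer constant,
/// nested in any combination.
fn is_help_end_target(expr: &Expr) -> bool {
    match expr {
        Expr::Name(_) => true,
        Expr::Attribute { value } => is_help_end_target(value),
        Expr::Subscript { value, slice } => {
            matches!(**slice, Expr::IntLiteral(_)) && is_help_end_target(value)
        }
        _ => false,
    }
}
```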
## Test Plan
Added test cases for various combinations of the possible nodes in the
command value position and updated the snapshots.
fixes: #6359
fixes: #5030 (This is the final piece)
[^1]: https://github.com/astral-sh/ruff/pull/6272#issue-1833094281
## Summary
This PR removes the `Interactive` and `FunctionType` parser modes, which are unused by Ruff.
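For context, a sketch of the `Mode` enum after the removal (assuming `Module`, `Expression`, and `Jupyter` are the variants that remain):
```rust
/// The parser modes that remain after this PR (sketch).
pub enum Mode {
    /// A standalone module or file.
    Module,
    /// A single expression.
    Expression,
    /// Jupyter-notebook source, enabling IPython escape commands.
    Jupyter,
}
```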
## Test Plan
`cargo test`