Commit graph

253 commits

Author SHA1 Message Date
Alex Waygood
b6b1947010
Improve API exposed on ExprStringLiteral nodes (#16192)
## Summary

This PR makes the following changes:
- It adjusts various callsites to use the new
`ast::StringLiteral::contents_range()` method that was introduced in
https://github.com/astral-sh/ruff/pull/16183. This is less verbose and
more type-safe than using the `ast::str::raw_contents()` helper
function.
- It adds a new `ast::ExprStringLiteral::as_unconcatenated_literal()`
helper method, and adjusts various callsites to use it. This addresses
@MichaReiser's review comment at
https://github.com/astral-sh/ruff/pull/16183#discussion_r1957334365.
There is no functional change here, but it helps readability to make it
clearer that we're differentiating between implicitly concatenated
strings and unconcatenated strings at various points.
- It renames the `StringLiteralValue::flags()` method to
`StringLiteralFlags::first_literal_flags()`. If you're dealing with an
implicitly concatenated string `string_node`,
`string_node.value.flags().closer_len()` could give an incorrect result;
this renaming makes it clearer that the `StringLiteralFlags` instance
returned by the method is only guaranteed to give accurate information
for the first `StringLiteral` contained in the `ExprStringLiteral` node.
- It deletes the unused `BytesLiteralValue::flags()` method. This seems
prone to misuse in the same way as `StringLiteralValue::flags()`: if
it's an implicitly concatenated bytestring, the `BytesLiteralFlags`
instance returned by the method would only give accurate information for
the first `BytesLiteral` in the bytestring.

## Test Plan

`cargo test`
2025-02-17 07:58:54 +00:00
InSync
7d2e40be2d
[pylint] Do not offer fix for raw strings (PLE251) (#16132)
Some checks are pending
CI / cargo fmt (push) Waiting to run
CI / cargo build (release) (push) Waiting to run
CI / Determine changes (push) Waiting to run
CI / cargo clippy (push) Blocked by required conditions
CI / cargo test (linux) (push) Blocked by required conditions
CI / cargo test (linux, release) (push) Blocked by required conditions
CI / cargo test (windows) (push) Blocked by required conditions
CI / cargo test (wasm) (push) Blocked by required conditions
CI / cargo build (msrv) (push) Blocked by required conditions
CI / cargo fuzz build (push) Blocked by required conditions
CI / fuzz parser (push) Blocked by required conditions
CI / test scripts (push) Blocked by required conditions
CI / ecosystem (push) Blocked by required conditions
CI / cargo shear (push) Blocked by required conditions
CI / python package (push) Waiting to run
CI / pre-commit (push) Waiting to run
CI / mkdocs (push) Waiting to run
CI / formatter instabilities and black similarity (push) Blocked by required conditions
CI / test ruff-lsp (push) Blocked by required conditions
CI / benchmarks (push) Blocked by required conditions
## Summary

Resolves #13294, follow-up to #13882.

At #13882, it was concluded that a fix should not be offered for raw
strings. This change implements that. The five rules in question are now
no longer always fixable.

## Test Plan

`cargo nextest run` and `cargo insta test`.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-02-13 08:36:11 +00:00
Alex Waygood
cb71393332
Simplify the StringFlags trait (#15944) 2025-02-04 18:14:28 +00:00
Brent Westbrook
b5e5271adf
Preserve triple quotes and prefixes for strings (#15818)
## Summary

This is a follow-up to #15726, #15778, and #15794 to preserve the triple
quote and prefix flags in plain strings, bytestrings, and f-strings.

I also added a `StringLiteralFlags::without_triple_quotes` method to
avoid passing along triple quotes in rules like SIM905 where it might
not make sense, as discussed
[here](https://github.com/astral-sh/ruff/pull/15726#discussion_r1930532426).

## Test Plan

Existing tests, plus many new cases in the `generator::tests::quote`
test that should cover all combinations of quotes and prefixes, at least
for simple string bodies.

Closes #7799 when combined with #15694, #15726, #15778, and #15794.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2025-02-04 08:41:06 -05:00
Brent Westbrook
9bf138c45a
Preserve quote style in generated code (#15726)
## Summary

This is a first step toward fixing #7799 by using the quoting style
stored in the `flags` field on `ast::StringLiteral`s to select a quoting
style. This PR does not include support for f-strings or byte strings.

Several rules also needed small updates to pass along existing quoting
styles instead of using `StringLiteralFlags::default()`. The remaining
snapshot changes are intentional and should preserve the quotes from the
input strings.

## Test Plan

Existing tests with some accepted updates, plus a few new RUF055 tests
for raw strings.

---------

Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
2025-01-27 13:41:03 -05:00
Shaygan Hooshyari
cf4ab7cba1
Parse triple quoted string annotations as if parenthesized (#15387)
## Summary

Resolves #9467 

Parse quoted annotations as if the string content is inside parenthesis.
With this logic `x` and `y` in this example are equal:

```python
y: """
   int |
   str
"""

z: """(
    int |
    str
)
"""
```

Also this rule only applies to triple
quotes([link](https://github.com/python/typing-council/issues/9#issuecomment-1890808610)).

This PR is based on the
[comments](https://github.com/astral-sh/ruff/issues/9467#issuecomment-2579180991)
on the issue.

I did one extra change, since we don't want any indentation tokens I am
setting the `State::Other` as the initial state of the Lexer.

Remaining work:

- [x] Add a test case for red-knot.
- [x] Add more tests.

## Test Plan

Added a test which previously failed because quoted annotation contained
indentation.
Added an mdtest for red-knot.
Updated previous test.

Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
2025-01-16 11:38:15 +05:30
Andrew Gallant
17f01a4355 test: add more missing carets
This update includes some missing `^` in the diagnostic annotations.

This update also includes some shifting of "syntax error" annotations to
the end of the preceding line. I believe this is technically a
regression, but fixing them has proven quite difficult. I *think* the
best way to do that might be to tweak the spans generated by the Python
parser errors, but I didn't want to dig into that. (Another approach
would be to change the `annotate-snippets` rendering, but when I tried
that and managed to fix these regressions, I ended up causing a bunch of
other regressions.)

Ref 77d454525e (r1915458616)
2025-01-15 13:37:52 -05:00
Andrew Gallant
84ba4ecaf5 ruff_annotate_snippets: support overriding the "cut indicator"
We do this because `...` is valid Python, which makes it pretty likely
that some line trimming will lead to ambiguous output. So we add support
for overriding the cut indicator. This also requires changing some of
the alignment math, which was previously tightly coupled to `...`.

For Ruff, we go with `…` (`U+2026 HORIZONTAL ELLIPSIS`) for our cut
indicator.

For more details, see the patch sent to upstream:
https://github.com/rust-lang/annotate-snippets-rs/pull/172
2025-01-15 13:37:52 -05:00
Andrew Gallant
5caef89af3 test: update snapshots with improper end-of-line placement
This looks like a bug fix that occurs when the annotation is a
zero-width span immediately following a line terminator. Previously, the
caret seems to be rendered on the next line, but it should be rendered
at the end of the line the span corresponds to.

I admit that this one is kinda weird. I would somewhat expect that our
spans here are actually incorrect, and that to obtain this sort of
rendering, we should identify a span just immediately _before_ the line
terminator and not after it. But I don't want to dive into that rabbit
hole for now (and given how `annotate-snippets` now renders these
spans, perhaps there is more to it than I see), and this does seem like
a clear improvement given the spans we feed to `annotate-snippets`.
2025-01-15 13:37:52 -05:00
Andrew Gallant
f49cfb6c28 test: update snapshots with missing ^
The previous rendering just seems wrong in that a `^` is omitted. The
new version of `annotate-snippets` seems to get this right. I checked a
pseudo random sample of these, and it seems to only happen when the
position pointed at a line terminator.
2025-01-15 13:37:52 -05:00
Andrew Gallant
3fa4479c85 test: update snapshots with missing annotations
These updates center around the addition of annotations in the
diagnostic rendering. Previously, the annotation was just not rendered
at all. With the `annotate-snippets` upgrade, it is now rendered. I
examined a pseudo random sample of these, and they all look correct.

As will be true in future batches, some of these snapshots also have
changes to whitespace in them as well.
2025-01-15 13:37:52 -05:00
Andrew Gallant
0de8216a25 test: update snapshots with just whitespace changes
These snapshot changes should *all* only be a result of changes to
trailing whitespace in the output. I checked a psuedo random sample of
these, and the whitespace found in the previous snapshots seems to be an
artifact of the rendering and _not_ of the source data. So this seems
like a strict bug fix to me.

There are other snapshots with whitespace changes, but they also have
other changes that we split out into separate commits. Basically, we're
going to do approximately one commit per category of change.

This represents, by far, the biggest chunk of changes to snapshots as a
result of the `annotate-snippets` upgrade.
2025-01-15 13:37:52 -05:00
Andrew Gallant
84179aaa96 ruff_linter,ruff_python_parser: migrate to updated annotate-snippets
This is pretty much just moving to the new API and taking care to use
byte offsets. This is *almost* enough. The next commit will fix a bug
involving the handling of unprintable characters as a result of
switching to byte offsets.
2025-01-15 13:37:52 -05:00
Dylan
c1eaf6ff72
Modify parsing of raise with cause when exception is absent (#15049)
When confronted with `raise from exc` the parser will now create a
`StmtRaise` that has `None` for the exception and `exc` for the cause.

Before, the parser created a `StmtRaise` with `from` for the exception,
no cause, and a spurious expression `exc` afterwards.
2024-12-19 13:36:32 +00:00
Dylan
a3bb0cd5ec
Raise syntax error for mixing except and except* (#14895)
Some checks are pending
CI / Determine changes (push) Waiting to run
CI / cargo fmt (push) Waiting to run
CI / cargo clippy (push) Blocked by required conditions
CI / cargo test (linux) (push) Blocked by required conditions
CI / cargo test (linux, release) (push) Blocked by required conditions
CI / cargo test (windows) (push) Blocked by required conditions
CI / cargo test (wasm) (push) Blocked by required conditions
CI / cargo build (release) (push) Waiting to run
CI / cargo build (msrv) (push) Blocked by required conditions
CI / cargo fuzz build (push) Blocked by required conditions
CI / fuzz parser (push) Blocked by required conditions
CI / test scripts (push) Blocked by required conditions
CI / ecosystem (push) Blocked by required conditions
CI / cargo shear (push) Blocked by required conditions
CI / python package (push) Waiting to run
CI / pre-commit (push) Waiting to run
CI / mkdocs (push) Waiting to run
CI / formatter instabilities and black similarity (push) Blocked by required conditions
CI / test ruff-lsp (push) Blocked by required conditions
CI / benchmarks (push) Blocked by required conditions
This PR adds a syntax error if the parser encounters a `TryStmt` that
has except clauses both with and without a star.

The displayed error points to each except clause that contradicts the
original except clause kind. So, for example,

```python
try:
    ....
except:     #<-- we assume this is the desired except kind
    ....
except*:    #<---  error will point here
    ....
except*:    #<--- and here
    ....
```

Closes #14860
2024-12-10 17:50:55 -06:00
Dimitri Papadopoulos Orfanos
59145098d6
Fix typos found by codespell (#14863)
## Summary

Just fix typos.

## Test Plan

CI tests.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2024-12-09 09:32:12 +00:00
Micha Reiser
b63c2e126b
Upgrade Rust toolchain to 1.83 (#14677) 2024-11-29 12:05:05 +00:00
Alex Waygood
f1b2e85339
py-fuzzer: recommend using uvx rather than uv run to run the fuzzer (#14645)
Some checks are pending
CI / Determine changes (push) Waiting to run
CI / cargo fmt (push) Waiting to run
CI / cargo clippy (push) Blocked by required conditions
CI / cargo test (linux) (push) Blocked by required conditions
CI / cargo test (linux, release) (push) Blocked by required conditions
CI / cargo test (windows) (push) Blocked by required conditions
CI / cargo test (wasm) (push) Blocked by required conditions
CI / cargo build (release) (push) Waiting to run
CI / cargo build (msrv) (push) Blocked by required conditions
CI / cargo fuzz build (push) Blocked by required conditions
CI / fuzz parser (push) Blocked by required conditions
CI / test scripts (push) Blocked by required conditions
CI / ecosystem (push) Blocked by required conditions
CI / cargo shear (push) Blocked by required conditions
CI / python package (push) Waiting to run
CI / pre-commit (push) Waiting to run
CI / mkdocs (push) Waiting to run
CI / formatter instabilities and black similarity (push) Blocked by required conditions
CI / test ruff-lsp (push) Blocked by required conditions
CI / benchmarks (push) Blocked by required conditions
2024-11-27 22:19:52 +00:00
Alex Waygood
e0f3eaf1dd
Turn the fuzz-parser script into a properly packaged Python project (#14606)
## Summary

This PR gets rid of the `requirements.in` and `requirements.txt` files
in the `scripts/fuzz-parser` directory, and replaces them with
`pyproject.toml` and `uv.lock` files. The script is renamed from
`fuzz-parser` to `py-fuzzer` (since it can now also be used to fuzz
red-knot as well as the parser, following
https://github.com/astral-sh/ruff/pull/14566), and moved from the
`scripts/` directory to the `python/` directory, since it's now a
(uv)-pip-installable project in its own right.

I've been resisting this for a while, because conceptually this script
just doesn't feel "complicated" enough to me for it to be a full-blown
package. However, I think it's time to do this. Making it a proper
package has several advantages:
- It means we can run it from the project root using `uv run` without
having to activate a virtual environment and ensure that all required
dependencies are installed into that environment
- Using a `pyproject.toml` file means that we can express that the
project requires Python 3.12+ to run properly; this wasn't possible
before
- I've been running mypy on the project locally when I've been working
on it or reviewing other people's PRs; now I can put the mypy config for
the project in the `pyproject.toml` file

## Test Plan

I manually tested that all the commands detailed in
`python/py-fuzzer/README.md` work for me locally.

---------

Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
2024-11-27 08:09:04 +00:00
Micha Reiser
c847cad389
Update insta snapshots (#14366)
Some checks are pending
CI / Determine changes (push) Waiting to run
CI / cargo fmt (push) Waiting to run
CI / cargo shear (push) Blocked by required conditions
CI / cargo clippy (push) Blocked by required conditions
CI / cargo test (linux) (push) Blocked by required conditions
CI / cargo test (windows) (push) Blocked by required conditions
CI / cargo test (wasm) (push) Blocked by required conditions
CI / cargo build (release) (push) Blocked by required conditions
CI / cargo build (msrv) (push) Blocked by required conditions
CI / cargo fuzz (push) Blocked by required conditions
CI / Fuzz the parser (push) Blocked by required conditions
CI / test scripts (push) Blocked by required conditions
CI / ecosystem (push) Blocked by required conditions
CI / python package (push) Waiting to run
CI / pre-commit (push) Waiting to run
CI / mkdocs (push) Waiting to run
CI / formatter instabilities and black similarity (push) Blocked by required conditions
CI / test ruff-lsp (push) Blocked by required conditions
CI / benchmarks (push) Blocked by required conditions
2024-11-15 19:31:15 +01:00
Micha Reiser
bd33b4972d
Short circuit lex_identifier if the name is longer or shorter than any known keyword (#13815) 2024-10-19 11:07:15 +00:00
Junzhuo ZHOU
a354d9ead6
Expose internal types as public access (#13509) 2024-09-26 17:34:30 +02:00
Micha Reiser
c3bcd5c842
Upgrade to Rust 1.81 (#13265) 2024-09-06 15:09:09 +02:00
Alex Waygood
b7c7b4b387
Add a method to Checker for cached parsing of stringified type annotations (#13158) 2024-09-02 12:44:20 +00:00
Micha Reiser
138e70bd5c
Upgrade to Rust 1.80 (#12586) 2024-07-30 19:18:08 +00:00
Dhruv Manilawala
978909fcf4
Raise syntax error for unparenthesized generator expr in multi-argument call (#12445)
## Summary

This PR fixes a bug to raise a syntax error when an unparenthesized
generator expression is used as an argument to a call when there are
more than one argument.

For reference, the grammar is:
```
primary:
    | ...
    | primary genexp 
    | primary '(' [arguments] ')' 
    | ...

genexp:
    | '(' ( assignment_expression | expression !':=') for_if_clauses ')' 
```

The `genexp` requires the parenthesis as mentioned in the grammar. So,
the grammar for a call expression is either a name followed by a
generator expression or a name followed by a list of argument. In the
former case, the parenthesis are excluded because the generator
expression provides them while in the later case, the parenthesis are
explicitly provided for a list of arguments which means that the
generator expression requires it's own parenthesis.

This was discovered in https://github.com/astral-sh/ruff/issues/12420.

## Test Plan

Add test cases for valid and invalid syntax.

Make sure that the parser from CPython also raises this at the parsing
step:
```console
$ python3.13 -m ast parser/_.py
  File "parser/_.py", line 1
    total(1, 2, x for x in range(5), 6)
                ^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized

$ python3.13 -m ast parser/_.py
  File "parser/_.py", line 1
    sum(x for x in range(10), 10)
        ^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
```
2024-07-22 14:44:20 +05:30
Dhruv Manilawala
8f40928534
Enable token-based rules on source with syntax errors (#11950)
## Summary

This PR updates the linter, specifically the token-based rules, to work
on the tokens that come after a syntax error.

For context, the token-based rules only diagnose the tokens up to the
first lexical error. This PR builds up an error resilience by
introducing a `TokenIterWithContext` which updates the `nesting` level
and tries to reflect it with what the lexer is seeing. This isn't 100%
accurate because if the parser recovered from an unclosed parenthesis in
the middle of the line, the context won't reduce the nesting level until
it sees the newline token at the end of the line.

resolves: #11915

## Test Plan

* Add test cases for a bunch of rules that are affected by this change.
* Run the fuzzer for a long time, making sure to fix any other bugs.
2024-07-02 08:57:46 +00:00
Micha Reiser
5109b50bb3
Use CompactString for Identifier (#12101) 2024-07-01 10:06:02 +02:00
Micha Reiser
f765d19402
Mention that Cursor is based on rustc's implementation. (#12109) 2024-06-30 16:53:25 +01:00
Micha Reiser
da78de0439
Remove allcation in parse_identifier (#12103) 2024-06-29 15:00:24 +02:00
Dhruv Manilawala
434ce307a7
Revert "Use correct range to highlight line continuation error" (#12089)
This PR reverts https://github.com/astral-sh/ruff/pull/12016 with a
small change where the error location points to the continuation
character only. Earlier, it would also highlight the whitespace that
came before it.

The motivation for this change is to avoid panic in
https://github.com/astral-sh/ruff/pull/11950. For example:

```py
\)
```

Playground: https://play.ruff.rs/87711071-1b54-45a3-b45a-81a336a1ea61

The range of `Unknown` token and `Rpar` is the same. Once #11950 is
enabled, the indexer would panic. It won't panic in the stable version
because we stop at the first `Unknown` token.
2024-06-28 18:10:00 +05:30
Dhruv Manilawala
a4688aebe9
Use TokenSource to find new location for re-lexing (#12060)
## Summary

This PR splits the re-lexing logic into two parts:
1. `TokenSource`: The token source will be responsible to find the
position the lexer needs to be moved to
2. `Lexer`: The lexer will be responsible to reduce the nesting level
and move itself to the new position if recovered from a parenthesized
context

This split makes it easy to find the new lexer position without needing
to implement the backwards lexing logic again which would need to handle
cases involving:
* Different kinds of newlines
* Line continuation character(s)
* Comments
* Whitespaces

### F-strings

This change did reveal one thing about re-lexing f-strings. Consider the
following example:
```py
f'{'
#  ^
f'foo'
```

Here, the quote as highlighted by the caret (`^`) is the start of a
string inside an f-string expression. This is unterminated string which
means the token emitted is actually `Unknown`. The parser tries to
recover from it but there's no newline token in the vector so the new
logic doesn't recover from it. The previous logic does recover because
it's looking at the raw characters instead.

The parser would be at `FStringStart` (the one for the second line) when
it calls into the re-lexing logic to recover from an unterminated
f-string on the first line. So, moving backwards the first character
encountered is a newline character but the first token encountered is an
`Unknown` token.

This is improved with #12067 

fixes: #12046 
fixes: #12036

## Test Plan

Update the snapshot and validate the changes.
2024-06-27 17:12:39 +05:30
Dhruv Manilawala
e137c824c3
Avoid consuming newline for unterminated string (#12067)
## Summary

This PR fixes the lexer logic to **not** consume the newline character
for an unterminated string literal.

Currently, the lexer would consume it to be part of the string itself
but that would be bad for recovery because then the lexer wouldn't emit
the newline token ever. This PR fixes that to avoid consuming the
newline character in that case.

This was discovered during https://github.com/astral-sh/ruff/pull/12060.

## Test Plan

Update the snapshots and validate them.
2024-06-27 17:02:48 +05:30
Dhruv Manilawala
47c9ed07f2
Consider 2-character EOL before line continuation (#12035)
## Summary

This PR fixes a bug introduced in
https://github.com/astral-sh/ruff/pull/12008 which didn't consider the
two character newline after the line continuation character.

For example, consider the following code highlighted with whitespaces:
```py
call(foo # comment \\r\n
\r\n
def bar():\r\n
....pass\r\n
```
The lexer is at `def` when it's running the re-lexing logic and trying
to move back to a newline character. It encounters `\n` and it's being
escaped (incorrect) but `\r` is being escaped, so it moves the lexer to
`\n` character. This creates an overlap in token ranges which causes the
panic.

```
Name 0..4
Lpar 4..5
Name 5..8
Comment 9..20
NonLogicalNewline 20..22 <-- overlap between
Newline 21..22           <-- these two tokens
NonLogicalNewline 22..23
Def 23..26
...
```

fixes: #12028 

## Test Plan

Add a test case with line continuation and windows style newline
character.
2024-06-26 14:00:48 +05:30
Dhruv Manilawala
7cb2619ef5
Add syntax error for empty type parameter list (#12030)
## Summary

(I'm pretty sure I added this in the parser re-write but must've got
lost in the rebase?)

This PR raises a syntax error if the type parameter list is empty.

As per the grammar, there should be at least one type parameter:
```
type_params: 
    | invalid_type_params
    | '[' type_param_seq ']' 

type_param_seq: ','.type_param+ [','] 
```

Verified via the builtin `ast` module as well:
```console    
$ python3.13 -m ast parser/_.py
Traceback (most recent call last):
  [..]
  File "parser/_.py", line 1
    def foo[]():
            ^
SyntaxError: Type parameter list cannot be empty
```

## Test Plan

Add inline test cases and update the snapshots.
2024-06-26 08:10:35 +05:30
Dhruv Manilawala
7109214b57
Update parser tests to validate token ranges (#12019)
## Summary

This PR updates the parser test infrastructure to validate the token
ranges.

From the code documentation:
```
/// Verifies that:
/// * the ranges are strictly increasing when loop the tokens in insertion order
/// * all ranges are within the length of the source code
```

Follow-up from #12016 and #12017
resolves: #11938

## Test Plan

Make sure that there are no failures.
2024-06-25 08:14:28 +00:00
Dhruv Manilawala
d930e97212
Do not include newline for unterminated string range (#12017)
## Summary

This PR updates the unterminated string error range to not include the
final newline character.

This is a follow-up to #12016 and required for #12019

This is not done for when the unterminated string goes till the end of
file (not a newline character). The unterminated f-string range is
correct.

### Why is this required for #12019 ?

Because otherwise the token ranges will overlap. For example:
```py
f"{"
f"{foo!r"
```

Here, the re-lexing logic recovers from an unterminated f-string and
thus emitting a `Newline` token for the one at the end of the first
line. But, currently the `Unknown` and the `Newline` token would overlap
because the `Unknown` token (unterminated string literal) range would
include the newline character.

## Test Plan

Update and validate the snapshot.
2024-06-25 08:10:07 +00:00
Dhruv Manilawala
9c1b6ec411
Use correct range to highlight line continuation error (#12016)
## Summary

This PR fixes the range highlighted for the line continuation error.

Previously, it would highlight an incorrect range:
```
1 | call(a, b, \\\
  |           ^^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |
```

And now:
```
  |
1 | call(a, b, \\\
  |             ^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |
```

This is implemented by avoiding to update the token range for the
`Unknown` token which is emitted when there's a lexical error. Instead,
the `push_error` helper method will be responsible to update the range
to the error location.

This actually becomes a requirement which can be seen in follow-up PRs.

## Test Plan

Update and validate the snapshot.
2024-06-25 13:35:24 +05:30
Dhruv Manilawala
68a8978454
Consider line continuation character for re-lexing (#12008)
## Summary

This PR fixes a bug where the re-lexing logic didn't consider the line
continuation character being present before the newline character. This
meant that the lexer was being moved back to the newline character which
is actually ignored via `\`.

Considering the following code:
```py
f'middle {'string':\
        'format spec'}

```

The old token stream is:
```
...
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
Newline 20..21
Indent 21..29
String 29..42
Rbrace 42..43
...
```

Notice how the ranges are overlapping between the `FStringMiddle` token
and the tokens emitted after moving the lexer backwards.

After this fix, the new token stream which is without moving the lexer
backwards in this scenario:
```
FStringStart 0..2 (flags = F_STRING)
FStringMiddle 2..9 (flags = F_STRING)
Lbrace 9..10
String 10..18
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
FStringEnd 29..30 (flags = F_STRING)
Name 30..36
Name 37..41
Unknown 41..44
Newline 44..45
```

fixes: #12004 

## Test Plan

Add test cases and update the snapshots.
2024-06-25 02:13:54 +00:00
renovate[bot]
53a80a5c11
Update Rust crate rustc-hash to v2 (#12001) 2024-06-23 20:46:42 -04:00
Dhruv Manilawala
81160320de
Manual impl of Debug on Token (#11958)
## Summary

I look at the token stream a lot, not specifically in the playground but
in the terminal output and it's annoying to scroll a lot to find
specific location. Most of the information is also redundant.

The final format we end up with is: `<kind> <range> (flags = ...)` e.g.,
`String 0..4 (flags = BYTE_STRING)` where the flags part is only
populated if there are any flags set.
2024-06-22 04:18:24 +00:00
Dhruv Manilawala
27ebff36ec
Remove Token::is_trivia method (#11962)
Sorry, a leftover from my rebase
2024-06-21 10:24:42 +00:00
Dhruv Manilawala
96da136e6a
Move token and error structs into related modules (#11957)
## Summary

This PR does some housekeeping into moving certain structs into related
modules. Specifically,
1. Move `LexicalError` from `lexer.rs` to `error.rs` which also contains
the `ParseError`
2. Move `Token`, `TokenFlags` and `TokenValue` from `lexer.rs` to
`token.rs`
2024-06-21 10:07:19 +00:00
Dhruv Manilawala
4667d8697c
Remove duplication around is_trivia functions (#11956)
## Summary

This PR removes the duplication around `is_trivia` functions.

There are two of them in the codebase:
1. In `pycodestyle`, it's for newline, indent, dedent, non-logical
newline and comment
2. In the parser, it's for non-logical newline and comment

The `TokenKind::is_trivia` method used (1) but that's not correct in
that context. So, this PR introduces a new `is_non_logical_token` helper
method for the `pycodestyle` crate and updates the
`TokenKind::is_trivia` implementation with (2).

This also means we can remove `Token::is_trivia` method and the
standalone `token_source::is_trivia` function and use the one on
`TokenKind`.

## Test Plan

`cargo insta test`
2024-06-21 10:02:40 +00:00
Dhruv Manilawala
ed948eaefb
Avoid moving back the lexer for triple-quoted fstring (#11939)
## Summary

This PR avoids moving back the lexer for a triple-quoted f-string during
the re-lexing phase.

The reason this is a problem is that for a triple-quoted f-string the
newlines are part of the f-string itself, specifically they'll be part
of the `FStringMiddle` token. So, if we moved the lexer back, there
would be a `Newline` token whose range would be in between an
`FStringMiddle` token. This creates a panic in downstream usage.

fixes: #11937 

## Test Plan

Add test cases and validate the snapshots.
2024-06-20 16:27:36 +05:30
Dhruv Manilawala
b617d90651
Update E999 to show all syntax errors (#11900)
## Summary

This PR updates the linter to show all the parse errors as diagnostics
instead of just the first one.

Note that this doesn't affect the parse error displayed as error log
message. This will be removed in a follow-up PR.

### Breaking?

I don't think this is a breaking change even though this might give more
diagnostics. The main reason is that this shouldn't affect any users
because it'll only give additional diagnostics in the case of multiple
syntax errors.

## Test Plan

Add an integration test case which would raise more than one parse
error.
2024-06-19 13:09:54 +05:30
Dhruv Manilawala
cdc7c71449
Avoid consuming trailing whitespace during re-lexing (#11933)
## Summary

This PR updates the re-lexing logic to avoid consuming the trailing
whitespace and move the lexer explicitly to the last newline character
encountered while moving backwards.

Consider the following code snippet as taken from the test case
highlighted with whitespace (`.`) and newline (`\n`) characters:
```py
# There are trailing whitespace before the newline character but those whitespaces are
# part of the comment token
f"""hello {x # comment....\n
#                     ^
y = 1\n
```

The parser is at `y` when it's trying to recover from an unclosed `{`,
so it calls into the re-lexing logic which tries to move the lexer back
to the end of the previous line. But, as it consumed all whitespaces it
moved the lexer to the location marked by `^` in the above code snippet.
But, those whitespaces are part of the comment token. This means that
the range for the two tokens were overlapping which introduced the
panic.

Note that this is only a bug when there's a comment with a trailing
whitespace otherwise it's fine to move the lexer to the whitespace
character. This is because the lexer would just skip the whitespace
otherwise. Nevertheless, this PR updates the logic to move it explicitly
to the newline character in all cases.

fixes: #11929 

## Test Plan

Add test cases and update the snapshot. Make sure that it doesn't panic
on the code snippet in the linked issue.
2024-06-19 12:14:18 +05:30
Dhruv Manilawala
1e0642fac8
Use re-lexing for normal list parsing (#11871)
## Summary

This PR is a follow-up on #11845 to add the re-lexing logic for normal
list parsing.

A normal list parsing is basically parsing elements without any
separator in between i.e., there can only be trivia tokens in between
the two elements. Currently, this is only being used for parsing
**assignment statement** and **f-string elements**. Assignment
statements cannot be in a parenthesized context, but f-string can have
curly braces so this PR is specifically for them.

I don't think this is an ideal recovery but the problem is that both
lexer and parser could add an error for f-strings. If the lexer adds an
error it'll emit an `Unknown` token instead while the parser adds the
error directly. I think we'd need to move all f-string errors to be
emitted by the parser instead. This way the parser can correctly inform
the lexer that it's out of an f-string and then the lexer can pop the
current f-string context out of the stack.

## Test Plan

Add test cases, update the snapshots, and run the fuzzer.
2024-06-18 12:14:41 +05:30
Dhruv Manilawala
8499abfa7f
Implement re-lexing logic for better error recovery (#11845)
## Summary

This PR implements the re-lexing logic in the parser.

This logic is only applied when recovering from an error during list
parsing. The logic is as follows:
1. During list parsing, if an unexpected token is encountered and it
detects that an outer context can understand it and thus recover from
it, it invokes the re-lexing logic in the lexer
2. This logic first checks if the lexer is in a parenthesized context
and returns if it's not. Thus, the logic is a no-op if the lexer isn't
in a parenthesized context
3. It then reduces the nesting level by 1. It shouldn't reset it to 0
because otherwise the recovery from nested list parsing will be
incorrect
4. Then, it tries to find last newline character going backwards from
the current position of the lexer. This avoids any whitespaces but if it
encounters any character other than newline or whitespace, it aborts.
5. Now, if there's a newline character, then it needs to be re-lexed in
a logical context which means that the lexer needs to emit it as a
`Newline` token instead of `NonLogicalNewline`.
6. If the re-lexing gives a different token than the current one, the
token source needs to update it's token collection to remove all the
tokens which comes after the new current position.

It turns out that the list parsing isn't that happy with the results so
it requires some re-arranging such that the following two errors are
raised correctly:
1. Expected comma
2. Recovery context error

For (1), the following scenarios needs to be considered:
* Missing comma between two elements
* Half parsed element because the grammar doesn't allow it (for example,
named expressions)

For (2), the following scenarios needs to be considered:
1. If the parser is at a comma which means that there's a missing
element otherwise the comma would've been consumed by the first `eat`
call above. And, the parser doesn't take the re-lexing route on a comma
token.
2. If it's the first element and the current token is not a comma which
means that it's an invalid element.

resolves: #11640 

## Test Plan

- [x] Update existing test snapshots and validate them
- [x] Add additional test cases specific to the re-lexing logic and
validate the snapshots
- [x] Run the fuzzer on 3000+ valid inputs
- [x] Run the fuzzer on invalid inputs
- [x] Run the parser on various open source projects
- [x] Make sure the ecosystem changes are none
2024-06-17 06:47:00 +00:00
Micha Reiser
d4dd96d1f4
red-knot: source_text, line_index, and parsed_module queries (#11822) 2024-06-13 07:37:02 +00:00