Commit graph

72 commits

Brent Westbrook
ab1011ce70
[syntax-errors] Single starred assignment target (#17024)
Summary
--

Detects starred assignment targets outside of tuples and lists like `*a
= (1,)`.

This PR only considers assignment statements. I also checked annotated
assignment statements, but these give a separate error that we already
catch, so I think they're okay not to consider:

```pycon
>>> *a: list[int] = []
  File "<python-input-72>", line 1
    *a: list[int] = []
      ^
SyntaxError: invalid syntax
```
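
For reference, a rough sketch of the forms being distinguished (not taken
from the new tests):

```python
*a = (1,)     # flagged: starred target outside a tuple or list
*a, = (1,)    # ok: the target is a tuple
[*a] = [1]    # ok: the target is a list
```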

Fixes #13759

Test Plan
--

New inline tests, plus a new `SemanticSyntaxError` for an existing
parser test. I also removed a now-invalid case from an otherwise-valid
test fixture.

The new semantic error leads to two errors for the case below:

```python
*foo() = 42
```

but this matches [pyright] too.

[pyright]: https://pyright-play.net/?code=FQMw9mAUCUAEC8sAsAmAUEA
2025-03-29 12:35:47 -04:00
Brent Westbrook
a0819f0c51
[syntax-errors] Store to or delete __debug__ (#16984)
Summary
--

Detect setting or deleting `__debug__`. Assigning to `__debug__` was a
`SyntaxError` on the earliest version I tested (3.8). Deleting
`__debug__` was made a `SyntaxError` in [BPO 45000], which said it was
resolved in Python 3.10. However, `del __debug__` was also a runtime
error (`NameError`) when I tested in Python 3.9.6, so I thought it was
worth including 3.9 in this check.

I don't think it was ever a *good* idea to try `del __debug__`, so I
think there's also an argument for not making this version-dependent at
all. That would only simplify the implementation very slightly, though.

[BPO 45000]: https://github.com/python/cpython/issues/89163
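
Roughly the kind of code this flags (a sketch, with the version behavior
described above):

```python
__debug__ = False   # flagged on all supported versions
del __debug__       # flagged on Python 3.9+ by this check
```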

Test Plan
--

New inline tests. This also required adding a `PythonVersion` field to
the `TestContext` that could be taken from the inline `ParseOptions` and
making the version field on the options accessible.
2025-03-29 12:07:20 -04:00
Brent Westbrook
d70a3e6753
[syntax-errors] Multiple assignments in case pattern (#16957)
Summary
--

This PR detects multiple assignments to the same name in `case` patterns
by recursively visiting each pattern.
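
A minimal sketch of the kind of pattern this catches (not one of the
actual test cases):

```python
point = (1, 1)
match point:
    case (x, x):   # flagged: `x` is bound twice within a single pattern
        print(x)
```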

Test Plan
--

New inline tests.
2025-03-26 13:02:42 -04:00
Brent Westbrook
5697d21fca
[syntax-errors] Irrefutable case pattern before final case (#16905)
Summary
--

Detects irrefutable `match` cases before the final case using a modified
version of the existing `Pattern::is_irrefutable` method from the AST crate.
The modified method helps to retrieve a more precise diagnostic range to
match what Python 3.13 shows in the REPL.
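
For illustration, a sketch of the shape being detected:

```python
command = "quit"
match command:
    case _:        # flagged: irrefutable pattern before the final case
        pass
    case "quit":   # unreachable
        pass
```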

Test Plan
--

New inline tests, as well as some updates to existing tests that had
irrefutable patterns before the last block.
2025-03-26 12:27:16 -04:00
Brent Westbrook
2711e08eb8
[syntax-errors] Fix false positive for parenthesized tuple index (#16948)
Summary
--

Fixes #16943 by checking if the tuple is not parenthesized before
emitting an error.

Test Plan
--

New inline test based on the initial report
2025-03-24 10:34:38 -04:00
Brent Westbrook
e4f5fe8cf7
[syntax-errors] Duplicate type parameter names (#16858)
Summary
--

Detects duplicate type parameter names in function definitions, class
definitions, and type alias statements.
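
A sketch of the three statement kinds covered (assuming Python 3.12+ syntax):

```python
def first[T, T](xs: list[T]) -> T: ...   # flagged: duplicate `T`
class Pair[T, T]: ...                    # flagged
type Alias[T, T] = tuple[T, T]           # flagged
```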

I also boxed the `type_params` field on `StmtTypeAlias` to make it easier
to `match` with functions and classes. (That's the reason for the red-knot
code owner review requests, sorry!)

Test Plan
--

New `ruff_python_syntax_errors` unit tests.

Fixes #11119.
2025-03-21 15:06:22 -04:00
Brent Westbrook
2baaedda6c
[syntax-errors] Start detecting compile-time syntax errors (#16106)
## Summary

This PR implements the "greeter" approach for checking the AST for
syntax errors emitted by the CPython compiler. It introduces two main
infrastructural changes to support all of the compile-time errors:
1. Adds a new `semantic_errors` module to the parser crate with public
`SemanticSyntaxChecker` and `SemanticSyntaxError` types
2. Embeds a `SemanticSyntaxChecker` in the `ruff_linter::Checker` for
checking these errors in ruff

As a proof of concept, it also implements detection of two syntax
errors:
1. A reimplementation of
[`late-future-import`](https://docs.astral.sh/ruff/rules/late-future-import/)
(`F404`)
2. Detection of rebound comprehension iteration variables
(https://github.com/astral-sh/ruff/issues/14395)
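
Rough examples of the two errors (sketches, not the actual test cases):

```python
# Error 1 (the F404 reimplementation): a late `from __future__` import
import os
from __future__ import annotations

# Error 2: the walrus rebinds the comprehension iteration variable `x`
[x := 1 for x in range(3)]
```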

## Test plan
Existing F404 tests, new inline tests in the `ruff_python_parser` crate,
and a linter CLI test showing an example of the `Message` output.

I also tested in VS Code, where `preview = false` and turning off syntax
errors both disable the new errors:


![image](https://github.com/user-attachments/assets/cf453d95-04f7-484b-8440-cb812f29d45e)

And on the playground, where `preview = false` also disables the errors:


![image](https://github.com/user-attachments/assets/a97570c4-1efa-439f-9d99-a54487dd6064)


Fixes #14395

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-03-21 14:45:25 -04:00
Junhson Jean-Baptiste
2a4d835132
Use the common OperatorPrecedence for the parser (#16747)
## Summary

This change continues to resolve #16071 (and continues the work started
in #16162). Specifically, this PR changes the code in the parser so that
it uses the `OperatorPrecedence` struct from `ruff_python_ast` instead
of its own version. This is part of an effort to get rid of the
redundant definitions of `OperatorPrecedence` throughout the codebase.

Note that this PR only makes this change for `ruff_python_parser` -- we
still want to make a similar change for the formatter (namely the
`OperatorPrecedence` defined in the expression part of the formatter,
the pattern one is different). I separated the work to keep the PRs
small and easily reviewable.

## Test Plan

Because this is an internal change, I didn't add any additional tests.
Existing tests do pass.
2025-03-21 09:40:37 +05:30
Brent Westbrook
42cbce538b
[syntax-errors] Fix star annotation before Python 3.11 (#16878)
Summary
--

Fixes #16874. I previously emitted a syntax error when starred
annotations were _allowed_ rather than when they were actually used.
This caused false positives for any starred parameter name because these
are allowed to have starred annotations but not required to. The fix is
to check if the annotation is actually starred after parsing it.
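
In other words, a sketch of the distinction (assuming PEP 646 syntax):

```python
def f(*args: *tuple[int, ...]) -> None: ...  # starred annotation: flagged before Python 3.11
def g(*args: int) -> None: ...               # plain annotation on *args: no error (was a false positive)
```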

Test Plan
--

New inline parser tests derived from the initial report and more
examples from the comments, although I think the first case should cover
them all.
2025-03-20 17:44:52 -04:00
Brent Westbrook
dcf31c9348
[syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)
## Summary

This PR detects the use of PEP 701 f-strings before 3.12. This one
sounded difficult and ended up being pretty easy, so I think there's a
good chance I've over-simplified things. However, from experimenting in
the Python REPL and checking with [pyright], I think this is correct.
pyright actually doesn't even flag the comment case, but Python does.

I also checked pyright's implementation for
[quotes](98dc4469cc/packages/pyright-internal/src/analyzer/checker.ts (L1379-L1398))
and
[escapes](98dc4469cc/packages/pyright-internal/src/analyzer/checker.ts (L1365-L1377))
and think I've approximated how they do it.

Python's error messages also point to the simple approach of these
characters simply not being allowed:

```pycon
Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f'''multiline {
... expression # comment
... }'''
  File "<stdin>", line 3
    }'''
        ^
SyntaxError: f-string expression part cannot include '#'
>>> f'''{not a line \
... continuation}'''
  File "<stdin>", line 2
    continuation}'''
                    ^
SyntaxError: f-string expression part cannot include a backslash
>>> f'hello {'world'}'
  File "<stdin>", line 1
    f'hello {'world'}'
              ^^^^^
SyntaxError: f-string: expecting '}'
```

And since escapes aren't allowed, I don't think there are any tricky
cases where nested quotes or comments can sneak in.

It's also slightly annoying that the error is repeated for every nested
quote character, but that also mirrors pyright, although they highlight
the whole nested string, which is a little nicer. However, their check
is in the analysis phase, so I don't think we have such easy access to
the quoted range, at least without adding another mini visitor.

## Test Plan

New inline tests

[pyright]:
https://pyright-play.net/?pythonVersion=3.11&strict=true&code=EYQw5gBAvBAmCWBjALgCgO4gHaygRgEoAoEaCAIgBpyiiBiCLAUwGdknYIBHAVwHt2LIgDMA5AFlwSCJhwAuCAG8IoMAG1Rs2KIC6EAL6iIxosbPmLlq5foRWiEAAcmERAAsQAJxAomnltY2wuSKogA6WKIAdABWfPBYqCAE%2BuSBVqbpWVm2iHwAtvlMWMgB2ekiolUAgq4FjgA2TAAeEMieSADWCsoV5qoaqrrGDJ5MiDz%2B8ABuLqosAIREhlXlaybrmyYMXsDw7V4AnoysyAmQ5SIhwYo3d9cheADUeKlv5O%2BpQA
2025-03-18 11:12:15 -04:00
Brent Westbrook
75a562d313
[syntax-errors] Parenthesized context managers before Python 3.9 (#16523)
Summary
--

I thought this was very complicated based on the comment here:
https://github.com/astral-sh/ruff/pull/16106#issuecomment-2653505671 and
on some of the discussion in the CPython issue here:
https://github.com/python/cpython/issues/56991. However, after a little
bit of experimentation, I think it boils down to this example:

```python
with (x as y): ...
```

The issue is parentheses around a `with` item with an `optional_var`, as we
(and [Python](https://docs.python.org/3/library/ast.html#ast.withitem)) call
the trailing variable name (`y` in this case). It's not actually about line
breaks after all, except that line breaks are allowed in parenthesized
expressions, which explains the validity of cases like


```pycon
>>> with (
...     x,
...     y
... ) as foo:
...     pass
... 
```

even on Python 3.8.

I followed [pyright]'s example again here on the diagnostic range (just
the opening paren) and the wording of the error.


Test Plan
--
Inline tests

[pyright]:
https://pyright-play.net/?pythonVersion=3.7&strict=true&code=FAdwlgLgFgBAFAewA4FMB2cBEAzBCB0EAHhJgJQwCGAzjLgmQFwz6tA
2025-03-17 08:54:55 -04:00
Alex Waygood
38bfda94ce
[syntax-errors] Improve error message and range for pre-PEP-614 decorator syntax errors (#16581)
## Summary

A small followup to https://github.com/astral-sh/ruff/pull/16386. We now
tell the user exactly what it was about their decorator that constituted
invalid syntax on Python <3.9, and the range now highlights the specific
sub-expression that is invalid rather than highlighting the whole
decorator

## Test Plan

Inline snapshots are updated, and new ones are added.
2025-03-17 11:17:27 +00:00
Brent Westbrook
3a32e56445
[syntax-errors] Unparenthesized assignment expressions in sets and indexes (#16404)
## Summary
This PR detects unparenthesized assignment expressions used in set
literals and comprehensions and in sequence indexes. The link to the
release notes in https://github.com/astral-sh/ruff/issues/6591 just has
this entry:
> * Assignment expressions can now be used unparenthesized within set
literals and set comprehensions, as well as in sequence indexes (but not
slices).

with no other information, so hopefully the test cases I came up with
cover all of the changes. I also tested these out in the Python REPL and
they actually worked in Python 3.9 too. I'm guessing this may be another
case that was "formally made part of the language spec in Python 3.10,
but usable -- and commonly used -- in Python >=3.9" as @AlexWaygood
added to the body of #6591 for context managers. So we may want to
change the version cutoff, but I've gone along with the release notes
for now.
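
Roughly the forms in question (a sketch; the cutoff follows the release
notes as noted above):

```python
data = [10, 20]
s = {y := 2, 3}        # unparenthesized walrus in a set literal
item = data[i := 0]    # unparenthesized walrus in a sequence index
```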

## Test Plan

New inline parser tests and linter CLI tests.
2025-03-14 15:06:42 -04:00
Brent Westbrook
6311412373
[syntax-errors] Star annotations before Python 3.11 (#16545)
Summary
--

This is closely related to (and stacked on)
https://github.com/astral-sh/ruff/pull/16544 and detects star
annotations in function definitions.

I initially called the variant `StarExpressionInAnnotation` to mirror
`StarExpressionInIndex`, but I realized it's not really a "star
expression" in this position and renamed it. `StarAnnotation` seems in
line with the PEP.

Test Plan
--

Two new inline tests. It looked like there was pretty good existing
coverage of this syntax, so I just added simple examples to test the
version cutoff.
2025-03-14 15:20:44 +00:00
Brent Westbrook
4f2851982d
[syntax-errors] Star expression in index before Python 3.11 (#16544)
Summary
--

This PR detects tuple unpacking expressions in index/subscript
expressions before Python 3.11.
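
For example (a sketch, not one of the new tests):

```python
matrix = {(0, 1): "a"}
idx = (0, 1)
matrix[*idx]      # tuple unpacking in a subscript: needs Python 3.11+
matrix[(*idx,)]   # the parenthesized spelling works on older versions
```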

Test Plan
--

New inline tests
2025-03-14 14:51:34 +00:00
Brent Westbrook
2382fe1f25
[syntax-errors] Tuple unpacking in for statement iterator clause before Python 3.9 (#16558)
Summary
--

This PR reuses a slightly modified version of the
`check_tuple_unpacking` method added for detecting unpacking in `return`
and `yield` statements to detect the same issue in the iterator clause
of `for` loops.

I ran into the same issue with a bare `for x in *rest: ...` example
(invalid even on Python 3.13) and added it as a comment on
https://github.com/astral-sh/ruff/issues/16520.
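
A sketch of both cases mentioned above:

```python
a, b = [1, 2], [3, 4]
for x in *a, *b:   # unpacking in the iterator clause: flagged before Python 3.9
    print(x)

for x in *a:       # a bare starred iterator is a syntax error on every version
    print(x)
```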

I considered just making this an additional `StarTupleKind` variant as
well, but this change was in a different version of Python, so I kept it
separate.

Test Plan
--

New inline tests.
2025-03-13 15:55:17 -04:00
Brent Westbrook
b3c884f4f3
[syntax-errors] Parenthesized keyword argument names after Python 3.8 (#16482)
Summary
--

Unlike the other syntax errors detected so far, parenthesized keyword
arguments are only allowed *before* 3.8. It sounds like they were only
accidentally allowed before that [^1].

As an aside, you get a pretty confusing error from Python for this, so
it's nice that we can catch it:

```pycon
>>> def f(**kwargs): ...
... f((a)=1)
...
  File "<python-input-0>", line 2
    f((a)=1)
       ^^^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
>>>
```
Test Plan
--
Inline tests.

[^1]: https://github.com/python/cpython/issues/78822
2025-03-06 12:18:13 -05:00
Brent Westbrook
6c14225c66
[syntax-errors] Tuple unpacking in return and yield before Python 3.8 (#16485)
Summary
--

Checks for tuple unpacking in `return` and `yield` statements before
Python 3.8, as described [here].
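
A sketch of the two statement kinds covered:

```python
def f(rest):
    return 1, *rest   # unpacking in `return`: flagged before Python 3.8

def g(rest):
    yield 1, *rest    # unpacking in `yield`: flagged before Python 3.8
```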

Test Plan
--
Inline tests.

[here]: https://github.com/python/cpython/issues/76298
2025-03-06 11:57:20 -05:00
Brent Westbrook
318f503714
[syntax-errors] Named expressions in decorators before Python 3.9 (#16386)
Summary
--

This PR detects the relaxed grammar for decorators proposed in [PEP
614](https://peps.python.org/pep-0614/) on Python 3.8 and lower.

The 3.8 grammar for decorators is
[here](https://docs.python.org/3.8/reference/compound_stmts.html#grammar-token-decorators):

```
decorators                ::=  decorator+
decorator                 ::=  "@" dotted_name ["(" [argument_list [","]] ")"] NEWLINE
dotted_name               ::=  identifier ("." identifier)*
```

in contrast to the current grammar
[here](https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-decorators)

```
decorators                ::= decorator+
decorator                 ::= "@" assignment_expression NEWLINE
assignment_expression ::= [identifier ":="] expression
```
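
Concretely, a sketch of the difference (`buttons` is just a placeholder
name):

```python
import functools

@functools.lru_cache(maxsize=None)   # ok on 3.8: a dotted name with an argument list
def cached(): ...

@buttons[0].clicked.connect          # needs the relaxed PEP 614 grammar: flagged before Python 3.9
def on_click(): ...
```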

Test Plan
--

New inline parser tests.
2025-03-05 17:08:18 +00:00
Brent Westbrook
d0623888b3
[syntax-errors] Positional-only parameters before Python 3.8 (#16481)
Summary
--

Detect positional-only parameters before Python 3.8, as marked by the
`/` separator in a parameter list.
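
For example, a minimal case of the syntax in question:

```python
def f(x, y, /, z): ...   # the `/` separator is flagged before Python 3.8
```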

Test Plan
--
Inline tests.
2025-03-05 13:46:43 +00:00
Brent Westbrook
81bcdcebd3
[syntax-errors] Type parameter lists before Python 3.12 (#16479)
Summary
--

Another simple one, just detect type parameter lists in functions
and classes. Like pyright, we don't emit a second diagnostic for
`type` alias statements, which were also introduced in 3.12.
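
For example (a sketch):

```python
def first[T](xs: list[T]) -> T: ...   # flagged before Python 3.12
class Box[T]: ...                     # flagged before Python 3.12
```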

Test Plan
--
Inline tests.
2025-03-05 13:19:09 +00:00
Brent Westbrook
32c66ec4b7
[syntax-errors] type alias statements before Python 3.12 (#16478)
Summary
--
Another simple one, just detect standalone `type` statements. I limited
the diagnostic to `type` itself like [pyright]. That probably makes the
most sense for more complicated examples.
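
For example (a sketch; only the `type` keyword carries the diagnostic):

```python
type IntList = list[int]   # flagged before Python 3.12
```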

Test Plan
--
Inline tests.

[pyright]:
https://pyright-play.net/?pythonVersion=3.8&strict=true&code=C4TwDgpgBAHlC8UCWA7YQ
2025-03-04 17:20:10 +00:00
Brent Westbrook
e7b93f93ef
[syntax-errors] Type parameter defaults before Python 3.13 (#16447)
Summary
--

Detects the presence of a [PEP 696] type parameter default before Python
3.13.
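
Sketches of the three statement kinds covered:

```python
def f[T = int](x: T) -> T: ...   # flagged before Python 3.13
class Box[T = int]: ...          # flagged before Python 3.13
type Alias[T = int] = list[T]    # flagged before Python 3.13
```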

Test Plan
--

New inline parser tests for type aliases, generic functions and generic
classes.

[PEP 696]: https://peps.python.org/pep-0696/#grammar-changes
2025-03-04 16:53:38 +00:00
Brent Westbrook
c8a06a9be8
[syntax-errors] Limit except* range to * (#16473)
Summary
--
This is a follow-up to #16446 to fix the diagnostic range to point to
the `*` like `pyright` does
(https://github.com/astral-sh/ruff/pull/16446#discussion_r1976900643).

Storing the range in the `ExceptClauseKind::Star` variant feels slightly
awkward, but we don't store the star itself anywhere on the
`ExceptHandler`. And we can't just take `ExceptHandler.start() +
"except".text_len()` because this code appears to be valid:

```python
try: ...
except    *    Error: ...
```

Test Plan
--
Existing tests.
2025-03-04 16:50:09 +00:00
Brent Westbrook
37fbe58b13
Document LinterResult::has_syntax_error and add Parsed::has_no_syntax_errors (#16443)
Summary
--

This is a follow up addressing the comments on #16425. As @dhruvmanila
pointed out, the naming is a bit tricky. I went with `has_no_errors` to
try to differentiate it from `is_valid`. It actually ends up negated in
most uses, so it would be more convenient to have `has_any_errors` or
`has_errors`, but I thought it would sound too much like the opposite of
`is_valid` in that case. I'm definitely open to suggestions here.

Test Plan
--

Existing tests.
2025-03-04 08:35:38 -05:00
Brent Westbrook
e924ecbdac
[syntax-errors] except* before Python 3.11 (#16446)
Summary
--

One of the simpler ones, just detect the use of `except*` before 3.11.
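
For example:

```python
try:
    ...
except* ValueError:   # `except*` is flagged before Python 3.11
    ...
```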

Test Plan
--

New inline tests.
2025-03-02 18:20:18 +00:00
Brent Westbrook
4431978262
[syntax-errors] Assignment expressions before Python 3.8 (#16383)
## Summary
This PR is the first in a series derived from
https://github.com/astral-sh/ruff/pull/16308, each of which adds support
for detecting one version-related syntax error from
https://github.com/astral-sh/ruff/issues/6591. This one should be the
largest because it also includes the addition of the
`Parser::add_unsupported_syntax_error` method.
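
For example, the kind of code the new error targets (a sketch):

```python
if (n := 10) > 5:   # assignment expression: flagged before Python 3.8
    print(n)
```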

Otherwise I think the general structure will be the same for each syntax
error:
* Detecting the error in the parser
* Inline parser tests for the new error
* New ruff CLI tests for the new error

## Test Plan
As noted above, there are new inline parser tests, as well as new ruff CLI
tests. Once https://github.com/astral-sh/ruff/pull/16379 is resolved, there
should also be new mdtests for red-knot, but this PR does not currently
include those.
2025-02-28 17:13:46 -05:00
Brent Westbrook
764aa0e6a1
Allow passing ParseOptions to inline tests (#16357)
## Summary

This PR adds support for a pragma-style header for inline parser tests
containing JSON-serialized `ParseOptions`. For example,

```python
# parse_options: { "target-version": "3.9" }
match 2:
    case 1:
        pass
```

The line must start with `# parse_options: ` and then the rest of the
(trimmed) line is deserialized into `ParseOptions` used for parsing the
test.

## Test Plan

Existing inline tests, plus two new inline tests for
`match-before-py310`.

---------

Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
2025-02-27 10:23:15 -05:00
Brent Westbrook
97d0659ce3
Pass ParserOptions to the parser (#16220)
## Summary

This is part of the preparation for detecting syntax errors in the
parser from https://github.com/astral-sh/ruff/pull/16090/. As suggested
in [this
comment](https://github.com/astral-sh/ruff/pull/16090/#discussion_r1953084509),
I started working on a `ParseOptions` struct that could be stored in the
parser. For this initial refactor, I only made it hold the existing
`Mode` option, but for syntax errors, we will also need it to have a
`PythonVersion`. For that use case, I'm picturing something like a
`ParseOptions::with_python_version` method, so you can extend the
current calls to something like

```rust
ParseOptions::from(mode).with_python_version(settings.target_version)
```

But I thought it was worth adding `ParseOptions` alone without changing
any other behavior first.

Most of the diff is just updating call sites taking `Mode` to take
`ParseOptions::from(Mode)` or those taking `PySourceType`s to take
`ParseOptions::from(PySourceType)`. The interesting changes are in the
new `parser/options.rs` file and smaller parts of `parser/mod.rs` and
`ruff_python_parser/src/lib.rs`.

## Test Plan

Existing tests, this should not change any behavior.
2025-02-19 10:50:50 -05:00
Andrew Gallant
17f01a4355 test: add more missing carets
This update includes some missing `^` in the diagnostic annotations.

This update also includes some shifting of "syntax error" annotations to
the end of the preceding line. I believe this is technically a
regression, but fixing them has proven quite difficult. I *think* the
best way to do that might be to tweak the spans generated by the Python
parser errors, but I didn't want to dig into that. (Another approach
would be to change the `annotate-snippets` rendering, but when I tried
that and managed to fix these regressions, I ended up causing a bunch of
other regressions.)

Ref 77d454525e (r1915458616)
2025-01-15 13:37:52 -05:00
Andrew Gallant
84ba4ecaf5 ruff_annotate_snippets: support overriding the "cut indicator"
We do this because `...` is valid Python, which makes it pretty likely
that some line trimming will lead to ambiguous output. So we add support
for overriding the cut indicator. This also requires changing some of
the alignment math, which was previously tightly coupled to `...`.

For Ruff, we go with `…` (`U+2026 HORIZONTAL ELLIPSIS`) for our cut
indicator.

For more details, see the patch sent to upstream:
https://github.com/rust-lang/annotate-snippets-rs/pull/172
2025-01-15 13:37:52 -05:00
Andrew Gallant
5caef89af3 test: update snapshots with improper end-of-line placement
This looks like a bug fix that occurs when the annotation is a
zero-width span immediately following a line terminator. Previously, the
caret seems to be rendered on the next line, but it should be rendered
at the end of the line the span corresponds to.

I admit that this one is kinda weird. I would somewhat expect that our
spans here are actually incorrect, and that to obtain this sort of
rendering, we should identify a span just immediately _before_ the line
terminator and not after it. But I don't want to dive into that rabbit
hole for now (and given how `annotate-snippets` now renders these
spans, perhaps there is more to it than I see), and this does seem like
a clear improvement given the spans we feed to `annotate-snippets`.
2025-01-15 13:37:52 -05:00
Andrew Gallant
f49cfb6c28 test: update snapshots with missing ^
The previous rendering just seems wrong in that a `^` is omitted. The
new version of `annotate-snippets` seems to get this right. I checked a
pseudo random sample of these, and it seems to only happen when the
position pointed at a line terminator.
2025-01-15 13:37:52 -05:00
Andrew Gallant
3fa4479c85 test: update snapshots with missing annotations
These updates center around the addition of annotations in the
diagnostic rendering. Previously, the annotation was just not rendered
at all. With the `annotate-snippets` upgrade, it is now rendered. I
examined a pseudo random sample of these, and they all look correct.

As will be true in future batches, some of these snapshots also have
changes to whitespace in them as well.
2025-01-15 13:37:52 -05:00
Andrew Gallant
0de8216a25 test: update snapshots with just whitespace changes
These snapshot changes should *all* only be a result of changes to
trailing whitespace in the output. I checked a pseudo random sample of
these, and the whitespace found in the previous snapshots seems to be an
artifact of the rendering and _not_ of the source data. So this seems
like a strict bug fix to me.

There are other snapshots with whitespace changes, but they also have
other changes that we split out into separate commits. Basically, we're
going to do approximately one commit per category of change.

This represents, by far, the biggest chunk of changes to snapshots as a
result of the `annotate-snippets` upgrade.
2025-01-15 13:37:52 -05:00
Andrew Gallant
84179aaa96 ruff_linter,ruff_python_parser: migrate to updated annotate-snippets
This is pretty much just moving to the new API and taking care to use
byte offsets. This is *almost* enough. The next commit will fix a bug
involving the handling of unprintable characters as a result of
switching to byte offsets.
2025-01-15 13:37:52 -05:00
Dylan
c1eaf6ff72
Modify parsing of raise with cause when exception is absent (#15049)
When confronted with `raise from exc` the parser will now create a
`StmtRaise` that has `None` for the exception and `exc` for the cause.

Before, the parser created a `StmtRaise` with `from` for the exception,
no cause, and a spurious expression `exc` afterwards.
2024-12-19 13:36:32 +00:00
Dylan
a3bb0cd5ec
Raise syntax error for mixing except and except* (#14895)
This PR adds a syntax error if the parser encounters a `TryStmt` that
has except clauses both with and without a star.

The displayed error points to each except clause that contradicts the
original except clause kind. So, for example,

```python
try:
    ....
except:     #<-- we assume this is the desired except kind
    ....
except*:    #<---  error will point here
    ....
except*:    #<--- and here
    ....
```

Closes #14860
2024-12-10 17:50:55 -06:00
Micha Reiser
c847cad389
Update insta snapshots (#14366) 2024-11-15 19:31:15 +01:00
Dhruv Manilawala
978909fcf4
Raise syntax error for unparenthesized generator expr in multi-argument call (#12445)
## Summary

This PR fixes a bug to raise a syntax error when an unparenthesized
generator expression is used as an argument to a call when there are
more than one argument.

For reference, the grammar is:
```
primary:
    | ...
    | primary genexp 
    | primary '(' [arguments] ')' 
    | ...

genexp:
    | '(' ( assignment_expression | expression !':=') for_if_clauses ')' 
```

The `genexp` requires the parentheses as mentioned in the grammar. So,
the grammar for a call expression is either a name followed by a
generator expression or a name followed by a list of arguments. In the
former case, the parentheses are excluded because the generator
expression provides them, while in the latter case, the parentheses are
explicitly provided for a list of arguments, which means that the
generator expression requires its own parentheses.

This was discovered in https://github.com/astral-sh/ruff/issues/12420.

## Test Plan

Add test cases for valid and invalid syntax.

Make sure that the parser from CPython also raises this at the parsing
step:
```console
$ python3.13 -m ast parser/_.py
  File "parser/_.py", line 1
    total(1, 2, x for x in range(5), 6)
                ^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized

$ python3.13 -m ast parser/_.py
  File "parser/_.py", line 1
    sum(x for x in range(10), 10)
        ^^^^^^^^^^^^^^^^^^^^
SyntaxError: Generator expression must be parenthesized
```
2024-07-22 14:44:20 +05:30
Micha Reiser
5109b50bb3
Use CompactString for Identifier (#12101) 2024-07-01 10:06:02 +02:00
Dhruv Manilawala
434ce307a7
Revert "Use correct range to highlight line continuation error" (#12089)
This PR reverts https://github.com/astral-sh/ruff/pull/12016 with a
small change where the error location points to the continuation
character only. Earlier, it would also highlight the whitespace that
came before it.

The motivation for this change is to avoid panic in
https://github.com/astral-sh/ruff/pull/11950. For example:

```py
\)
```

Playground: https://play.ruff.rs/87711071-1b54-45a3-b45a-81a336a1ea61

The range of `Unknown` token and `Rpar` is the same. Once #11950 is
enabled, the indexer would panic. It won't panic in the stable version
because we stop at the first `Unknown` token.
2024-06-28 18:10:00 +05:30
Dhruv Manilawala
a4688aebe9
Use TokenSource to find new location for re-lexing (#12060)
## Summary

This PR splits the re-lexing logic into two parts:
1. `TokenSource`: The token source will be responsible for finding the
position the lexer needs to be moved to
2. `Lexer`: The lexer will be responsible for reducing the nesting level
and moving itself to the new position if recovered from a parenthesized
context

This split makes it easy to find the new lexer position without needing
to implement the backwards lexing logic again which would need to handle
cases involving:
* Different kinds of newlines
* Line continuation character(s)
* Comments
* Whitespaces

### F-strings

This change did reveal one thing about re-lexing f-strings. Consider the
following example:
```py
f'{'
#  ^
f'foo'
```

Here, the quote as highlighted by the caret (`^`) is the start of a
string inside an f-string expression. This is an unterminated string, which
means the token emitted is actually `Unknown`. The parser tries to
recover from it, but there's no newline token in the vector, so the new
logic doesn't recover from it. The previous logic does recover because
it's looking at the raw characters instead.

The parser would be at `FStringStart` (the one for the second line) when
it calls into the re-lexing logic to recover from an unterminated
f-string on the first line. So, moving backwards the first character
encountered is a newline character but the first token encountered is an
`Unknown` token.

This is improved with #12067 

fixes: #12046 
fixes: #12036

## Test Plan

Update the snapshot and validate the changes.
2024-06-27 17:12:39 +05:30
Dhruv Manilawala
e137c824c3
Avoid consuming newline for unterminated string (#12067)
## Summary

This PR fixes the lexer logic to **not** consume the newline character
for an unterminated string literal.

Currently, the lexer would consume it to be part of the string itself
but that would be bad for recovery because then the lexer wouldn't emit
the newline token ever. This PR fixes that to avoid consuming the
newline character in that case.
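
A hypothetical illustration of the recovery scenario:

```python
# The string on the first line is unterminated; the error token now ends
# before the newline, so a Newline token is still emitted and the parser
# can recover at the assignment on the next line.
s = 'unterminated
x = 1
```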

This was discovered during https://github.com/astral-sh/ruff/pull/12060.

## Test Plan

Update the snapshots and validate them.
2024-06-27 17:02:48 +05:30
Dhruv Manilawala
47c9ed07f2
Consider 2-character EOL before line continuation (#12035)
## Summary

This PR fixes a bug introduced in
https://github.com/astral-sh/ruff/pull/12008 which didn't consider the
two character newline after the line continuation character.

For example, consider the following code highlighted with whitespaces:
```py
call(foo # comment \\r\n
\r\n
def bar():\r\n
....pass\r\n
```
The lexer is at `def` when it's running the re-lexing logic and trying
to move back to a newline character. It encounters the `\n` and treats it
as not being escaped (incorrect), because only the `\r` before it is
recognized as escaped, so it moves the lexer to the `\n` character. This
creates an overlap in token ranges, which causes the panic.

```
Name 0..4
Lpar 4..5
Name 5..8
Comment 9..20
NonLogicalNewline 20..22 <-- overlap between
Newline 21..22           <-- these two tokens
NonLogicalNewline 22..23
Def 23..26
...
```

fixes: #12028 

## Test Plan

Add a test case with line continuation and windows style newline
character.
2024-06-26 14:00:48 +05:30
Dhruv Manilawala
7cb2619ef5
Add syntax error for empty type parameter list (#12030)
## Summary

(I'm pretty sure I added this in the parser re-write but it must've gotten
lost in the rebase?)

This PR raises a syntax error if the type parameter list is empty.

As per the grammar, there should be at least one type parameter:
```
type_params: 
    | invalid_type_params
    | '[' type_param_seq ']' 

type_param_seq: ','.type_param+ [','] 
```

Verified via the builtin `ast` module as well:
```console    
$ python3.13 -m ast parser/_.py
Traceback (most recent call last):
  [..]
  File "parser/_.py", line 1
    def foo[]():
            ^
SyntaxError: Type parameter list cannot be empty
```

## Test Plan

Add inline test cases and update the snapshots.
2024-06-26 08:10:35 +05:30
Dhruv Manilawala
7109214b57
Update parser tests to validate token ranges (#12019)
## Summary

This PR updates the parser test infrastructure to validate the token
ranges.

From the code documentation:
```
/// Verifies that:
/// * the ranges are strictly increasing when looping over the tokens in insertion order
/// * all ranges are within the length of the source code
```

Follow-up from #12016 and #12017
resolves: #11938

## Test Plan

Make sure that there are no failures.
2024-06-25 08:14:28 +00:00
Dhruv Manilawala
d930e97212
Do not include newline for unterminated string range (#12017)
## Summary

This PR updates the unterminated string error range to not include the
final newline character.

This is a follow-up to #12016 and required for #12019

This is not done for when the unterminated string goes till the end of
file (not a newline character). The unterminated f-string range is
correct.

### Why is this required for #12019 ?

Because otherwise the token ranges will overlap. For example:
```py
f"{"
f"{foo!r"
```

Here, the re-lexing logic recovers from an unterminated f-string and
thus emits a `Newline` token for the one at the end of the first
line. But currently, the `Unknown` and the `Newline` token would overlap
because the `Unknown` token (unterminated string literal) range would
include the newline character.

## Test Plan

Update and validate the snapshot.
2024-06-25 08:10:07 +00:00
Dhruv Manilawala
9c1b6ec411
Use correct range to highlight line continuation error (#12016)
## Summary

This PR fixes the range highlighted for the line continuation error.

Previously, it would highlight an incorrect range:
```
1 | call(a, b, \\\
  |           ^^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |
```

And now:
```
  |
1 | call(a, b, \\\
  |             ^ Syntax Error: unexpected character after line continuation character
2 | 
3 | def bar():
  |
```

This is implemented by not updating the token range for the
`Unknown` token, which is emitted when there's a lexical error. Instead,
the `push_error` helper method will be responsible for updating the range
to the error location.

This actually becomes a requirement which can be seen in follow-up PRs.

## Test Plan

Update and validate the snapshot.
2024-06-25 13:35:24 +05:30
Dhruv Manilawala
68a8978454
Consider line continuation character for re-lexing (#12008)
## Summary

This PR fixes a bug where the re-lexing logic didn't consider the line
continuation character being present before the newline character. This
meant that the lexer was being moved back to the newline character which
is actually ignored via `\`.

Considering the following code:
```py
f'middle {'string':\
        'format spec'}

```

The old token stream is:
```
...
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
Newline 20..21
Indent 21..29
String 29..42
Rbrace 42..43
...
```

Notice how the ranges are overlapping between the `FStringMiddle` token
and the tokens emitted after moving the lexer backwards.

After this fix, the new token stream which is without moving the lexer
backwards in this scenario:
```
FStringStart 0..2 (flags = F_STRING)
FStringMiddle 2..9 (flags = F_STRING)
Lbrace 9..10
String 10..18
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
FStringEnd 29..30 (flags = F_STRING)
Name 30..36
Name 37..41
Unknown 41..44
Newline 44..45
```

fixes: #12004 

## Test Plan

Add test cases and update the snapshots.
2024-06-25 02:13:54 +00:00