Replace LALRPOP parser with hand-written parser (#10036)

(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current LALRPOP-generated parser with a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.
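The expected shape can be cross-checked against CPython's own `ast` module. This is a minimal sketch, not part of this PR: `index_shape` is a hypothetical helper, and it assumes Python 3.11 or newer, where PEP 646 syntax is accepted.

```python
import ast
import sys


def index_shape(source):
    """Return (slice node type, element node types) for a subscript expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Subscript):
        raise TypeError("expected a subscript expression")
    index = node.slice
    # Only tuple-like nodes carry `elts`; anything else yields an empty list.
    elements = [type(element).__name__ for element in getattr(index, "elts", [])]
    return type(index).__name__, elements


# Starred subscripts are only valid syntax on Python 3.11+ (PEP 646).
if sys.version_info >= (3, 11):
    print(index_shape("data[*x]"))  # ('Tuple', ['Starred'])
```

In other words, `data[*x]` is treated like `data[(*x,)]`: the index is a one-element tuple rather than a bare starred expression.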

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.
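To illustrate the distinction between an error-resilient parser and what downstream tools observe, here is a toy sketch. It is not Ruff's implementation: the grammar (a bracketed list of integers) and the names `parse_int_list`, `parse_strict`, and `ParseResult` are all hypothetical. The parser records every error and keeps going, while the strict wrapper surfaces only the first one.

```python
import re
from dataclasses import dataclass, field

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\S)")


@dataclass
class ParseResult:
    values: list = field(default_factory=list)  # integers parsed successfully
    errors: list = field(default_factory=list)  # messages in source order


def parse_int_list(source):
    """Parse `[1, 2, 3]`, recording errors instead of raising on the first one."""
    tokens = TOKEN_RE.findall(source)
    pos = 0
    result = ParseResult()

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() == token:
            pos += 1
        else:
            result.errors.append(f"expected {token!r}, found {peek()!r}")

    expect("[")
    while peek() not in ("]", None):
        token = peek()
        if token.isdecimal():
            result.values.append(int(token))
        else:
            result.errors.append(f"expected integer, found {token!r}")
        pos += 1  # Recovery: consume the element whether or not it was valid.
        if peek() == ",":
            pos += 1
        elif peek() not in ("]", None):
            result.errors.append(f"expected ',', found {peek()!r}")
    expect("]")
    return result


def parse_strict(source):
    """What a downstream tool sees: parsing appears to stop at the first error."""
    result = parse_int_list(source)
    if result.errors:
        raise SyntaxError(result.errors[0])
    return result.values
```

For example, `parse_int_list("[1, x, 3]")` still recovers the values `[1, 3]` alongside one recorded error, whereas `parse_strict("[1, x, 3]")` raises on that error.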

For more details about the change here, refer to the PRs corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.
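As a rough illustration of checking valid and invalid syntax, the sketch below uses CPython's parser as an independent reference. This is an assumption for illustration only and not the test harness used in this PR; `cpython_accepts` is a hypothetical helper, the invalid snippets are taken from the fixtures added in this commit, and the valid snippet is made up.

```python
import ast
import sys


def cpython_accepts(source):
    """Return True if CPython's own parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Patterns from the invalid-syntax fixtures in this commit.
INVALID = [
    "match subject:\n    case +1:\n        pass\n",
    "match subject:\n    case *_:\n        pass\n",
    "match subject:\n    case Foo(1=1):\n        pass\n",
]

# A made-up valid counterpart; `match` itself requires Python 3.10+.
VALID = ["match subject:\n    case [1, -2]:\n        pass\n"]

for snippet in INVALID:
    assert not cpython_accepts(snippet)

if sys.version_info >= (3, 10):
    for snippet in VALID:
        assert cpython_accepts(snippet)
```

Unlike an error-resilient parser, CPython simply rejects the invalid snippets, so a check like this only confirms which inputs belong in the invalid set.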

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](https://github.com/astral-sh/ruff/pull/10903)
[a](https://github.com/astral-sh/ruff/pull/10910)
[lot](https://github.com/astral-sh/ruff/pull/10966)
[of](https://github.com/astral-sh/ruff/pull/10896)
[bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
Dhruv Manilawala 2024-04-18 17:57:39 +05:30 committed by GitHub
parent e09180b1df
commit 13ffb5bc19
852 changed files with 112948 additions and 103620 deletions

@ -0,0 +1,8 @@
match subject:
    # Parser shouldn't confuse this as being a
    # class pattern
    #            v
    case (x as y)(a, b):
    #     ^^^^^^
    #     as-pattern
        pass

@ -0,0 +1,8 @@
match subject:
    # Parser shouldn't confuse this as being a
    # complex literal pattern
    #             v
    case (x as y) + 1j:
    #     ^^^^^^
    #     as-pattern
        pass

@ -0,0 +1,5 @@
match subject:
    # This `as` pattern is unparenthesized so the parser never takes the path
    # where it might be confused as a complex literal pattern.
    case x as y + 1j:
        pass

@ -0,0 +1,5 @@
match subject:
    # Not in the mapping start token set, so the list parsing bails
    #     v
    case {(x as y): 1}:
        pass

@ -0,0 +1,5 @@
match subject:
    # This `as` pattern is unparenthesized so the parser never takes the path
    # where it might be confused as a mapping key pattern.
    case {x as y: 1}:
        pass

@ -0,0 +1,17 @@
# Invalid keyword pattern in class argument
match subject:
    case Foo(x as y = 1):
        pass
    case Foo(x | y = 1):
        pass
    case Foo([x, y] = 1):
        pass
    case Foo({False: 0} = 1):
        pass
    case Foo(1=1):
        pass
    case Foo(Bar()=1):
        pass
    # Positional pattern cannot follow keyword pattern
    # case Foo(x, y=1, z):
    #     pass

@ -0,0 +1,41 @@
match invalid_lhs_pattern:
    case Foo() + 1j:
        pass
    case x + 2j:
        pass
    case _ + 3j:
        pass
    case (1 | 2) + 4j:
        pass
    case [1, 2] + 5j:
        pass
    case {True: 1} + 6j:
        pass
    case 1j + 2j:
        pass
    case -1j + 2j:
        pass
    case Foo(a as b) + 1j:
        pass
match invalid_rhs_pattern:
    case 1 + Foo():
        pass
    case 2 + x:
        pass
    case 3 + _:
        pass
    case 4 + (1 | 2):
        pass
    case 5 + [1, 2]:
        pass
    case 6 + {True: 1}:
        pass
    case 1 + 2:
        pass
    case 1 + Foo(a as b):
        pass
match invalid_lhs_rhs_pattern:
    case Foo() + Bar():
        pass

@ -0,0 +1,23 @@
# Starred expression is not allowed as a mapping pattern key
match subject:
    case {*key}:
        pass
    case {*key: 1}:
        pass
    case {*key 1}:
        pass
    case {*key, None: 1}:
        pass
# Pattern cannot follow a double star pattern
# Multiple double star patterns are not allowed
match subject:
    case {**rest, None: 1}:
        pass
    case {**rest1, **rest2, None: 1}:
        pass
    case {**rest1, None: 1, **rest2}:
        pass
match subject:
    case {Foo(a as b): 1}: ...

@ -0,0 +1,24 @@
# Star pattern is only allowed inside a sequence pattern
match subject:
    case *_:
        pass
    case *_ as x:
        pass
    case *foo:
        pass
    case *foo | 1:
        pass
    case 1 | *foo:
        pass
    case Foo(*_):
        pass
    case Foo(x=*_):
        pass
    case {*_}:
        pass
    case {*_: 1}:
        pass
    case {None: *_}:
        pass
    case 1 + *_:
        pass

@ -0,0 +1,12 @@
# Unary addition isn't allowed but we parse it for better error recovery.
match subject:
    case +1:
        pass
    case 1 | +2 | -3:
        pass
    case [1, +2, -3]:
        pass
    case Foo(x=+1, y=-2):
        pass
    case {True: +1, False: -2}:
        pass