mirrors/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-07-14 00:25:08 +00:00

Author	SHA1	Message	Date
Dhruv Manilawala	c30735d4a7	Add `ExpressionContext` for expression parsing (#11055 ) ## Summary This PR adds a new `ExpressionContext` struct which is used in expression parsing. This solves the following problem: 1. Allowing starred expression with different precedence 2. Allowing yield expression in certain context 3. Remove ambiguity with `in` keyword when parsing a `for ... in` statement For context, (1) was solved by adding `parse_star_expression_list` and `parse_star_expression_or_higher` in #10623, (2) was solved by by adding `parse_yield_expression_or_else` in #10809, and (3) was fixed in #11009. All of the mentioned functions have been removed in favor of the context flags. As mentioned in #11009, an ideal solution would be to implement an expression context which is what this PR implements. This is passed around as function parameter and the call stack is used to automatically reset the context. ### Recovery How should the parser recover if the target expression is invalid when an expression can consume the `in` keyword? 1. Should the `in` keyword be part of the target expression? 2. Or, should the expression parsing stop as soon as `in` keyword is encountered, no matter the expression? For example: ```python for yield x in y: ... # Here, should this be parsed as for (yield x) in (y): ... # Or for (yield x in y): ... # where the `in iter` part is missing ``` Or, for binary expression parsing: ```python for x or y in z: ... # Should this be parsed as for (x or y) in z: ... # Or for (x or y in z): ... # where the `in iter` part is missing ``` This need not be solved now, but is very easy to change. For context this PR does the following: * For binary, comparison, and unary expressions, stop at `in` * For lambda, yield expressions, consume the `in` ## Test Plan 1. Add test cases for the `for ... in` statement and verify the snapshots 2. Make sure the existing test suite pass 3. Run the fuzzer for around 3000 generated source code 4. Run the updated logic on a dozen or so open source repositories (codename "parser-checkouts")	2024-04-23 04:19:05 +00:00
Dhruv Manilawala	d3cd61f804	Use empty range when there's "gap" in token source (#11032 ) ## Summary This fixes a bug where the parser would panic when there is a "gap" in the token source. What's a gap? The reason it's `<=` instead of just `==` is because there could be whitespaces between the two tokens. For example: ```python # last token end # \| current token (newline) start # v v def foo \n # ^ # assume there's trailing whitespace here ``` Or, there could tokens that are considered "trivia" and thus aren't emitted by the token source. These are comments and non-logical newlines. For example: ```python # last token end # v def foo # comment\n # ^ current token (newline) start ``` In either of the above cases, there's a "gap" between the end of the last token and start of the current token. ## Test Plan Add test cases and update the snapshots.	2024-04-19 11:36:26 +00:00
Dhruv Manilawala	9bb23b0a38	Expect indented case block instead of match stmt (#11033 ) ## Summary This PR adds a new `Clause::Case` and uses it to parse the body of a `case` block. Earlier, it was using `Match` which would give an incorrect error message like: ``` \| 1 \| match subject: 2 \| case 1: 3 \| case 2: ... \| ^^^^ Syntax Error: Expected an indented block after `match` statement \| ``` ## Test Plan Add test case and update the snapshot.	2024-04-19 16:46:15 +05:30
Dhruv Manilawala	b7066e64e7	Consider binary expr for parenthesized with items parsing (#11012 ) ## Summary This PR fixes the bug in with items parsing where it would fail to recognize that the parenthesized expression is part of a large binary expression. ## Test Plan Add test cases and verified the snapshots.	2024-04-18 21:39:30 +05:30
Dhruv Manilawala	6c4d779140	Consider `if` expression for parenthesized with items parsing (#11010 ) ## Summary This PR fixes the bug in parenthesized with items parsing where the `if` expression would result into a syntax error. The reason being that once we identify that the ambiguous left parenthesis belongs to the context expression, the parser converts the parsed with item into an equivalent expression. Then, the parser continuous to parse any postfix expressions. Now, attribute, subscript, and call are taken into account as they're grouped in `parse_postfix_expression` but `if` expression has it's own parsing function. Use `parse_if_expression` once all postfix expressions have been parsed. Ideally, I think that `if` could be included in postfix expression parsing as they can be chained as well (`x if True else y if True else z`). ## Test Plan Add test cases and verified the snapshots.	2024-04-18 14:30:15 +00:00
Dhruv Manilawala	8020d486f6	Reset `FOR_TARGET` context for all kinds of parentheses (#11009 ) ## Summary This PR fixes a bug in the new parser which involves the parser context w.r.t. for statement. This is specifically around the `in` keyword which can be present in the target expression and shouldn't be considered to be part of the `for` statement header. Ideally it should use a context which is passed between functions, thus using a call stack to set / unset a specific variant which will be done in a follow-up PR as it requires some amount of refactor. ## Test Plan Add test cases and update the snapshots.	2024-04-18 19:37:50 +05:30
Dhruv Manilawala	13ffb5bc19	Replace LALRPOP parser with hand-written parser (#10036 ) (Supersedes #9152, authored by @LaBatata101) ## Summary This PR replaces the current parser generated from LALRPOP to a hand-written recursive descent parser. It also updates the grammar for [PEP 646](https://peps.python.org/pep-0646/) so that the parser outputs the correct AST. For example, in `data[*x]`, the index expression is now a tuple with a single starred expression instead of just a starred expression. Beyond the performance improvements, the parser is also error resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed. That is, the linter and formatter can still assume that the parser will _stop_ at the first syntax error. This will be updated in the following months. For more details about the change here, refer to the PR corresponding to the individual commits and the release blog post. ## Test Plan Write _lots_ and _lots_ of tests for both valid and invalid syntax and verify the output. ## Acknowledgements - @MichaReiser for reviewing 100+ parser PRs and continuously providing guidance throughout the project - @LaBatata101 for initiating the transition to a hand-written parser in #9152 - @addisoncrump for implementing the fuzzer which helped [catch](https://github.com/astral-sh/ruff/pull/10903) [a](https://github.com/astral-sh/ruff/pull/10910) [lot](https://github.com/astral-sh/ruff/pull/10966) [of](https://github.com/astral-sh/ruff/pull/10896) [bugs](https://github.com/astral-sh/ruff/pull/10877) --------- Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org> Co-authored-by: Micha Reiser <micha@reiser.io>	2024-04-18 17:57:39 +05:30

1 2

57 commits