mirror of https://github.com/astral-sh/ruff.git synced 2025-07-12 15:45:07 +00:00

Author	SHA1	Message	Date
omahs	882a1a702e	Fix typos (#17988) Fix typos --------- Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com> Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>	2025-05-09 14:57:14 -04:00
Dhruv Manilawala	47c9ed07f2	Consider 2-character EOL before line continuation (#12035) ## Summary This PR fixes a bug introduced in https://github.com/astral-sh/ruff/pull/12008 which didn't consider the two-character newline after the line continuation character. For example, consider the following code, with whitespace and newline characters made visible: ```py call(foo # comment \\r\n \r\n def bar():\r\n ....pass\r\n ``` The lexer is at `def` when it runs the re-lexing logic and tries to move back to a newline character. It encounters `\n` and treats it as not escaped (incorrect), because the character directly before it is `\r` rather than `\`; in fact the whole `\r\n` pair is escaped. So it moves the lexer to the `\n` character. This creates an overlap in token ranges which causes the panic. ``` Name 0..4 Lpar 4..5 Name 5..8 Comment 9..20 NonLogicalNewline 20..22 <-- overlap between Newline 21..22 <-- these two tokens NonLogicalNewline 22..23 Def 23..26 ... ``` fixes: #12028 ## Test Plan Add a test case with line continuation and Windows-style newline character.	2024-06-26 14:00:48 +05:30
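The two-character EOL check described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not Ruff's Rust implementation; the function name `eol_is_continued` and its signature are hypothetical.

```python
def eol_is_continued(source: str, pos: int) -> bool:
    """Hypothetical helper: `pos` is the index of a '\n' or '\r' character.

    Returns True if the end-of-line sequence containing `pos` is directly
    preceded by a backslash, treating "\r\n" as a single two-character EOL.
    """
    start = pos
    # If we are on the '\n' of a "\r\n" pair, the EOL starts one character earlier.
    if source[pos] == "\n" and pos > 0 and source[pos - 1] == "\r":
        start = pos - 1
    # The EOL is continued only if the character before its start is a backslash.
    return start > 0 and source[start - 1] == "\\"


# The snippet from the commit message, with Windows-style newlines.
SNIPPET = "call(foo # comment \\\r\n\r\ndef bar():\r\n    pass\r\n"
```

A check that only looks at the single character before `\n` would see `\r` and conclude the newline is not escaped, which is the situation the commit message describes.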
Dhruv Manilawala	68a8978454	Consider line continuation character for re-lexing (#12008) ## Summary This PR fixes a bug where the re-lexing logic didn't consider the line continuation character being present before the newline character. This meant that the lexer was being moved back to a newline character which is actually ignored via `\`. Consider the following code: ```py f'middle {'string':\ 'format spec'} ``` The old token stream is: ``` ... Colon 18..19 FStringMiddle 19..29 (flags = F_STRING) Newline 20..21 Indent 21..29 String 29..42 Rbrace 42..43 ... ``` Notice how the ranges overlap between the `FStringMiddle` token and the tokens emitted after moving the lexer backwards. After this fix, the lexer is not moved backwards in this scenario and the new token stream is: ``` FStringStart 0..2 (flags = F_STRING) FStringMiddle 2..9 (flags = F_STRING) Lbrace 9..10 String 10..18 Colon 18..19 FStringMiddle 19..29 (flags = F_STRING) FStringEnd 29..30 (flags = F_STRING) Name 30..36 Name 37..41 Unknown 41..44 Newline 44..45 ``` fixes: #12004 ## Test Plan Add test cases and update the snapshots.	2024-06-25 02:13:54 +00:00
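To make the overlap in the token streams above concrete, here is a small Python sketch that scans a list of tokens for adjacent ranges that overlap. The `Token` type and `find_overlaps` function are hypothetical and exist only for this illustration; the ranges are copied from the commit message, and "overlap" is assumed to mean that a token starts before the previous one ends.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str
    start: int
    end: int  # exclusive, as in the `start..end` notation above


def find_overlaps(tokens: List[Token]) -> List[Tuple[Token, Token]]:
    """Return each adjacent pair where the second token starts before the first ends."""
    overlaps = []
    for prev, cur in zip(tokens, tokens[1:]):
        if cur.start < prev.end:
            overlaps.append((prev, cur))
    return overlaps


# Old token stream from the commit message (the leading and trailing "..." are omitted).
OLD_STREAM = [
    Token("Colon", 18, 19), Token("FStringMiddle", 19, 29), Token("Newline", 20, 21),
    Token("Indent", 21, 29), Token("String", 29, 42), Token("Rbrace", 42, 43),
]

# New token stream after the fix.
NEW_STREAM = [
    Token("FStringStart", 0, 2), Token("FStringMiddle", 2, 9), Token("Lbrace", 9, 10),
    Token("String", 10, 18), Token("Colon", 18, 19), Token("FStringMiddle", 19, 29),
    Token("FStringEnd", 29, 30), Token("Name", 30, 36), Token("Name", 37, 41),
    Token("Unknown", 41, 44), Token("Newline", 44, 45),
]
```

Running `find_overlaps(OLD_STREAM)` flags the `FStringMiddle`/`Newline` pair, while `find_overlaps(NEW_STREAM)` finds nothing.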
Dhruv Manilawala	ed948eaefb	Avoid moving back the lexer for triple-quoted fstring (#11939) ## Summary This PR avoids moving back the lexer for a triple-quoted f-string during the re-lexing phase. The reason this is a problem is that for a triple-quoted f-string the newlines are part of the f-string itself; specifically, they'll be part of the `FStringMiddle` token. So, if we moved the lexer back, there would be a `Newline` token whose range would fall inside an `FStringMiddle` token. This causes a panic in downstream usage. fixes: #11937 ## Test Plan Add test cases and validate the snapshots.	2024-06-20 16:27:36 +05:30
Dhruv Manilawala	cdc7c71449	Avoid consuming trailing whitespace during re-lexing (#11933) ## Summary This PR updates the re-lexing logic to avoid consuming the trailing whitespace and to move the lexer explicitly to the last newline character encountered while moving backwards. Consider the following code snippet, taken from the test case, with whitespace (`.`) and newline (`\n`) characters made visible: ```py # There are trailing whitespace before the newline character but those whitespaces are # part of the comment token f"""hello {x # comment....\n # ^ y = 1\n ``` The parser is at `y` when it's trying to recover from an unclosed `{`, so it calls into the re-lexing logic which tries to move the lexer back to the end of the previous line. But, as it consumed all whitespace, it moved the lexer to the location marked by `^` in the above code snippet. Those whitespace characters are part of the comment token, so the ranges of the two tokens overlapped, which caused the panic. Note that this is only a bug when there's a comment with trailing whitespace; otherwise it's fine to move the lexer to the whitespace character, because the lexer would just skip the whitespace. Nevertheless, this PR updates the logic to move it explicitly to the newline character in all cases. fixes: #11929 ## Test Plan Add test cases and update the snapshot. Make sure that it doesn't panic on the code snippet in the linked issue.	2024-06-19 12:14:18 +05:30
Dhruv Manilawala	1e0642fac8	Use re-lexing for normal list parsing (#11871) ## Summary This PR is a follow-up on #11845 to add the re-lexing logic for normal list parsing. Normal list parsing means parsing elements without any separator in between, i.e., there can only be trivia tokens between two elements. Currently, this is only being used for parsing assignment statements and f-string elements. Assignment statements cannot be in a parenthesized context, but f-strings can have curly braces, so this PR is specifically for them. I don't think this is an ideal recovery, but the problem is that both the lexer and the parser could add an error for f-strings. If the lexer adds an error it'll emit an `Unknown` token instead, while the parser adds the error directly. I think we'd need to move all f-string errors to be emitted by the parser instead. This way the parser can correctly inform the lexer that it's out of an f-string and then the lexer can pop the current f-string context off the stack. ## Test Plan Add test cases, update the snapshots, and run the fuzzer.	2024-06-18 12:14:41 +05:30
Dhruv Manilawala	8499abfa7f	Implement re-lexing logic for better error recovery (#11845) ## Summary This PR implements the re-lexing logic in the parser. This logic is only applied when recovering from an error during list parsing. The logic is as follows: 1. During list parsing, if an unexpected token is encountered and the parser detects that an outer context can understand it and thus recover from it, it invokes the re-lexing logic in the lexer. 2. This logic first checks if the lexer is in a parenthesized context and returns if it's not. Thus, the logic is a no-op if the lexer isn't in a parenthesized context. 3. It then reduces the nesting level by 1. It shouldn't reset it to 0 because otherwise the recovery from nested list parsing will be incorrect. 4. Then, it tries to find the last newline character going backwards from the current position of the lexer. This skips any whitespace, but if it encounters any character other than newline or whitespace, it aborts. 5. Now, if there's a newline character, then it needs to be re-lexed in a logical context, which means that the lexer needs to emit it as a `Newline` token instead of `NonLogicalNewline`. 6. If the re-lexing gives a different token than the current one, the token source needs to update its token collection to remove all the tokens which come after the new current position. It turns out that the list parsing isn't that happy with the results, so it requires some re-arranging such that the following two errors are raised correctly: 1. Expected comma 2. Recovery context error For (1), the following scenarios need to be considered: * Missing comma between two elements * Half-parsed element because the grammar doesn't allow it (for example, named expressions) For (2), the following scenarios need to be considered: 1. If the parser is at a comma, which means that there's a missing element; otherwise the comma would've been consumed by the first `eat` call above. And, the parser doesn't take the re-lexing route on a comma token. 2. If it's the first element and the current token is not a comma, which means that it's an invalid element. resolves: #11640 ## Test Plan - [x] Update existing test snapshots and validate them - [x] Add additional test cases specific to the re-lexing logic and validate the snapshots - [x] Run the fuzzer on 3000+ valid inputs - [x] Run the fuzzer on invalid inputs - [x] Run the parser on various open source projects - [x] Make sure there are no ecosystem changes	2024-06-17 06:47:00 +00:00
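Steps 2 to 5 of the re-lexing logic listed in the commit above can be sketched as a backwards scan over the source. This is a simplified Python illustration, not the Rust code from the PR: the function name `find_relex_position`, its parameters, and its return shape are all hypothetical, and it deliberately ignores line continuations, `\r\n` handling, comments, and triple-quoted f-strings, which the later commits address.

```python
from typing import Optional, Tuple


def find_relex_position(source: str, offset: int, nesting: int) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (new_nesting, newline_offset), or None for a no-op.

    `offset` is the start of the current token and `nesting` is the current
    parenthesis depth tracked by the lexer.
    """
    # Step 2: outside a parenthesized context there is nothing to do.
    if nesting == 0:
        return None

    # Step 3: reduce the nesting level by one instead of resetting it to zero,
    # so that recovery inside nested lists still works.
    new_nesting = nesting - 1

    # Step 4: walk backwards over whitespace, remembering the last newline seen.
    newline_at = None
    i = offset
    while i > 0:
        ch = source[i - 1]
        if ch == "\n":
            newline_at = i - 1
        elif ch not in " \t":
            break  # any other character aborts the scan
        i -= 1

    if newline_at is None:
        return None

    # Step 5: the caller would move the lexer to `newline_at` and re-lex from
    # there, so the newline is emitted as `Newline` rather than `NonLogicalNewline`.
    return new_nesting, newline_at
```

For example, with the source `"call(foo\n\ndef bar():\n    pass\n"`, the offset of `def` (10), and a nesting level of 1, the sketch returns `(0, 8)`: the nesting drops to zero and the lexer would be moved to the newline right after `foo`.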
Dimitri Papadopoulos Orfanos	3b0584449d	Fix a few typos found by codespell (#11404) ## Summary Just fix typos. ## Test Plan CI jobs. --------- Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>	2024-05-13 13:22:35 +00:00
Dhruv Manilawala	13ffb5bc19	Replace LALRPOP parser with hand-written parser (#10036) (Supersedes #9152, authored by @LaBatata101) ## Summary This PR replaces the current LALRPOP-generated parser with a hand-written recursive descent parser. It also updates the grammar for [PEP 646](https://peps.python.org/pep-0646/) so that the parser outputs the correct AST. For example, in `data[*x]`, the index expression is now a tuple with a single starred expression instead of just a starred expression. Beyond the performance improvements, the parser is also error-resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed. That is, the linter and formatter can still assume that the parser will _stop_ at the first syntax error. This will be updated in the following months. For more details about the change here, refer to the PR corresponding to the individual commits and the release blog post. ## Test Plan Write _lots_ and _lots_ of tests for both valid and invalid syntax and verify the output. ## Acknowledgements - @MichaReiser for reviewing 100+ parser PRs and continuously providing guidance throughout the project - @LaBatata101 for initiating the transition to a hand-written parser in #9152 - @addisoncrump for implementing the fuzzer which helped [catch](https://github.com/astral-sh/ruff/pull/10903) [a](https://github.com/astral-sh/ruff/pull/10910) [lot](https://github.com/astral-sh/ruff/pull/10966) [of](https://github.com/astral-sh/ruff/pull/10896) [bugs](https://github.com/astral-sh/ruff/pull/10877) --------- Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org> Co-authored-by: Micha Reiser <micha@reiser.io>	2024-04-18 17:57:39 +05:30
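The PEP 646 shape mentioned in the commit above (a tuple containing a single starred expression) can be observed with CPython's own `ast` module, which serves here as a reference point. Note that this sketch exercises CPython's parser, not Ruff's; the helper name `subscript_index_shape` is hypothetical, and the starred form requires Python 3.11 or newer (the helper itself assumes Python 3.9+, where `Subscript.slice` is the index expression directly).

```python
import ast
import sys
from typing import List


def subscript_index_shape(source: str) -> List[str]:
    """Return the node type of a subscript's index, followed by its tuple elements."""
    expr = ast.parse(source, mode="eval").body
    if not isinstance(expr, ast.Subscript):
        raise ValueError("expected a subscript expression")
    index = expr.slice
    shape = [type(index).__name__]
    if isinstance(index, ast.Tuple):
        shape += [type(elt).__name__ for elt in index.elts]
    return shape


if sys.version_info >= (3, 11):
    # `data[*x]` only parses on Python 3.11+, where PEP 646 is implemented.
    print(subscript_index_shape("data[*x]"))
```

On Python 3.11+, the index of `data[*x]` is reported as a `Tuple` wrapping one `Starred` node, matching the AST shape the commit message says the new parser produces.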

9 commits