mirror of https://github.com/astral-sh/ruff.git synced 2025-10-28 18:53:25 +00:00

Author	SHA1	Message	Date
Andrew Gallant	0de8216a25	test: update snapshots with just whitespace changes These snapshot changes should all only be a result of changes to trailing whitespace in the output. I checked a pseudo-random sample of these, and the whitespace found in the previous snapshots seems to be an artifact of the rendering and _not_ of the source data. So this seems like a strict bug fix to me. There are other snapshots with whitespace changes, but they also have other changes that we split out into separate commits. Basically, we're going to do approximately one commit per category of change. This represents, by far, the biggest chunk of changes to snapshots as a result of the `annotate-snippets` upgrade.	2025-01-15 13:37:52 -05:00
Andrew Gallant	84179aaa96	ruff_linter,ruff_python_parser: migrate to updated `annotate-snippets` This is pretty much just moving to the new API and taking care to use byte offsets. This is almost enough. The next commit will fix a bug involving the handling of unprintable characters as a result of switching to byte offsets.	2025-01-15 13:37:52 -05:00
Dylan	c1eaf6ff72	Modify parsing of raise with cause when exception is absent (#15049 ) When confronted with `raise from exc` the parser will now create a `StmtRaise` that has `None` for the exception and `exc` for the cause. Before, the parser created a `StmtRaise` with `from` for the exception, no cause, and a spurious expression `exc` afterwards.	2024-12-19 13:36:32 +00:00
Dylan	a3bb0cd5ec	Raise syntax error for mixing `except` and `except*` (#14895 ) This PR adds a syntax error if the parser encounters a `TryStmt` that has except clauses both with and without a star. The displayed error points to each except clause that contradicts the original except clause kind. So, for example, ```python try: .... except: #<-- we assume this is the desired except kind .... except*: #<--- error will point here .... except*: #<--- and here .... ``` Closes #14860	2024-12-10 17:50:55 -06:00
Dimitri Papadopoulos Orfanos	59145098d6	Fix typos found by codespell (#14863 ) ## Summary Just fix typos. ## Test Plan CI tests. --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2024-12-09 09:32:12 +00:00
Micha Reiser	b63c2e126b	Upgrade Rust toolchain to 1.83 (#14677 )	2024-11-29 12:05:05 +00:00
Alex Waygood	f1b2e85339	py-fuzzer: recommend using `uvx` rather than `uv run` to run the fuzzer (#14645 )	2024-11-27 22:19:52 +00:00
Alex Waygood	e0f3eaf1dd	Turn the `fuzz-parser` script into a properly packaged Python project (#14606 ) ## Summary This PR gets rid of the `requirements.in` and `requirements.txt` files in the `scripts/fuzz-parser` directory, and replaces them with `pyproject.toml` and `uv.lock` files. The script is renamed from `fuzz-parser` to `py-fuzzer` (since it can now also be used to fuzz red-knot as well as the parser, following https://github.com/astral-sh/ruff/pull/14566), and moved from the `scripts/` directory to the `python/` directory, since it's now a (uv)-pip-installable project in its own right. I've been resisting this for a while, because conceptually this script just doesn't feel "complicated" enough to me for it to be a full-blown package. However, I think it's time to do this. Making it a proper package has several advantages: - It means we can run it from the project root using `uv run` without having to activate a virtual environment and ensure that all required dependencies are installed into that environment - Using a `pyproject.toml` file means that we can express that the project requires Python 3.12+ to run properly; this wasn't possible before - I've been running mypy on the project locally when I've been working on it or reviewing other people's PRs; now I can put the mypy config for the project in the `pyproject.toml` file ## Test Plan I manually tested that all the commands detailed in `python/py-fuzzer/README.md` work for me locally. --------- Co-authored-by: David Peter <sharkdp@users.noreply.github.com>	2024-11-27 08:09:04 +00:00
Micha Reiser	c847cad389	Update insta snapshots (#14366 )	2024-11-15 19:31:15 +01:00
Micha Reiser	bd33b4972d	Short circuit `lex_identifier` if the name is longer or shorter than any known keyword (#13815 )	2024-10-19 11:07:15 +00:00
Junzhuo ZHOU	a354d9ead6	Expose internal types as public access (#13509 )	2024-09-26 17:34:30 +02:00
Micha Reiser	c3bcd5c842	Upgrade to Rust 1.81 (#13265 )	2024-09-06 15:09:09 +02:00
Alex Waygood	b7c7b4b387	Add a method to `Checker` for cached parsing of stringified type annotations (#13158 )	2024-09-02 12:44:20 +00:00
Micha Reiser	138e70bd5c	Upgrade to Rust 1.80 (#12586 )	2024-07-30 19:18:08 +00:00
Dhruv Manilawala	978909fcf4	Raise syntax error for unparenthesized generator expr in multi-argument call (#12445 ) ## Summary This PR fixes a bug to raise a syntax error when an unparenthesized generator expression is used as an argument to a call that has more than one argument. For reference, the grammar is: ``` primary: | ... | primary genexp | primary '(' [arguments] ')' | ... genexp: | '(' ( assignment_expression | expression !':=') for_if_clauses ')' ``` The `genexp` requires the parentheses as mentioned in the grammar. So, the grammar for a call expression is either a name followed by a generator expression or a name followed by a list of arguments. In the former case, the parentheses are excluded because the generator expression provides them, while in the latter case the parentheses are explicitly provided for a list of arguments, which means that the generator expression requires its own parentheses. This was discovered in https://github.com/astral-sh/ruff/issues/12420. ## Test Plan Add test cases for valid and invalid syntax. Make sure that the parser from CPython also raises this at the parsing step: ```console $ python3.13 -m ast parser/_.py File "parser/_.py", line 1 total(1, 2, x for x in range(5), 6) ^^^^^^^^^^^^^^^^^^^ SyntaxError: Generator expression must be parenthesized $ python3.13 -m ast parser/_.py File "parser/_.py", line 1 sum(x for x in range(10), 10) ^^^^^^^^^^^^^^^^^^^^ SyntaxError: Generator expression must be parenthesized ```	2024-07-22 14:44:20 +05:30
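The CPython behaviour quoted in the test plan above can be checked from Python itself. This is a minimal sketch that assumes only the standard-library `ast` module; the helper name `parses` is made up for illustration and says nothing about how ruff's own parser is implemented.

```python
import ast


def parses(source: str) -> bool:
    """Return True if CPython's own parser accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# A lone generator expression may borrow the call's parentheses.
print(parses("sum(x for x in range(10))"))
# With a second argument, the generator needs its own parentheses.
print(parses("sum(x for x in range(10), 10)"))
print(parses("sum((x for x in range(10)), 10)"))
```

The second call is the case this commit teaches ruff to reject; the other two are the valid spellings.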
Dhruv Manilawala	8f40928534	Enable token-based rules on source with syntax errors (#11950 ) ## Summary This PR updates the linter, specifically the token-based rules, to work on the tokens that come after a syntax error. For context, the token-based rules only diagnose the tokens up to the first lexical error. This PR builds up an error resilience by introducing a `TokenIterWithContext` which updates the `nesting` level and tries to reflect it with what the lexer is seeing. This isn't 100% accurate because if the parser recovered from an unclosed parenthesis in the middle of the line, the context won't reduce the nesting level until it sees the newline token at the end of the line. resolves: #11915 ## Test Plan * Add test cases for a bunch of rules that are affected by this change. * Run the fuzzer for a long time, making sure to fix any other bugs.	2024-07-02 08:57:46 +00:00
Micha Reiser	5109b50bb3	Use `CompactString` for `Identifier` (#12101 )	2024-07-01 10:06:02 +02:00
Micha Reiser	f765d19402	Mention that `Cursor` is based on rustc's implementation. (#12109 )	2024-06-30 16:53:25 +01:00
Micha Reiser	da78de0439	Remove allocation in `parse_identifier` (#12103 )	2024-06-29 15:00:24 +02:00
Dhruv Manilawala	434ce307a7	Revert "Use correct range to highlight line continuation error" (#12089 ) This PR reverts https://github.com/astral-sh/ruff/pull/12016 with a small change where the error location points to the continuation character only. Earlier, it would also highlight the whitespace that came before it. The motivation for this change is to avoid panic in https://github.com/astral-sh/ruff/pull/11950. For example: ```py \) ``` Playground: https://play.ruff.rs/87711071-1b54-45a3-b45a-81a336a1ea61 The range of `Unknown` token and `Rpar` is the same. Once #11950 is enabled, the indexer would panic. It won't panic in the stable version because we stop at the first `Unknown` token.	2024-06-28 18:10:00 +05:30
Dhruv Manilawala	a4688aebe9	Use `TokenSource` to find new location for re-lexing (#12060 ) ## Summary This PR splits the re-lexing logic into two parts: 1. `TokenSource`: The token source will be responsible to find the position the lexer needs to be moved to 2. `Lexer`: The lexer will be responsible to reduce the nesting level and move itself to the new position if recovered from a parenthesized context This split makes it easy to find the new lexer position without needing to implement the backwards lexing logic again which would need to handle cases involving: * Different kinds of newlines * Line continuation character(s) * Comments * Whitespaces ### F-strings This change did reveal one thing about re-lexing f-strings. Consider the following example: ```py f'{' # ^ f'foo' ``` Here, the quote as highlighted by the caret (`^`) is the start of a string inside an f-string expression. This is unterminated string which means the token emitted is actually `Unknown`. The parser tries to recover from it but there's no newline token in the vector so the new logic doesn't recover from it. The previous logic does recover because it's looking at the raw characters instead. The parser would be at `FStringStart` (the one for the second line) when it calls into the re-lexing logic to recover from an unterminated f-string on the first line. So, moving backwards the first character encountered is a newline character but the first token encountered is an `Unknown` token. This is improved with #12067 fixes: #12046 fixes: #12036 ## Test Plan Update the snapshot and validate the changes.	2024-06-27 17:12:39 +05:30
Dhruv Manilawala	e137c824c3	Avoid consuming newline for unterminated string (#12067 ) ## Summary This PR fixes the lexer logic to not consume the newline character for an unterminated string literal. Currently, the lexer would consume it to be part of the string itself but that would be bad for recovery because then the lexer wouldn't emit the newline token ever. This PR fixes that to avoid consuming the newline character in that case. This was discovered during https://github.com/astral-sh/ruff/pull/12060. ## Test Plan Update the snapshots and validate them.	2024-06-27 17:02:48 +05:30
Dhruv Manilawala	47c9ed07f2	Consider 2-character EOL before line continuation (#12035 ) ## Summary This PR fixes a bug introduced in https://github.com/astral-sh/ruff/pull/12008 which didn't consider the two-character newline after the line continuation character. For example, consider the following code highlighted with whitespaces: ```py call(foo # comment \\r\n \r\n def bar():\r\n ....pass\r\n ``` The lexer is at `def` when it's running the re-lexing logic and trying to move back to a newline character. It encounters `\n` and treats it as not escaped (incorrect), because it is the `\r` that directly follows the backslash, so it moves the lexer to the `\n` character. This creates an overlap in token ranges which causes the panic. ``` Name 0..4 Lpar 4..5 Name 5..8 Comment 9..20 NonLogicalNewline 20..22 <-- overlap between Newline 21..22 <-- these two tokens NonLogicalNewline 22..23 Def 23..26 ... ``` fixes: #12028 ## Test Plan Add a test case with line continuation and Windows-style newline character.	2024-06-26 14:00:48 +05:30
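The two-character newline problem described above can be illustrated at the character level. This is a rough sketch, not ruff's code: the function name `newline_is_escaped` is hypothetical, and it deliberately ignores whether the backslash sits inside a comment or a string.

```python
def newline_is_escaped(source: str, index: int) -> bool:
    # `index` points at a newline character found while scanning backwards.
    start = index
    # Treat a carriage return followed by a line feed as one newline:
    # step back onto the carriage return before looking for the backslash.
    if source[index] == "\n" and index > 0 and source[index - 1] == "\r":
        start = index - 1
    return start > 0 and source[start - 1] == "\\"


source = "call(foo \\\r\n\r\ndef bar():\r\n"
first_lf = source.index("\n")
print(newline_is_escaped(source, first_lf))      # newline right after the backslash
print(newline_is_escaped(source, first_lf + 2))  # the following empty line
```

A check that only looked at `source[index - 1]` would see the carriage return instead of the backslash, which matches the kind of mistake the commit message describes.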
Dhruv Manilawala	7cb2619ef5	Add syntax error for empty type parameter list (#12030 ) ## Summary (I'm pretty sure I added this in the parser re-write but must've got lost in the rebase?) This PR raises a syntax error if the type parameter list is empty. As per the grammar, there should be at least one type parameter: ``` type_params: | invalid_type_params | '[' type_param_seq ']' type_param_seq: ','.type_param+ [','] ``` Verified via the builtin `ast` module as well: ```console $ python3.13 -m ast parser/_.py Traceback (most recent call last): [..] File "parser/_.py", line 1 def foo[](): ^ SyntaxError: Type parameter list cannot be empty ``` ## Test Plan Add inline test cases and update the snapshots.	2024-06-26 08:10:35 +05:30
Dhruv Manilawala	7109214b57	Update parser tests to validate token ranges (#12019 ) ## Summary This PR updates the parser test infrastructure to validate the token ranges. From the code documentation: ``` /// Verifies that: /// * the ranges are strictly increasing when loop the tokens in insertion order /// * all ranges are within the length of the source code ``` Follow-up from #12016 and #12017 resolves: #11938 ## Test Plan Make sure that there are no failures.	2024-06-25 08:14:28 +00:00
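The two checks quoted from the code documentation above can be sketched as follows. The `(kind, start, end)` tuple layout and the function name are assumptions made for illustration, and "strictly increasing" is read here as "each token starts at or after the end of the previous one"; the actual test helper in the repository may differ.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (kind, start, end), with `end` exclusive


def validate_token_ranges(tokens: List[Token], source_len: int) -> List[str]:
    """Collect a message for every token that breaks one of the two rules."""
    problems = []
    previous_end = 0
    for kind, start, end in tokens:
        # Rule 1: tokens must not overlap the tokens before them.
        if start < previous_end:
            problems.append(f"{kind} {start}..{end} starts before offset {previous_end}")
        # Rule 2: every range must lie within the source text.
        if end > source_len:
            problems.append(f"{kind} {start}..{end} exceeds source length {source_len}")
        previous_end = max(previous_end, end)
    return problems


# Token stream from the #12035 commit message; the source length is made up.
tokens = [
    ("Name", 0, 4), ("Lpar", 4, 5), ("Name", 5, 8), ("Comment", 9, 20),
    ("NonLogicalNewline", 20, 22), ("Newline", 21, 22),
    ("NonLogicalNewline", 22, 23), ("Def", 23, 26),
]
print(validate_token_ranges(tokens, source_len=40))
```

Run on that stream, the sketch flags only the `Newline 21..22` token, the overlap that caused the panic.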
Dhruv Manilawala	d930e97212	Do not include newline for unterminated string range (#12017 ) ## Summary This PR updates the unterminated string error range to not include the final newline character. This is a follow-up to #12016 and required for #12019 This is not done for when the unterminated string goes till the end of file (not a newline character). The unterminated f-string range is correct. ### Why is this required for #12019 ? Because otherwise the token ranges will overlap. For example: ```py f"{" f"{foo!r" ``` Here, the re-lexing logic recovers from an unterminated f-string and thus emitting a `Newline` token for the one at the end of the first line. But, currently the `Unknown` and the `Newline` token would overlap because the `Unknown` token (unterminated string literal) range would include the newline character. ## Test Plan Update and validate the snapshot.	2024-06-25 08:10:07 +00:00
Dhruv Manilawala	9c1b6ec411	Use correct range to highlight line continuation error (#12016 ) ## Summary This PR fixes the range highlighted for the line continuation error. Previously, it would highlight an incorrect range: ``` 1 | call(a, b, \\\ | ^^ Syntax Error: unexpected character after line continuation character 2 | 3 | def bar(): | ``` And now: ``` | 1 | call(a, b, \\\ | ^ Syntax Error: unexpected character after line continuation character 2 | 3 | def bar(): | ``` This is implemented by not updating the token range for the `Unknown` token which is emitted when there's a lexical error. Instead, the `push_error` helper method will be responsible for updating the range to the error location. This actually becomes a requirement which can be seen in follow-up PRs. ## Test Plan Update and validate the snapshot.	2024-06-25 13:35:24 +05:30
Dhruv Manilawala	68a8978454	Consider line continuation character for re-lexing (#12008 ) ## Summary This PR fixes a bug where the re-lexing logic didn't consider the line continuation character being present before the newline character. This meant that the lexer was being moved back to the newline character which is actually ignored via `\`. Considering the following code: ```py f'middle {'string':\ 'format spec'} ``` The old token stream is: ``` ... Colon 18..19 FStringMiddle 19..29 (flags = F_STRING) Newline 20..21 Indent 21..29 String 29..42 Rbrace 42..43 ... ``` Notice how the ranges are overlapping between the `FStringMiddle` token and the tokens emitted after moving the lexer backwards. After this fix, the new token stream which is without moving the lexer backwards in this scenario: ``` FStringStart 0..2 (flags = F_STRING) FStringMiddle 2..9 (flags = F_STRING) Lbrace 9..10 String 10..18 Colon 18..19 FStringMiddle 19..29 (flags = F_STRING) FStringEnd 29..30 (flags = F_STRING) Name 30..36 Name 37..41 Unknown 41..44 Newline 44..45 ``` fixes: #12004 ## Test Plan Add test cases and update the snapshots.	2024-06-25 02:13:54 +00:00
renovate[bot]	53a80a5c11	Update Rust crate rustc-hash to v2 (#12001 )	2024-06-23 20:46:42 -04:00
Dhruv Manilawala	81160320de	Manual impl of `Debug` on `Token` (#11958 ) ## Summary I look at the token stream a lot, not specifically in the playground but in the terminal output and it's annoying to scroll a lot to find specific location. Most of the information is also redundant. The final format we end up with is: `<kind> <range> (flags = ...)` e.g., `String 0..4 (flags = BYTE_STRING)` where the flags part is only populated if there are any flags set.	2024-06-22 04:18:24 +00:00
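The output format described above, `<kind> <range> (flags = ...)`, is easy to mimic. The sketch below is in Python rather than the Rust `Debug` impl the commit adds; the function name and the ` | ` separator between multiple flags are assumptions, since the message only shows a single flag.

```python
from typing import Sequence


def format_token(kind: str, start: int, end: int, flags: Sequence[str] = ()) -> str:
    """Render a token as `<kind> <start>..<end>`, plus flags when any are set."""
    text = f"{kind} {start}..{end}"
    if flags:
        joined = " | ".join(flags)  # assumed separator for several flags
        text += f" (flags = {joined})"
    return text


print(format_token("String", 0, 4, ["BYTE_STRING"]))
print(format_token("Name", 5, 8))
```

Omitting the flags part when nothing is set is what keeps the common case to one short line per token.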
Dhruv Manilawala	27ebff36ec	Remove `Token::is_trivia` method (#11962 ) Sorry, a leftover from my rebase	2024-06-21 10:24:42 +00:00
Dhruv Manilawala	96da136e6a	Move token and error structs into related modules (#11957 ) ## Summary This PR does some housekeeping into moving certain structs into related modules. Specifically, 1. Move `LexicalError` from `lexer.rs` to `error.rs` which also contains the `ParseError` 2. Move `Token`, `TokenFlags` and `TokenValue` from `lexer.rs` to `token.rs`	2024-06-21 10:07:19 +00:00
Dhruv Manilawala	4667d8697c	Remove duplication around `is_trivia` functions (#11956 ) ## Summary This PR removes the duplication around `is_trivia` functions. There are two of them in the codebase: 1. In `pycodestyle`, it's for newline, indent, dedent, non-logical newline and comment 2. In the parser, it's for non-logical newline and comment The `TokenKind::is_trivia` method used (1) but that's not correct in that context. So, this PR introduces a new `is_non_logical_token` helper method for the `pycodestyle` crate and updates the `TokenKind::is_trivia` implementation with (2). This also means we can remove `Token::is_trivia` method and the standalone `token_source::is_trivia` function and use the one on `TokenKind`. ## Test Plan `cargo insta test`	2024-06-21 10:02:40 +00:00
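The two notions of "trivia" that this commit separates can be written down as two sets. This is a sketch only: the enum below is a cut-down stand-in for ruff's `TokenKind`, and the Python function names simply mirror the helpers named in the message.

```python
from enum import Enum, auto


class TokenKind(Enum):
    NAME = auto()
    NEWLINE = auto()
    NON_LOGICAL_NEWLINE = auto()
    INDENT = auto()
    DEDENT = auto()
    COMMENT = auto()


# (2) Parser view: only tokens that never affect the syntax tree.
PARSER_TRIVIA = frozenset({TokenKind.NON_LOGICAL_NEWLINE, TokenKind.COMMENT})

# (1) pycodestyle view: additionally the layout tokens of a logical line.
NON_LOGICAL_TOKENS = PARSER_TRIVIA | {
    TokenKind.NEWLINE,
    TokenKind.INDENT,
    TokenKind.DEDENT,
}


def is_trivia(kind: TokenKind) -> bool:
    return kind in PARSER_TRIVIA


def is_non_logical_token(kind: TokenKind) -> bool:
    return kind in NON_LOGICAL_TOKENS


print(is_trivia(TokenKind.NEWLINE), is_non_logical_token(TokenKind.NEWLINE))
```

A logical `Newline` is the token on which the two definitions disagree, which is why sharing one `is_trivia` between both callers was incorrect.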
Dhruv Manilawala	ed948eaefb	Avoid moving back the lexer for triple-quoted fstring (#11939 ) ## Summary This PR avoids moving back the lexer for a triple-quoted f-string during the re-lexing phase. The reason this is a problem is that for a triple-quoted f-string the newlines are part of the f-string itself, specifically they'll be part of the `FStringMiddle` token. So, if we moved the lexer back, there would be a `Newline` token whose range would be in between an `FStringMiddle` token. This creates a panic in downstream usage. fixes: #11937 ## Test Plan Add test cases and validate the snapshots.	2024-06-20 16:27:36 +05:30
Dhruv Manilawala	b617d90651	Update `E999` to show all syntax errors (#11900 ) ## Summary This PR updates the linter to show all the parse errors as diagnostics instead of just the first one. Note that this doesn't affect the parse error displayed as error log message. This will be removed in a follow-up PR. ### Breaking? I don't think this is a breaking change even though this might give more diagnostics. The main reason is that this shouldn't affect any users because it'll only give additional diagnostics in the case of multiple syntax errors. ## Test Plan Add an integration test case which would raise more than one parse error.	2024-06-19 13:09:54 +05:30
Dhruv Manilawala	cdc7c71449	Avoid consuming trailing whitespace during re-lexing (#11933 ) ## Summary This PR updates the re-lexing logic to avoid consuming the trailing whitespace and move the lexer explicitly to the last newline character encountered while moving backwards. Consider the following code snippet as taken from the test case highlighted with whitespace (`.`) and newline (`\n`) characters: ```py # There are trailing whitespace before the newline character but those whitespaces are # part of the comment token f"""hello {x # comment....\n # ^ y = 1\n ``` The parser is at `y` when it's trying to recover from an unclosed `{`, so it calls into the re-lexing logic which tries to move the lexer back to the end of the previous line. But, as it consumed all whitespaces it moved the lexer to the location marked by `^` in the above code snippet. But, those whitespaces are part of the comment token. This means that the range for the two tokens were overlapping which introduced the panic. Note that this is only a bug when there's a comment with a trailing whitespace otherwise it's fine to move the lexer to the whitespace character. This is because the lexer would just skip the whitespace otherwise. Nevertheless, this PR updates the logic to move it explicitly to the newline character in all cases. fixes: #11929 ## Test Plan Add test cases and update the snapshot. Make sure that it doesn't panic on the code snippet in the linked issue.	2024-06-19 12:14:18 +05:30
Dhruv Manilawala	1e0642fac8	Use re-lexing for normal list parsing (#11871 ) ## Summary This PR is a follow-up on #11845 to add the re-lexing logic for normal list parsing. A normal list parsing is basically parsing elements without any separator in between i.e., there can only be trivia tokens in between the two elements. Currently, this is only being used for parsing assignment statement and f-string elements. Assignment statements cannot be in a parenthesized context, but f-string can have curly braces so this PR is specifically for them. I don't think this is an ideal recovery but the problem is that both lexer and parser could add an error for f-strings. If the lexer adds an error it'll emit an `Unknown` token instead while the parser adds the error directly. I think we'd need to move all f-string errors to be emitted by the parser instead. This way the parser can correctly inform the lexer that it's out of an f-string and then the lexer can pop the current f-string context out of the stack. ## Test Plan Add test cases, update the snapshots, and run the fuzzer.	2024-06-18 12:14:41 +05:30
Dhruv Manilawala	8499abfa7f	Implement re-lexing logic for better error recovery (#11845 ) ## Summary This PR implements the re-lexing logic in the parser. This logic is only applied when recovering from an error during list parsing. The logic is as follows: 1. During list parsing, if an unexpected token is encountered and it detects that an outer context can understand it and thus recover from it, it invokes the re-lexing logic in the lexer 2. This logic first checks if the lexer is in a parenthesized context and returns if it's not. Thus, the logic is a no-op if the lexer isn't in a parenthesized context 3. It then reduces the nesting level by 1. It shouldn't reset it to 0 because otherwise the recovery from nested list parsing will be incorrect 4. Then, it tries to find the last newline character going backwards from the current position of the lexer. This skips any whitespace, but if it encounters any character other than newline or whitespace, it aborts. 5. Now, if there's a newline character, then it needs to be re-lexed in a logical context which means that the lexer needs to emit it as a `Newline` token instead of `NonLogicalNewline`. 6. If the re-lexing gives a different token than the current one, the token source needs to update its token collection to remove all the tokens which come after the new current position. It turns out that the list parsing isn't that happy with the results so it requires some re-arranging such that the following two errors are raised correctly: 1. Expected comma 2. Recovery context error For (1), the following scenarios need to be considered: * Missing comma between two elements * Half parsed element because the grammar doesn't allow it (for example, named expressions) For (2), the following scenarios need to be considered: 1. If the parser is at a comma which means that there's a missing element otherwise the comma would've been consumed by the first `eat` call above. And, the parser doesn't take the re-lexing route on a comma token. 2. If it's the first element and the current token is not a comma which means that it's an invalid element. resolves: #11640 ## Test Plan - [x] Update existing test snapshots and validate them - [x] Add additional test cases specific to the re-lexing logic and validate the snapshots - [x] Run the fuzzer on 3000+ valid inputs - [x] Run the fuzzer on invalid inputs - [x] Run the parser on various open source projects - [x] Make sure the ecosystem changes are none	2024-06-17 06:47:00 +00:00
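Steps 2 to 4 of the re-lexing logic above can be sketched at the character level. This is an illustrative Python sketch, not the Rust implementation: `relex_target` and `WHITESPACE` are hypothetical names, the backward walk keeps the earliest newline of a trailing run (one reading of "last newline going backwards"), and the reduced nesting level is only reported when a newline is found. It also predates the refinements for line continuations, two-character newlines and trailing whitespace that appear in the later commits of this log.

```python
from typing import Optional, Tuple

WHITESPACE = " \t\x0c"  # space, tab, form feed


def relex_target(source: str, offset: int, nesting: int) -> Optional[Tuple[int, int]]:
    """Return (newline offset, new nesting level), or None if nothing to re-lex."""
    # Step 2: outside a parenthesized context the logic is a no-op.
    if nesting == 0:
        return None
    newline_position = None
    position = offset
    # Step 4: walk backwards over whitespace and newlines only.
    while position > 0:
        ch = source[position - 1]
        if ch in "\r\n":
            position -= 1
            newline_position = position
        elif ch in WHITESPACE:
            position -= 1
        else:
            break  # any other character aborts the search
    if newline_position is None:
        return None
    # Step 3: leave exactly one nesting level; never reset to zero.
    return newline_position, nesting - 1


source = "call(a, b\n\ndef bar():\n    pass\n"
print(relex_target(source, source.index("def"), nesting=1))
```

With the unclosed `call(` above, the lexer would be moved back to the newline that ends the first line, so that it can be emitted as a logical `Newline` (step 5).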
Micha Reiser	d4dd96d1f4	red-knot: `source_text`, `line_index`, and `parsed_module` queries (#11822 )	2024-06-13 07:37:02 +00:00
Dhruv Manilawala	60ea72a6bc	Add list terminator kind for error recovery (#11843 ) ## Summary This PR adds a new enum to determine the kind of terminator token, i.e., whether it actually terminates the list or is used for error recovery. This is important because the parser should take the error recovery route when the terminator token is of the error recovery kind. This will then try to re-lex the token if that's the case. I haven't updated any reference to use this new enum as otherwise it'll update the snapshots. I plan to do that in a follow-up PR so that it's easier to reason about. ## Test plan `cargo insta test`	2024-06-12 08:33:26 +00:00
Dhruv Manilawala	a525b4be3d	Separate terminator token for f-string elements kind (#11842 ) ## Summary This PR separates the terminator token for f-string elements depending on the context. A list of f-string elements can occur either in a regular f-string or in the format spec of an f-string. The terminator token is different depending on that context. ## Test Plan `cargo insta test` and verify the updated snapshots.	2024-06-12 13:57:35 +05:30
Dhruv Manilawala	db8f2c2d9f	Use the existing `ruff_python_trivia::is_python_whitespace` function (#11844 ) ## Summary This PR re-uses the `ruff_python_trivia::is_python_whitespace` in the lexer instead of defining its own. This was mainly to avoid circular dependency which was resolved in #11261.	2024-06-12 05:59:19 +00:00
Dhruv Manilawala	549cc1e437	Build `CommentRanges` outside the parser (#11792 ) ## Summary This PR updates the parser to remove building the `CommentRanges` and instead it'll be built by the linter and the formatter when it's required. For the linter, it'll be built and owned by the `Indexer` while for the formatter it'll be built from the `Tokens` struct and passed as an argument. ## Test Plan `cargo insta test`	2024-06-09 09:55:17 +00:00
Micha Reiser	32ca704956	Rename `PreorderVisitor` to `SourceOrderVisitor` (#11798 ) Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>	2024-06-07 17:01:58 +00:00
Dhruv Manilawala	1b7d08c2c9	Consider `:` to terminate parenthesized with items (#11775 ) ## Summary This PR is a follow-up to this discussion (https://github.com/astral-sh/ruff/pull/11770#discussion_r1628917209) which adds the `:` token in the terminator set for parenthesized with items. The main motivation is to avoid parsing too much in speculative mode. This is evident with the following _before_ and _after_ parsed with items list for the following code: ```py with (item1, item2: foo ``` Before (3 items): parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 24..27, context_expr: Name( ExprName { range: 24..27, id: "foo", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] After (2 items): parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] ## Test Plan `cargo insta test`	2024-06-06 18:40:44 +05:30
Dhruv Manilawala	6c1fa1d440	Use speculative parsing for with-items (#11770 ) ## Summary This PR updates the with-items parsing logic to use speculative parsing instead. ### Existing logic First, let's understand the previous logic: 1. The parser sees `(`, it doesn't know whether it's part of a parenthesized with items or a parenthesized expression 2. Consider it a parenthesized with items and perform a hand-rolled speculative parsing 3. Then, verify the assumption and if it's incorrect convert the parsed with items into an appropriate expression which becomes part of the first with item Here, in (3) there are lots of edge cases which we've to deal with: 1. Trailing comma with a single element should be [converted to the expression as is](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2140-L2153)`) 2. Trailing comma with multiple elements should be [converted to a tuple expression](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2155-L2178)`) 3. Limit the allowed expression based on whether it's [(1)](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2144-L2152)`) or [(2)](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2157-L2171)`) 4. [Consider postfix expressions](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2181-L2200)`) after (3) 5. [Consider `if` expressions](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2203-L2208)`) after (3) 6. [Consider binary expressions](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2210-L2228)`) after (3) Consider other cases like * [Single generator expression](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2020-L2035)`) * [Expecting a comma](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2122-L2130)`) And, this is all possible only if we allow parsing these expressions in the [with item parsing logic](`9b2cf569b2/crates/ruff_python_parser/src/parser/statement.rs (L2287-L2334)`). 
### Speculative parsing With #11457 merged, we can simplify this logic by changing step (3) from above to just rewind the parser back to the `(` if our assumption (parenthesized with-items) was incorrect and then continue parsing it as a parenthesized expression. This also behaves much like a PEG parser, which considers the first grammar rule and, if it fails, considers the second grammar rule, and so on. resolves: #11639 ## Test Plan - [x] Verify the updated snapshots - [x] Run the fuzzer on around 3000 valid source files (locally)	2024-06-06 08:59:56 +00:00
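The rewind-on-failure idea described above can be shown with a toy parser over a list of token strings. Everything here is hypothetical (class name, methods, the fallback that glues tokens into one string); it only illustrates "try the first interpretation, rewind to the `(`, try the second" and is not the checkpoint API ruff's parser uses.

```python
from typing import List, Optional


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected: str) -> bool:
        if self.peek() == expected:
            self.pos += 1
            return True
        return False

    def parse_name(self) -> Optional[str]:
        token = self.peek()
        if token is not None and token.isidentifier():
            self.pos += 1
            return token
        return None

    def try_parenthesized_items(self) -> Optional[List[str]]:
        """Speculatively parse `( name, name, ... ) :`; None if that fails."""
        if not self.eat("("):
            return None
        items = []
        while True:
            name = self.parse_name()
            if name is None:
                break
            items.append(name)
            if not self.eat(","):
                break
        if items and self.eat(")") and self.peek() == ":":
            return items
        return None

    def parse_with_header(self) -> List[str]:
        checkpoint = self.pos  # remember where the `(` is
        items = self.try_parenthesized_items()
        if items is not None:
            return items
        self.pos = checkpoint  # rewind: the assumption was wrong
        # Fallback: treat everything up to `:` as one expression item.
        start = self.pos
        while self.peek() not in (None, ":"):
            self.pos += 1
        return [" ".join(self.tokens[start:self.pos])]


print(Parser(["(", "a", ",", "b", ")", ":"]).parse_with_header())
print(Parser(["(", "a", ")", ".", "b", ":"]).parse_with_header())
```

The second input looks like parenthesized with-items until the `.` after `)`, so the sketch rewinds and re-reads the whole header as a single expression.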
Dhruv Manilawala	eed6d784df	Update type annotation parsing API to return `Parsed` (#11739 ) ## Summary This PR updates the return type of `parse_type_annotation` from `Expr` to `Parsed<ModExpression>`. This is to allow accessing the tokens for the parsed sub-expression in the follow-up PR. ## Test Plan `cargo insta test`	2024-06-05 12:59:43 +05:30
Dhruv Manilawala	2567e14b7a	Lexer should consider BOM for the start offset (#11732 ) ## Summary This PR fixes a bug where the lexer didn't account for the BOM in the start offset. fixes: #11731 ## Test Plan Add multiple test cases which involve a BOM character in the source for the lexer and verify the snapshot.	2024-06-04 08:45:46 +00:00
Dhruv Manilawala	3b19df04d7	Use cursor offset for lexer checkpoint (#11734 ) ## Summary This PR updates the lexer checkpoint to store the cursor offset instead of cloning the cursor itself. This reduces the size of `LexerCheckpoint` from 136 to 112 bytes and also removes the need for lifetime. ## Test Plan `cargo insta test`	2024-06-04 14:13:57 +05:30
Micha Reiser	64165bee43	red-knot: Use `parse_unchecked` to get all parse errors (#11725 )	2024-06-04 06:04:48 +00:00

192 commits