Mirror of https://github.com/astral-sh/ruff.git, synced 2025-10-25 17:38:15 +00:00

3 commits

| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  Micha Reiser | 9f3a38d408 | Extract `LineIndex` independent methods from `Locator` (#13938) | ||
|  Dhruv Manilawala | bf5b62edac | Maintain synchronicity between the lexer and the parser (#11457) ## Summary This PR updates the entire parser stack in multiple ways: ### Make the lexer lazy * https://github.com/astral-sh/ruff/pull/11244 * https://github.com/astral-sh/ruff/pull/11473 Previously, Ruff's lexer would act as an iterator. The parser would collect all the tokens in a vector first and then process the tokens to create the syntax tree. The first task in this project is to update the entire parsing flow to make the lexer lazy. This includes the `Lexer`, `TokenSource`, and `Parser`. For context, the `TokenSource` is a wrapper around the `Lexer` to filter out the trivia tokens[^1]. Now, the parser will ask the token source to get the next token, and only then will the lexer continue and emit the token. This means that the lexer needs to be aware of the "current" token. When `next_token` is called, the current token will be updated with the newly lexed token. The main motivation for making the lexer lazy is to allow re-lexing a token in a different context. This is going to be really useful for making the parser error resilient. For example, currently the emitted tokens remain the same even if the parser can recover from an unclosed parenthesis. This is important because the lexer emits a `NonLogicalNewline` in a parenthesized context but a normal `Newline` in a non-parenthesized context. These different kinds of newlines are also used to emit the indentation tokens, which is important for the parser as they are used to determine the start and end of a block. Additionally, this allows us to implement the following functionalities: 1. Checkpoint - rewind infrastructure: The idea here is to create a checkpoint and continue lexing. At a later point, this checkpoint can be used to rewind the lexer back to the provided checkpoint. 2. Remove the `SoftKeywordTransformer` and instead use lookahead or speculative parsing to determine whether a soft keyword is a keyword or an identifier 3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted by the lexer, but it contains owned data, which makes it expensive to clone. The new `TokenKind` enum just represents the type of token, which is very cheap. This raises the question of how the parser will get the owned value that was stored on `Tok`. This will be solved by introducing a new `TokenValue` enum, which only contains the subset of token kinds that have an owned value. This is stored on the lexer and is requested by the parser when it wants to process the data. For example: | ||
|  Dhruv Manilawala | 28cc71fb6b | Remove cyclic dev dependency with the parser crate (#11261) ## Summary This PR removes the cyclic dev dependency that some of the crates had with the parser crate. The cyclic dependencies are: * `ruff_python_ast` has a **dev dependency** on `ruff_python_parser`, and `ruff_python_parser` directly depends on `ruff_python_ast` * `ruff_python_trivia` has a **dev dependency** on `ruff_python_parser`, and `ruff_python_parser` has an indirect dependency on `ruff_python_trivia` (`ruff_python_parser` - `ruff_python_ast` - `ruff_python_trivia`) Specifically, this PR does the following: * Introduce two new crates * `ruff_python_ast_integration_tests`, and move the tests from the `ruff_python_ast` crate that use the parser into this crate * `ruff_python_trivia_integration_tests`, and move the tests from the `ruff_python_trivia` crate that use the parser into this crate ### Motivation The main motivation for this PR is to help development. Before this PR, `rust-analyzer` wouldn't provide any intellisense in the `ruff_python_parser` crate regarding the symbols in the `ruff_python_ast` crate. ``` [ERROR][2024-05-03 13:47:06] .../vim/lsp/rpc.lua:770 "rpc" "/Users/dhruv/.cargo/bin/rust-analyzer" "stderr" "[ERROR project_model::workspace] cyclic deps: ruff_python_parser(Idx::<CrateData>(50)) -> ruff_python_ast(Idx::<CrateData>(37)), alternative path: ruff_python_ast(Idx::<CrateData>(37)) -> ruff_python_parser(Idx::<CrateData>(50))\n" ``` ## Test Plan Check the `rust-analyzer` logs to confirm there are no signs of a cyclic dependency. | ||
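
The lazy lexing and checkpoint/rewind flow described in the message for bf5b62edac can be illustrated with a toy lexer. This is only a sketch under stated assumptions: the grammar (names, parentheses, newlines), the `Checkpoint` struct, and the `collect_kinds` helper are hypothetical and are not Ruff's actual types or signatures. It shows a lexer that produces one token per `next_token` call, tracks the "current" token, emits `NonLogicalNewline` only while inside parentheses, and can be rewound to a saved checkpoint.

```rust
/// Token kinds for a toy grammar (hypothetical; far smaller than a real lexer's set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    LParen,
    RParen,
    Newline,
    NonLogicalNewline,
    EndOfFile,
}

/// Snapshot of all lexer state needed to resume from an earlier position.
#[derive(Debug, Clone, Copy)]
struct Checkpoint {
    offset: usize,
    nesting: u32,
    current: TokenKind,
}

struct Lexer<'src> {
    source: &'src [u8],
    offset: usize,
    nesting: u32,
    current: TokenKind,
}

impl<'src> Lexer<'src> {
    fn new(source: &'src str) -> Self {
        // `current` is a placeholder until the first call to `next_token`.
        Self { source: source.as_bytes(), offset: 0, nesting: 0, current: TokenKind::EndOfFile }
    }

    /// Lex exactly one token on demand and remember it as the current token.
    fn next_token(&mut self) -> TokenKind {
        // Skip spaces between tokens.
        while self.source.get(self.offset) == Some(&b' ') {
            self.offset += 1;
        }
        let kind = match self.source.get(self.offset).copied() {
            None => TokenKind::EndOfFile,
            Some(b'(') => {
                self.offset += 1;
                self.nesting += 1;
                TokenKind::LParen
            }
            Some(b')') => {
                self.offset += 1;
                self.nesting = self.nesting.saturating_sub(1);
                TokenKind::RParen
            }
            Some(b'\n') => {
                self.offset += 1;
                // The newline kind depends on the parenthesized context.
                if self.nesting > 0 { TokenKind::NonLogicalNewline } else { TokenKind::Newline }
            }
            Some(_) => {
                // Any other run of bytes is treated as a name.
                while let Some(&b) = self.source.get(self.offset) {
                    if matches!(b, b' ' | b'(' | b')' | b'\n') {
                        break;
                    }
                    self.offset += 1;
                }
                TokenKind::Name
            }
        };
        self.current = kind;
        kind
    }

    fn current(&self) -> TokenKind {
        self.current
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { offset: self.offset, nesting: self.nesting, current: self.current }
    }

    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.offset = checkpoint.offset;
        self.nesting = checkpoint.nesting;
        self.current = checkpoint.current;
    }
}

/// Drain the lexer until end of file (demonstration helper only).
fn collect_kinds(lexer: &mut Lexer<'_>) -> Vec<TokenKind> {
    let mut kinds = Vec::new();
    loop {
        let kind = lexer.next_token();
        kinds.push(kind);
        if kind == TokenKind::EndOfFile {
            break;
        }
    }
    kinds
}
```

Because the checkpoint stores the nesting depth along with the offset, lexing again after `rewind` reproduces the same newline kinds. A parser that wanted to re-lex in a different context would change that state before lexing again; that step is not shown here.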
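
The split between a cheap `TokenKind` and an owned `TokenValue`, also described in the message for bf5b62edac, can be sketched in the same spirit. The enum names come from the commit message, but the variants, the whitespace-separated toy input format, and the `take_value` method are assumptions made for illustration rather than Ruff's real API. The idea shown is that the kind is `Copy` and can be passed around freely, while the owned payload stays on the lexer until the parser asks for it.

```rust
/// Cheap, copyable description of what kind of token was lexed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Int,
    Plus,
    EndOfFile,
}

/// Owned payload, present only for the token kinds that carry data.
#[derive(Debug, Clone, PartialEq, Default)]
enum TokenValue {
    #[default]
    None,
    Name(String),
    Int(i64),
}

struct Lexer<'src> {
    // Toy input model: tokens are separated by whitespace.
    words: std::str::SplitWhitespace<'src>,
    current_kind: TokenKind,
    current_value: TokenValue,
}

impl<'src> Lexer<'src> {
    fn new(source: &'src str) -> Self {
        Self {
            words: source.split_whitespace(),
            current_kind: TokenKind::EndOfFile,
            current_value: TokenValue::None,
        }
    }

    /// Lex one token, store its value on the lexer, and return only its kind.
    fn next_token(&mut self) -> TokenKind {
        let (kind, value) = match self.words.next() {
            None => (TokenKind::EndOfFile, TokenValue::None),
            Some("+") => (TokenKind::Plus, TokenValue::None),
            Some(word) => match word.parse::<i64>() {
                Ok(number) => (TokenKind::Int, TokenValue::Int(number)),
                Err(_) => (TokenKind::Name, TokenValue::Name(word.to_string())),
            },
        };
        self.current_kind = kind;
        self.current_value = value;
        kind
    }

    fn current_kind(&self) -> TokenKind {
        self.current_kind
    }

    /// Move the owned value out of the lexer, leaving `TokenValue::None` behind.
    fn take_value(&mut self) -> TokenValue {
        std::mem::take(&mut self.current_value)
    }
}
```

Using `std::mem::take` moves the string out without cloning it, which is the cost the commit message attributes to the old `Tok` enum. A second call to `take_value` for the same token returns `TokenValue::None`.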