mirrors/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-07-14 16:45:10 +00:00

Author	SHA1	Message	Date
Micha Reiser	9cd0cdefd3	Assert that formatted code doesn't introduce any new unsupported syntax errors (#16549 ) ## Summary This should give us better coverage for the unsupported syntax error features and increases our confidence that the formatter doesn't accidentially introduce new unsupported syntax errors. A feature like this would have been very useful when working on f-string formatting where it took a lot of iteration to find all Python 3.11 or older incompatibilities. ## Test Plan I applied my changes on top of https://github.com/astral-sh/ruff/pull/16523 and removed the target version check in the with-statement formatting code. As expected, the integration tests now failed	2025-03-07 09:12:00 +01:00
Brent Westbrook	97d0659ce3	Pass `ParserOptions` to the parser (#16220 ) ## Summary This is part of the preparation for detecting syntax errors in the parser from https://github.com/astral-sh/ruff/pull/16090/. As suggested in [this comment](https://github.com/astral-sh/ruff/pull/16090/#discussion_r1953084509), I started working on a `ParseOptions` struct that could be stored in the parser. For this initial refactor, I only made it hold the existing `Mode` option, but for syntax errors, we will also need it to have a `PythonVersion`. For that use case, I'm picturing something like a `ParseOptions::with_python_version` method, so you can extend the current calls to something like ```rust ParseOptions::from(mode).with_python_version(settings.target_version) ``` But I thought it was worth adding `ParseOptions` alone without changing any other behavior first. Most of the diff is just updating call sites taking `Mode` to take `ParseOptions::from(Mode)` or those taking `PySourceType`s to take `ParseOptions::from(PySourceType)`. The interesting changes are in the new `parser/options.rs` file and smaller parts of `parser/mod.rs` and `ruff_python_parser/src/lib.rs`. ## Test Plan Existing tests, this should not change any behavior.	2025-02-19 10:50:50 -05:00
Brent Westbrook	a9efdea113	Use `ast::PythonVersion` internally in the formatter and linter (#16170 ) ## Summary This PR updates the formatter and linter to use the `PythonVersion` struct from the `ruff_python_ast` crate internally. While this doesn't remove the need for the `linter::PythonVersion` enum, it does remove the `formatter::PythonVersion` enum and limits the use in the linter to deserializing from CLI arguments and config files and moves most of the remaining methods to the `ast::PythonVersion` struct. ## Test Plan Existing tests, with some inputs and outputs updated to reflect the new (de)serialization format. I think these are test-specific and shouldn't affect any external (de)serialization. --------- Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>	2025-02-18 12:03:13 -05:00
Micha Reiser	82faa9bb62	Add tests demonstrating f-strings with debug expressions in replacements that contain escaped characters (#14929 )	2024-12-12 09:33:20 +00:00
Dhruv Manilawala	bf5b62edac	Maintain synchronicity between the lexer and the parser (#11457 ) ## Summary This PR updates the entire parser stack in multiple ways: ### Make the lexer lazy * https://github.com/astral-sh/ruff/pull/11244 * https://github.com/astral-sh/ruff/pull/11473 Previously, Ruff's lexer would act as an iterator. The parser would collect all the tokens in a vector first and then process the tokens to create the syntax tree. The first task in this project is to update the entire parsing flow to make the lexer lazy. This includes the `Lexer`, `TokenSource`, and `Parser`. For context, the `TokenSource` is a wrapper around the `Lexer` to filter out the trivia tokens[^1]. Now, the parser will ask the token source to get the next token and only then the lexer will continue and emit the token. This means that the lexer needs to be aware of the "current" token. When the `next_token` is called, the current token will be updated with the newly lexed token. The main motivation to make the lexer lazy is to allow re-lexing a token in a different context. This is going to be really useful to make the parser error resilience. For example, currently the emitted tokens remains the same even if the parser can recover from an unclosed parenthesis. This is important because the lexer emits a `NonLogicalNewline` in parenthesized context while a normal `Newline` in non-parenthesized context. This different kinds of newline is also used to emit the indentation tokens which is important for the parser as it's used to determine the start and end of a block. Additionally, this allows us to implement the following functionalities: 1. Checkpoint - rewind infrastructure: The idea here is to create a checkpoint and continue lexing. At a later point, this checkpoint can be used to rewind the lexer back to the provided checkpoint. 2. Remove the `SoftKeywordTransformer` and instead use lookahead or speculative parsing to determine whether a soft keyword is a keyword or an identifier 3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted by the lexer but it contains owned data which makes it expensive to clone. The new `TokenKind` enum just represents the type of token which is very cheap. This brings up a question as to how will the parser get the owned value which was stored on `Tok`. This will be solved by introducing a new `TokenValue` enum which only contains a subset of token kinds which has the owned value. This is stored on the lexer and is requested by the parser when it wants to process the data. For example: `8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L1260-L1262)` [^1]: Trivia tokens are `NonLogicalNewline` and `Comment` ### Remove `SoftKeywordTransformer` * https://github.com/astral-sh/ruff/pull/11441 * https://github.com/astral-sh/ruff/pull/11459 * https://github.com/astral-sh/ruff/pull/11442 * https://github.com/astral-sh/ruff/pull/11443 * https://github.com/astral-sh/ruff/pull/11474 For context, https://github.com/RustPython/RustPython/pull/4519/files#diff-5de40045e78e794aa5ab0b8aacf531aa477daf826d31ca129467703855408220 added support for soft keywords in the parser which uses infinite lookahead to classify a soft keyword as a keyword or an identifier. This is a brilliant idea as it basically wraps the existing Lexer and works on top of it which means that the logic for lexing and re-lexing a soft keyword remains separate. The change here is to remove `SoftKeywordTransformer` and let the parser determine this based on context, lookahead and speculative parsing. * Context: The transformer needs to know the position of the lexer between it being at a statement position or a simple statement position. This is because a `match` token starts a compound statement while a `type` token starts a simple statement. The parser already knows this. * Lookahead: Now that the parser knows the context it can perform lookahead of up to two tokens to classify the soft keyword. The logic for this is mentioned in the PR implementing it for `type` and `match soft keyword. * Speculative parsing: This is where the checkpoint - rewind infrastructure helps. For `match` soft keyword, there are certain cases for which we can't classify based on lookahead. The idea here is to create a checkpoint and keep parsing. Based on whether the parsing was successful and what tokens are ahead we can classify the remaining cases. Refer to #11443 for more details. If the soft keyword is being parsed in an identifier context, it'll be converted to an identifier and the emitted token will be updated as well. Refer `8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L487-L491)`. The `case` soft keyword doesn't require any special handling because it'll be a keyword only in the context of a match statement. ### Update the parser API * https://github.com/astral-sh/ruff/pull/11494 * https://github.com/astral-sh/ruff/pull/11505 Now that the lexer is in sync with the parser, and the parser helps to determine whether a soft keyword is a keyword or an identifier, the lexer cannot be used on its own. The reason being that it's not sensitive to the context (which is correct). This means that the parser API needs to be updated to not allow any access to the lexer. Previously, there were multiple ways to parse the source code: 1. Passing the source code itself 2. Or, passing the tokens Now that the lexer and parser are working together, the API corresponding to (2) cannot exists. The final API is mentioned in this PR description: https://github.com/astral-sh/ruff/pull/11494. ### Refactor the downstream tools (linter and formatter) * https://github.com/astral-sh/ruff/pull/11511 * https://github.com/astral-sh/ruff/pull/11515 * https://github.com/astral-sh/ruff/pull/11529 * https://github.com/astral-sh/ruff/pull/11562 * https://github.com/astral-sh/ruff/pull/11592 And, the final set of changes involves updating all references of the lexer and `Tok` enum. This was done in two-parts: 1. Update all the references in a way that doesn't require any changes from this PR i.e., it can be done independently * https://github.com/astral-sh/ruff/pull/11402 * https://github.com/astral-sh/ruff/pull/11406 * https://github.com/astral-sh/ruff/pull/11418 * https://github.com/astral-sh/ruff/pull/11419 * https://github.com/astral-sh/ruff/pull/11420 * https://github.com/astral-sh/ruff/pull/11424 2. Update all the remaining references to use the changes made in this PR For (2), there were various strategies used: 1. Introduce a new `Tokens` struct which wraps the token vector and add methods to query a certain subset of tokens. These includes: 1. `up_to_first_unknown` which replaces the `tokenize` function 2. `in_range` and `after` which replaces the `lex_starts_at` function where the former returns the tokens within the given range while the latter returns all the tokens after the given offset 2. Introduce a new `TokenFlags` which is a set of flags to query certain information from a token. Currently, this information is only limited to any string type token but can be expanded to include other information in the future as needed. https://github.com/astral-sh/ruff/pull/11578 3. Move the `CommentRanges` to the parsed output because this information is common to both the linter and the formatter. This removes the need for `tokens_and_ranges` function. ## Test Plan - [x] Update and verify the test snapshots - [x] Make sure the entire test suite is passing - [x] Make sure there are no changes in the ecosystem checks - [x] Run the fuzzer on the parser - [x] Run this change on dozens of open-source projects ### Running this change on dozens of open-source projects Refer to the PR description to get the list of open source projects used for testing. Now, the following tests were done between `main` and this branch: 1. Compare the output of `--select=E999` (syntax errors) 2. Compare the output of default rule selection 3. Compare the output of `--select=ALL` Conclusion: all output were same ## What's next? The next step is to introduce re-lexing logic and update the parser to feed the recovery information to the lexer so that it can emit the correct token. This moves us one step closer to having error resilience in the parser and provides Ruff the possibility to lint even if the source code contains syntax errors.	2024-06-03 18:23:50 +05:30
Micha Reiser	ce14f4dea5	Range formatting API (#9635 )	2024-01-31 11:13:37 +01:00
Micha Reiser	3c7fea769c	Show source-type in formatter snapshot tests with options (#9699 )	2024-01-30 10:08:50 +00:00
Charlie Marsh	e80260a3c5	Remove source path from parser errors (#9322 ) ## Summary I always found it odd that we had to pass this in, since it's really higher-level context for the error. The awkwardness is further evidenced by the fact that we pass in fake values everywhere (even outside of tests). The source path isn't actually used to display the error; it's only accessed elsewhere to _re-display_ the error in certain cases. This PR modifies to instead pass the path directly in those cases.	2023-12-30 20:33:05 +00:00
Micha Reiser	d835b28d01	Show preview changes for tests with options (#9223 )	2023-12-21 23:36:19 +00:00
Micha Reiser	8cb7950102	Add `target_version` to formatter options (#9220 )	2023-12-21 04:05:58 +00:00
Andrew Gallant	b972455ac7	ruff_python_formatter: implement "dynamic" line width mode for docstring code formatting (#9098 ) ## Summary This PR changes the internal `docstring-code-line-width` setting to additionally accept a string value `dynamic`. When `dynamic` is set, the line width is dynamically adjusted when reformatting code snippets in docstrings based on the indent level of the docstring. The result is that the reformatted lines from the code snippet should not exceed the "global" line width configuration for the surrounding source. This PR does not change the default behavior, although I suspect the default should probably be `dynamic`. ## Test Plan I added a new configuration to the existing docstring code tests and also added a new set of tests dedicated to the new `dynamic` mode.	2023-12-12 09:58:07 -05:00
Andrew Gallant	07380e0657	ruff_python_formatter: add docstring-code-line-width internal setting (#9055 ) ## Summary This does the light plumbing necessary to add a new internal option that permits setting the line width of code examples in docstrings. The plan is to add the corresponding user facing knob in #8854. Note that this effectively removes the `same-as-global` configuration style discussed [in this comment](https://github.com/astral-sh/ruff/issues/8855#issuecomment-1847230440). It replaces it with the `{integer}` configuration style only. There are a lot of commits here, but they are each tiny to make review easier because of the changes to snapshots. ## Test Plan I added a new docstring test configuration that sets `docstring-code-line-width = 60` and examined the differences.	2023-12-11 08:20:59 -05:00
Micha Reiser	fd70cd789f	Update Black tests (#8901 )	2023-11-30 00:09:55 +00:00
Andrew Gallant	d9845a2628	format doctests in docstrings (#8811 ) ## Summary This PR adds opt-in support for formatting doctests in docstrings. This reflects initial support and it is intended to add support for Markdown and reStructuredText Python code blocks in the future. But I believe this PR lays the groundwork, and future additions for Markdown and reST should be less costly to add. It's strongly recommended to review this PR commit-by-commit. The last few commits in particular implement the bulk of the work here and represent the denser portions. Some things worth mentioning: * The formatter is itself not perfect, and it is possible for it to produce invalid Python code. Because of this, reformatted code snippets are checked for Python validity. If they aren't valid, then we (unfortunately silently) bail on formatting that code snippet. * There are a couple places where it would be nice to at least warn the user that doctest formatting failed, but it wasn't clear to me what the best way to do that is. * I haven't yet run this in anger on a real world code base. I think that should happen before merging. Closes #7146 ## Test Plan * [x] Pass the local test suite. * [x] Scrutinize ecosystem changes. * [x] Run this formatter on extant code and scrutinize the results. (e.g., CPython, numpy.)	2023-11-27 11:14:55 -05:00
Charlie Marsh	d574fcd1ac	Compare formatted and unformatted ASTs during formatter tests (#8624 ) ## Summary This PR implements validation in the formatter tests to ensure that we don't modify the AST during formatting. Black has similar logic. In implementing this, I learned that Black actually _does_ modify the AST, and their test infrastructure normalizes the AST to wipe away those differences. Specifically, Black changes the indentation of docstrings, which _does_ modify the AST; and it also inserts parentheses in `del` statements, which changes the AST too. Ruff also does both these things, so we _also_ implement the same normalization using a new visitor that allows for modifying the AST. Closes https://github.com/astral-sh/ruff/issues/8184. ## Test Plan `cargo test`	2023-11-13 17:43:27 +00:00
doolio	4fdf97a95c	Apply consistent code block labels (#8563 ) This ensures the python label is used for all python code blocks for consistency. ## Test Plan Visual inspection of all changes via git client ensuring no other changes were made in error.	2023-11-09 01:49:24 +00:00
konsti	317d3dd612	Add test and basic implementation for formatter preview mode (#8044 ) Summary Prepare for the black preview style becoming the black stable style at the end of the year. This adds a new test file to compare stable and preview on some relevant preview options in black, and makes `format_dev` understand the black preview flag. I've added poetry as a project that uses preview. I've implemented one specific deviation (collapsing of stub implementation in non-stub files) which showed up in poetry for testing. This also improves poetry compatibility from 0.99891 to 0.99919. Fixes #7440 New compatibility stats: \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 35 \| \| home-assistant \| 0.99953 \| 10596 \| 189 \| \| poetry \| 0.99919 \| 317 \| 12 \| \| transformers \| 0.99963 \| 2657 \| 332 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99969 \| 654 \| 15 \| \| zulip \| 0.99970 \| 1459 \| 22 \|	2023-10-26 15:33:26 +00:00
konsti	4d16e2308d	Formatter and parser refactoring (#7569 ) I got confused and refactored a bit, now the naming should be more consistent. This is the basis for the range formatting work. Chages: * `format_module` -> `format_module_source` (format a string) * `format_node` -> `format_module_ast` (format a program parsed into an AST) * Added `parse_ok_tokens` that takes `Token` instead of `Result<Token>` * Call the source code `source` consistently * Added a `tokens_and_ranges` helper * `python_ast` -> `module` (because that's the type)	2023-09-26 15:29:43 +02:00
konsti	4ae463d04b	Add a test for stmt assign breaking in preview mode (#7516 ) In preview mode, black will consistently break the right side first. This doesn't work yet, but we'll need the test later.	2023-09-19 16:16:40 +02:00
Micha Reiser	2d9b39871f	Introduce `IndentWidth` (#7301 )	2023-09-13 14:52:24 +02:00
Micha Reiser	9d77552e18	Add tab width option (#6848 )	2023-08-26 12:29:58 +02:00
Tom Kuson	84d178a219	Use one line between top-level items if formatting a stub file (#6501 ) Co-authored-by: Micha Reiser <micha@reiser.io>	2023-08-15 09:33:57 +02:00
konsti	1031bb6550	Formatter: Add SourceType to context to enable special formatting for stub files (#6331 ) Summary This adds the information whether we're in a .py python source file or in a .pyi stub file to enable people working on #5822 and related issues. I'm not completely happy with `Default` for something that depends on the input. Test Plan None, this is currently unused, i'm leaving this to first implementation of stub file specific formatting. --------- Co-authored-by: Micha Reiser <micha@reiser.io>	2023-08-04 11:52:26 +00:00
Micha Reiser	f1d367655b	Format `target: annotation = value?` expressions (#5661 )	2023-07-11 16:40:28 +02:00
Micha Reiser	089a671adb	Fix Black compatible snapshot deletion (#5646 )	2023-07-10 15:00:18 +02:00
Micha Reiser	ae25638b0b	Update Black tests (#5438 )	2023-06-30 06:32:50 +00:00
Micha Reiser	f18a1f70de	Add tests for skip magic trailing comma <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary This PR adds tests that verify that the magic trailing comma is not respected if disabled in the formatter options. Our test setup now allows to create a `<fixture-name>.options.json` file that contains an array of configurations that should be tested. <!-- What's the purpose of the change? What does it do, and why? --> ## Test Plan It's all about tests :) <!-- How was it tested? -->	2023-06-26 14:15:55 +02:00
Micha Reiser	dd0d1afb66	Create `PyFormatOptions` <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary This PR adds a new `PyFormatOptions` struct that stores the python formatter options. The new options aren't used yet, with the exception of magical trailing commas and the options passed to the printer. I'll follow up with more PRs that use the new options (e.g. `QuoteStyle`). <!-- What's the purpose of the change? What does it do, and why? --> ## Test Plan `cargo test` I'll follow up with a new PR that adds support for overriding the options in our fixture tests.	2023-06-26 14:02:17 +02:00
Micha Reiser	8879927b9a	Use `insta::glob` instead of `fixture` macro (#5364 )	2023-06-26 08:46:18 +00:00

29 commits