mirrors/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-07-12 07:35:07 +00:00

Author	SHA1	Message	Date
Dhruv Manilawala	4738e19974	Remove unused lexical error types (#11145 )	2024-04-25 15:24:16 +00:00
Dhruv Manilawala	38d2562f41	Refactor unary expression parsing (#11088 ) ## Summary This PR refactors unary expression parsing with the following changes: * Ability to get `OperatorPrecedence` from a unary operator (`UnaryOp`) * Implement methods on `TokenKind` * Add `as_unary_operator` which returns an `Option<UnaryOp>` * Add `as_unary_arithmetic_operator` which returns an `Option<UnaryOp>` (used for pattern parsing) * Rename `is_unary` to `is_unary_arithmetic_operator` (used in the linter) resolves: #10752 ## Test Plan Verify that the existing test cases pass, no ecosystem changes, run the Python based fuzzer on 3000 random inputs and run it on dozens of open-source repositories.	2024-04-23 04:55:02 +00:00
Dhruv Manilawala	7eba967e16	Refactor binary expression parsing (#11073 ) ## Summary This PR refactors the binary expression parsing in a way to make it readable and easy to understand. It draws inspiration from the suggested edits in the linked messages in #10752. ### Changes * Ability to get the precedence of an operator * From a boolean operator (`BinOp`) to `OperatorPrecedence` * From a binary operator (`Operator`) to `OperatorPrecedence` * No comparison operator because all of them have the same precedence * Implement methods on `TokenKind` to convert it to an appropriate operator enum * Add `as_boolean_operator` which returns an `Option<BoolOp>` * Add `as_binary_operator` which returns an `Option<Operator>` * No `as_comparison_operator` because it requires lookahead and I'm not sure if `token.as_comparison_operator(peek)` is a good way to implement it * Introduce `BinaryLikeOperator` * Constructed from two tokens using the methods from the second point * Add `precedence` method using the conversion methods mentioned in the first point * Make most of the functions in `TokenKind` private to the module * Use `self` instead of `&self` for `TokenKind` fixes: #11072 ## Test Plan Refer #11088	2024-04-23 04:42:40 +00:00
Dhruv Manilawala	5b81b8368d	Make associativity a property of operator precedence (#11065 ) ## Summary This PR does a few things but the main change is that is makes associativity a property of operator precedence. 1. Rename `Precedence` -> `OperatorPrecedence` 2. Rename `parse_expression_with_precedence` -> `parse_binary_expression_or_higher` 3. Move `current_binding_power` to `OperatorPrecedence::try_from_tokens` [^1] 4. Add a `OperatorPrecedence::is_right_associative` method 5. Move from `increment_precedence` to using `<=` / `<` to check if the parsing loop needs to stop [^2] [^1]: Another alternative would be to have two separate methods to avoid lookahead as it's required only for once case (`not in`). So, `try_from_current_token(current).or_else(\|\| try_from_next_token(current, peek))` [^2]: This will allow us to easily make the refactors mentioned in #10752 ## Test Plan Make sure the precedence parsing algorithm is still correct by running the test suite, fuzz testing it and running it against a dozen or so open-source repositories.	2024-04-23 04:28:46 +00:00
Dhruv Manilawala	c30735d4a7	Add `ExpressionContext` for expression parsing (#11055 ) ## Summary This PR adds a new `ExpressionContext` struct which is used in expression parsing. This solves the following problem: 1. Allowing starred expression with different precedence 2. Allowing yield expression in certain context 3. Remove ambiguity with `in` keyword when parsing a `for ... in` statement For context, (1) was solved by adding `parse_star_expression_list` and `parse_star_expression_or_higher` in #10623, (2) was solved by by adding `parse_yield_expression_or_else` in #10809, and (3) was fixed in #11009. All of the mentioned functions have been removed in favor of the context flags. As mentioned in #11009, an ideal solution would be to implement an expression context which is what this PR implements. This is passed around as function parameter and the call stack is used to automatically reset the context. ### Recovery How should the parser recover if the target expression is invalid when an expression can consume the `in` keyword? 1. Should the `in` keyword be part of the target expression? 2. Or, should the expression parsing stop as soon as `in` keyword is encountered, no matter the expression? For example: ```python for yield x in y: ... # Here, should this be parsed as for (yield x) in (y): ... # Or for (yield x in y): ... # where the `in iter` part is missing ``` Or, for binary expression parsing: ```python for x or y in z: ... # Should this be parsed as for (x or y) in z: ... # Or for (x or y in z): ... # where the `in iter` part is missing ``` This need not be solved now, but is very easy to change. For context this PR does the following: * For binary, comparison, and unary expressions, stop at `in` * For lambda, yield expressions, consume the `in` ## Test Plan 1. Add test cases for the `for ... in` statement and verify the snapshots 2. Make sure the existing test suite pass 3. Run the fuzzer for around 3000 generated source code 4. Run the updated logic on a dozen or so open source repositories (codename "parser-checkouts")	2024-04-23 04:19:05 +00:00
Carl Meyer	c80b9a4a90	Reduce size of Stmt from 144 to 120 bytes (#11051 ) ## Summary I happened to notice that we box `TypeParams` on `StmtClassDef` but not on `StmtFunctionDef` and wondered why, since `StmtFunctionDef` is bigger and sets the size of `Stmt`. @charliermarsh found that at the time we started boxing type params on classes, classes were the largest statement type (see #6275), but that's no longer true. So boxing type-params also on functions reduces the overall size of `Stmt`. ## Test Plan The `<=` size tests are a bit irritating (since their failure doesn't tell you the actual size), but I manually confirmed that the size is actually 120 now.	2024-04-19 17:02:17 -06:00
Dhruv Manilawala	d3cd61f804	Use empty range when there's "gap" in token source (#11032 ) ## Summary This fixes a bug where the parser would panic when there is a "gap" in the token source. What's a gap? The reason it's `<=` instead of just `==` is because there could be whitespaces between the two tokens. For example: ```python # last token end # \| current token (newline) start # v v def foo \n # ^ # assume there's trailing whitespace here ``` Or, there could tokens that are considered "trivia" and thus aren't emitted by the token source. These are comments and non-logical newlines. For example: ```python # last token end # v def foo # comment\n # ^ current token (newline) start ``` In either of the above cases, there's a "gap" between the end of the last token and start of the current token. ## Test Plan Add test cases and update the snapshots.	2024-04-19 11:36:26 +00:00
Dhruv Manilawala	9bb23b0a38	Expect indented case block instead of match stmt (#11033 ) ## Summary This PR adds a new `Clause::Case` and uses it to parse the body of a `case` block. Earlier, it was using `Match` which would give an incorrect error message like: ``` \| 1 \| match subject: 2 \| case 1: 3 \| case 2: ... \| ^^^^ Syntax Error: Expected an indented block after `match` statement \| ``` ## Test Plan Add test case and update the snapshot.	2024-04-19 16:46:15 +05:30
Dhruv Manilawala	b7066e64e7	Consider binary expr for parenthesized with items parsing (#11012 ) ## Summary This PR fixes the bug in with items parsing where it would fail to recognize that the parenthesized expression is part of a large binary expression. ## Test Plan Add test cases and verified the snapshots.	2024-04-18 21:39:30 +05:30
Dhruv Manilawala	6c4d779140	Consider `if` expression for parenthesized with items parsing (#11010 ) ## Summary This PR fixes the bug in parenthesized with items parsing where the `if` expression would result into a syntax error. The reason being that once we identify that the ambiguous left parenthesis belongs to the context expression, the parser converts the parsed with item into an equivalent expression. Then, the parser continuous to parse any postfix expressions. Now, attribute, subscript, and call are taken into account as they're grouped in `parse_postfix_expression` but `if` expression has it's own parsing function. Use `parse_if_expression` once all postfix expressions have been parsed. Ideally, I think that `if` could be included in postfix expression parsing as they can be chained as well (`x if True else y if True else z`). ## Test Plan Add test cases and verified the snapshots.	2024-04-18 14:30:15 +00:00
Dhruv Manilawala	8020d486f6	Reset `FOR_TARGET` context for all kinds of parentheses (#11009 ) ## Summary This PR fixes a bug in the new parser which involves the parser context w.r.t. for statement. This is specifically around the `in` keyword which can be present in the target expression and shouldn't be considered to be part of the `for` statement header. Ideally it should use a context which is passed between functions, thus using a call stack to set / unset a specific variant which will be done in a follow-up PR as it requires some amount of refactor. ## Test Plan Add test cases and update the snapshots.	2024-04-18 19:37:50 +05:30
Dhruv Manilawala	13ffb5bc19	Replace LALRPOP parser with hand-written parser (#10036 ) (Supersedes #9152, authored by @LaBatata101) ## Summary This PR replaces the current parser generated from LALRPOP to a hand-written recursive descent parser. It also updates the grammar for [PEP 646](https://peps.python.org/pep-0646/) so that the parser outputs the correct AST. For example, in `data[*x]`, the index expression is now a tuple with a single starred expression instead of just a starred expression. Beyond the performance improvements, the parser is also error resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed. That is, the linter and formatter can still assume that the parser will _stop_ at the first syntax error. This will be updated in the following months. For more details about the change here, refer to the PR corresponding to the individual commits and the release blog post. ## Test Plan Write _lots_ and _lots_ of tests for both valid and invalid syntax and verify the output. ## Acknowledgements - @MichaReiser for reviewing 100+ parser PRs and continuously providing guidance throughout the project - @LaBatata101 for initiating the transition to a hand-written parser in #9152 - @addisoncrump for implementing the fuzzer which helped [catch](https://github.com/astral-sh/ruff/pull/10903) [a](https://github.com/astral-sh/ruff/pull/10910) [lot](https://github.com/astral-sh/ruff/pull/10966) [of](https://github.com/astral-sh/ruff/pull/10896) [bugs](https://github.com/astral-sh/ruff/pull/10877) --------- Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org> Co-authored-by: Micha Reiser <micha@reiser.io>	2024-04-18 17:57:39 +05:30
Boshen	d467aa78c2	Remove an unused dependency (#10747 ) ## Summary Continuation of #10475, I improved [`cargo shear`](https://github.com/Boshen/cargo-shear) even more. We can put this in CI once I test it a bit more, given that [ignoring false positives](https://github.com/Boshen/cargo-shear?tab=readme-ov-file#ignore-false-positives) has been implemented. ## Test Plan `cargo check --all-features --all-targets`	2024-04-03 09:57:19 +01:00
veryyet	c5ea4209bb	chore: remove repetitive words (#10502 )	2024-03-21 03:57:16 +00:00
Alex Waygood	7caf0d064a	Simplify formatting of strings by using flags from the AST nodes (#10489 )	2024-03-20 16:16:54 +00:00
Alex Waygood	ffd6e79677	Fix typo in `string_token_flags.rs` (#10476 )	2024-03-19 17:43:08 +00:00
Alex Waygood	162d2eb723	Track casing of r-string prefixes in the tokenizer and AST (#10314 ) Co-authored-by: Micha Reiser <micha@reiser.io>	2024-03-18 17:18:04 +00:00
Alex Waygood	92e6026446	Apply NFKC normalization to unicode identifiers in the lexer (#10412 )	2024-03-18 11:56:56 +00:00
Alex Waygood	c2e15f38ee	Unify enums used for internal representation of quoting style (#10383 )	2024-03-13 17:19:17 +00:00
Auguste Lalande	3ed707f245	Spellcheck & grammar (#10375 ) ## Summary I used `codespell` and `gramma` to identify mispellings and grammar errors throughout the codebase and fixed them. I tried not to make any controversial changes, but feel free to revert as you see fit.	2024-03-13 02:34:23 +00:00
Gautier Moin	a067d87ccc	Fix incorrect `Parameter` range for `args` and `kwargs` (#10283 ) ## Summary Fix #10282 This PR updates the Python grammar to include the `` character in `args` `kwargs` in the range of the `Parameter` ``` def f(args, *kwargs): pass # ~~~~ ~~~~~~ <-- range before the PR # ^^^^^ ^^^^^^^^ <-- range after ``` The invalid syntax `def f(, **kwargs): ...` is also now correctly reported. ## Test Plan Test cases were added to `function.rs`.	2024-03-08 18:57:49 -05:00
Alex Waygood	1d97f27335	Start tracking quoting style in the AST (#10298 ) This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information: - The quoting style used (double or single quotes) - Whether the string is triple-quoted or not - Whether the string is raw or not This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for https://github.com/astral-sh/ruff/issues/7799. The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node. Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.	2024-03-08 19:11:47 +00:00
Alex Waygood	c504d7ab11	Track quoting style in the tokenizer (#10256 )	2024-03-08 08:40:06 +00:00
Micha Reiser	184241f99a	Remove `Expr` postfix from `ExprNamed`, `ExprIf`, and `ExprGenerator` (#10229 ) The expression types in our AST are called `ExprYield`, `ExprAwait`, `ExprStringLiteral` etc, except `ExprNamedExpr`, `ExprIfExpr` and `ExprGenratorExpr`. This seems to align with [Python AST's naming](https://docs.python.org/3/library/ast.html) but feels inconsistent and excessive. This PR removes the `Expr` postfix from `ExprNamedExpr`, `ExprIfExpr`, and `ExprGeneratorExpr`.	2024-03-04 12:55:01 +01:00
Micha Reiser	77c5561646	Add `parenthesized` flag to `ExprTuple` and `ExprGenerator` (#9614 )	2024-02-26 15:35:20 +00:00
Dhruv Manilawala	33ac2867b7	Use non-parenthesized range for `DebugText` (#9953 ) ## Summary This PR fixes the `DebugText` implementation to use the expression range instead of the parenthesized range. Taking the following code snippet as an example: ```python x = 1 print(f"{ ( x ) = }") ``` The output of running it would be: ``` ( x ) = 1 ``` Notice that the whitespace between the parentheses and the expression is preserved as is. Currently, we don't preserve this information in the AST which defeats the purpose of `DebugText` as the main purpose of the struct is to preserve whitespaces _around_ the expression. This is also problematic when generating the code from the AST node as then the generator has no information about the parentheses the whitespaces between them and the expression which would lead to the removal of the parentheses in the generated code. I noticed this while working on the f-string formatting where the debug text would be used to preserve the text surrounding the expression in the presence of debug expression. The parentheses were being dropped then which made me realize that the problem is instead in the parser. ## Test Plan 1. Add a test case for the parser 2. Add a test case for the generator	2024-02-12 23:00:02 +05:30
Charlie Marsh	6f0e4ad332	Remove unnecessary string cloning from the parser (#9884 ) Closes https://github.com/astral-sh/ruff/issues/9869.	2024-02-09 16:03:27 -05:00
Charlie Marsh	49fe1b85f2	Reduce size of `Expr` from 80 to 64 bytes (#9900 ) ## Summary This PR reduces the size of `Expr` from 80 to 64 bytes, by reducing the sizes of... - `ExprCall` from 72 to 56 bytes, by using boxed slices for `Arguments`. - `ExprCompare` from 64 to 48 bytes, by using boxed slices for its various vectors. In testing, the parser gets a bit faster, and the linter benchmarks improve quite a bit.	2024-02-09 02:53:13 +00:00
Micha Reiser	fe7d965334	Reduce `Result<Tok, LexicalError>` size by using `Box<str>` instead of `String` (#9885 )	2024-02-08 20:36:22 +00:00
Micha Reiser	688177ff6a	Use Rust 1.76 (#9897 )	2024-02-08 18:20:08 +00:00
Charlie Marsh	6fffde72e7	Use `memchr` for string lexing (#9888 ) ## Summary On `main`, string lexing consists of walking through the string character-by-character to search for the closing quote (with some nuance: we also need to skip escaped characters, and error if we see newlines in non-triple-quoted strings). This PR rewrites `lex_string` to instead use `memchr` to search for the closing quote, which is significantly faster. On my machine, at least, the `globals.py` benchmark (which contains a lot of docstrings) gets 40% faster... ```text lexer/numpy/globals.py time: [3.6410 µs 3.6496 µs 3.6585 µs] thrpt: [806.53 MiB/s 808.49 MiB/s 810.41 MiB/s] change: time: [-40.413% -40.185% -39.984%] (p = 0.00 < 0.05) thrpt: [+66.623% +67.181% +67.822%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild lexer/unicode/pypinyin.py time: [12.422 µs 12.445 µs 12.467 µs] thrpt: [337.03 MiB/s 337.65 MiB/s 338.27 MiB/s] change: time: [-9.4213% -9.1930% -8.9586%] (p = 0.00 < 0.05) thrpt: [+9.8401% +10.124% +10.401%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe lexer/pydantic/types.py time: [107.45 µs 107.50 µs 107.56 µs] thrpt: [237.11 MiB/s 237.24 MiB/s 237.35 MiB/s] change: time: [-4.0108% -3.7005% -3.3787%] (p = 0.00 < 0.05) thrpt: [+3.4968% +3.8427% +4.1784%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe lexer/numpy/ctypeslib.py time: [46.123 µs 46.165 µs 46.208 µs] thrpt: [360.36 MiB/s 360.69 MiB/s 361.01 MiB/s] change: time: [-19.313% -18.996% -18.710%] (p = 0.00 < 0.05) thrpt: [+23.016% +23.451% +23.935%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe lexer/large/dataset.py time: [231.07 µs 231.19 µs 231.33 µs] thrpt: [175.87 MiB/s 175.97 MiB/s 176.06 MiB/s] change: time: [-2.0437% -1.7663% -1.4922%] (p = 0.00 < 0.05) thrpt: [+1.5148% +1.7981% +2.0864%] Performance has improved. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe ```	2024-02-08 17:23:06 +00:00
Seo Sanghyeon	df7fb95cbc	Index multiline f-strings (#9837 ) Fix #9777.	2024-02-05 21:25:33 -05:00
Micha Reiser	47ad7b4500	Approximate tokens len (#9546 )	2024-01-19 17:39:37 +01:00
Micha Reiser	21f2d0c90b	Add an explicit fast path for whitespace to `is_identifier_continuation` (#9532 )	2024-01-16 08:23:43 +00:00
Micha Reiser	f192c72596	Remove type parameter from `parse_*` methods (#9466 )	2024-01-11 19:41:19 +01:00
Micha Reiser	94968fedd5	Use Rust 1.75 toolchain (#9437 )	2024-01-08 18:03:16 +01:00
Micha Reiser	b1a5df8694	Move `locate_cmp_ops` to `invalid_literal_comparisons` (#9438 )	2024-01-08 13:15:36 +01:00
Charlie Marsh	1666c7a5cb	Add size hints to string parser (#9413 )	2024-01-06 15:59:34 -05:00
Charlie Marsh	f0d43dafcf	Ignore trailing quotes for unclosed l-brace errors (#9388 ) ## Summary Given: ```python F"{"ڤ ``` We try to locate the "unclosed left brace" error by subtracting the quote size from the lexer offset -- so we subtract 1 from the end of the source, which puts us in the middle of a Unicode character. I don't think we should try to adjust the offset in this way, since there can be content _after_ the quote. For example, with the advent of PEP 701, this string could reasonably be fixed as: ```python F"{"ڤ"}" ```` Closes https://github.com/astral-sh/ruff/issues/9379.	2024-01-04 05:00:55 +00:00
Charlie Marsh	9073220887	Make all dependencies workspace dependencies (#9333 ) ## Summary This PR modifies our `Cargo.toml` files to use workspace dependencies for _all_ dependencies, rather than the status quo of sporadically trying to use workspace dependencies for those dependencies that are used across multiple crates. I find the current situation more confusing and harder to manage, since we have a mix of workspace and crate-local dependencies, whereas this setup consistently uses the same approach for all dependencies.	2024-01-02 13:41:59 +00:00
Charlie Marsh	48e04cc2c8	Add row and column numbers to formatted parse errors (#9321 ) ## Summary We now render parse errors in the formatter identically to those in the linter, e.g.: ``` ❯ cargo run -p ruff_cli -- format foo.py error: Failed to parse foo.py:1:17: Unexpected token '=' ``` Closes https://github.com/astral-sh/ruff/issues/8338. Closes https://github.com/astral-sh/ruff/issues/9311.	2023-12-31 07:10:45 -05:00
Charlie Marsh	e80260a3c5	Remove source path from parser errors (#9322 ) ## Summary I always found it odd that we had to pass this in, since it's really higher-level context for the error. The awkwardness is further evidenced by the fact that we pass in fake values everywhere (even outside of tests). The source path isn't actually used to display the error; it's only accessed elsewhere to _re-display_ the error in certain cases. This PR modifies to instead pass the path directly in those cases.	2023-12-30 20:33:05 +00:00
Charlie Marsh	97e9d3c54f	Use `Display` for formatter parse errors (#9316 ) ## Summary This helps a bit with (but does not close) the issues described in https://github.com/astral-sh/ruff/issues/9311. E.g., now, we at least see: `error: Failed to format main.py: source contains syntax errors: invalid syntax. Got unexpected token '=' at byte offset 20`.	2023-12-29 22:26:57 +00:00
Charlie Marsh	9d6444138b	Remove lexing and parsing from the linter benchmark (#9264 ) ## Summary This PR adds some helper structs to the linter paths to enable passing in the pre-computed tokens and parsed source code during benchmarking, to remove lexing and parsing from the overall linter benchmark measurement. We already remove parsing for the formatter, and we have separate benchmarks for the lexer and the parser, so this should make it much easier to measure linter performance changes.	2023-12-23 16:43:11 -05:00
Andrew Gallant	3ce145c476	release: switch to Cargo's default (#9031 ) This sets `lto = "thin"` instead of using "fat" LTO, and sets `codegen-units = 16`. These are the defaults for Cargo's `release` profile, and I think it may give us faster iteration times, especially when benchmarking. The point of this PR is to see what kind of impact this has on benchmarks. It is expected that benchmarks may regress to some extent. I did some quick ad hoc experiments to quantify this change in compile times. Namely, I ran: cargo build --profile release -p ruff_cli Then I ran touch crates/ruff_python_formatter/src/expression/string/docstring.rs (because that's where i've been working lately) and re-ran cargo build --profile release -p ruff_cli This last command is what I timed, since it reflects how much time one has to wait between making a change and getting a compiled artifact. Here are my results: * With status quo `release` profile, build takes 77s * with `release` but `lto = "thin"`, build takes 41s * with `release`, but `lto = false`, build takes 19s * with `release`, but `lto = false` and `codegen-units = 16`, build takes 7s * with `release`, but `lto = "thin"` and `codegen-units = 16`, build takes 16s (i believe this is the default `release` configuration) This PR represents the last option. It's not the fastest to compile, but it's nearly a whole minute faster! The idea is that with `codegen-units = 16`, we still make use of parallelism, but keep _some_ level of LTO on to try and re-gain what we lose by increasing the number of codegen units.	2023-12-15 08:19:35 -05:00
Dhruv Manilawala	cdac90ef68	New AST nodes for f-string elements (#8835 ) Rebase of #6365 authored by @davidszotten. ## Summary This PR updates the AST structure for an f-string elements. The main motivation behind this change is to have a dedicated node for the string part of an f-string. Previously, the existing `ExprStringLiteral` node was used for this purpose which isn't exactly correct. The `ExprStringLiteral` node should include the quotes as well in the range but the f-string literal element doesn't include the quote as it's a specific part within an f-string. For example, ```python f"foo {x}" # ^^^^ # This is the literal part of an f-string ``` The introduction of `FStringElement` enum is helpful which represent either the literal part or the expression part of an f-string. ### Rule Updates This means that there'll be two nodes representing a string depending on the context. One for a normal string literal while the other is a string literal within an f-string. The AST checker is updated to accommodate this change. The rules which work on string literal are updated to check on the literal part of f-string as well. #### Notes 1. The `Expr::is_literal_expr` method would check for `ExprStringLiteral` and return true if so. But now that we don't represent the literal part of an f-string using that node, this improves the method's behavior and confines to the actual expression. We do have the `FStringElement::is_literal` method. 2. We avoid checking if we're in a f-string context before adding to `string_type_definitions` because the f-string literal is now a dedicated node and not part of `Expr`. 3. Annotations cannot use f-string so we avoid changing any rules which work on annotation and checks for `ExprStringLiteral`. ## Test Plan - All references of `Expr::StringLiteral` were checked to see if any of the rules require updating to account for the f-string literal element node. - New test cases are added for rules which check against the literal part of an f-string. - Check the ecosystem results and ensure it remains unchanged. ## Performance There's a performance penalty in the parser. The reason for this remains unknown as it seems that the generated assembly code is now different for the `__reduce154` function. The reduce function body is just popping the `ParenthesizedExpr` on top of the stack and pushing it with the new location. - The size of `FStringElement` enum is the same as `Expr` which is what it replaces in `FString::format_spec` - The size of `FStringExpressionElement` is the same as `ExprFormattedValue` which is what it replaces I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly resulted in any performance gain. The difference can be seen here: - Original profile: https://share.firefox.dev/3Taa7ES - Profile after boxing some node fields: https://share.firefox.dev/3GsNXpD ### Backtracking I tried backtracking the changes to see if any of the isolated change produced this regression. The problem here is that the overall change is so small that there's only a single checkpoint where I can backtrack and that checkpoint results in the same regression. This checkpoint is to revert using `Expr` to the `FString::format_spec` field. After this point, the change would revert back to the original implementation. ## Review process The review process is similar to #7927. The first set of commits update the node structure, parser, and related AST files. Then, further commits update the linter and formatter part to account for the AST change. --------- Co-authored-by: David Szotten <davidszotten@gmail.com>	2023-12-07 10:28:05 -06:00
Micha Reiser	7e390d3772	Move `ParenthesizedExpr` to `ruff_python_parser` (#8987 )	2023-12-04 05:36:28 +00:00
Charlie Marsh	20782ab02c	Support type alias statements in simple statement positions (#8916 ) <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Our `SoftKeywordTokenizer` only respected soft keywords in compound statement positions -- for example, at the start of a logical line: ```python type X = int ``` However, type aliases can also appear in simple statement positions, like: ```python class Class: type X = int ``` (Note that `match` and `case` are _not_ valid keywords in such positions.) This PR upgrades the tokenizer to track both kinds of valid positions. Closes https://github.com/astral-sh/ruff/issues/8900. Closes https://github.com/astral-sh/ruff/issues/8899. ## Test Plan `cargo test`	2023-11-30 19:15:19 +00:00
Charlie Marsh	774c77adae	Avoid off-by-one error in with-item named expressions (#8915 ) ## Summary Given `with (a := b): pass`, we truncate the `WithItem` range by one on both sides such that the parentheses are part of the statement, rather than the item. However, for `with (a := b) as x: pass`, we want to avoid this trick. Closes https://github.com/astral-sh/ruff/issues/8913.	2023-11-30 00:11:04 +00:00
Charlie Marsh	6435e4e4aa	Enable auto-return-type involving `Optional` and `Union` annotations (#8885 ) ## Summary Previously, this was only supported for Python 3.10 and later, since we always use the PEP 604-style unions.	2023-11-28 18:35:55 -08:00

... 2 3 4 5 6

270 commits