language-servers/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-09-29 05:15:12 +00:00

Author	SHA1	Message	Date
Dhruv Manilawala	33ac2867b7	Use non-parenthesized range for `DebugText` (#9953 ) ## Summary This PR fixes the `DebugText` implementation to use the expression range instead of the parenthesized range. Taking the following code snippet as an example: ```python x = 1 print(f"{ ( x ) = }") ``` The output of running it would be: ``` ( x ) = 1 ``` Notice that the whitespace between the parentheses and the expression is preserved as is. Currently, we don't preserve this information in the AST which defeats the purpose of `DebugText` as the main purpose of the struct is to preserve whitespaces _around_ the expression. This is also problematic when generating the code from the AST node as then the generator has no information about the parentheses the whitespaces between them and the expression which would lead to the removal of the parentheses in the generated code. I noticed this while working on the f-string formatting where the debug text would be used to preserve the text surrounding the expression in the presence of debug expression. The parentheses were being dropped then which made me realize that the problem is instead in the parser. ## Test Plan 1. Add a test case for the parser 2. Add a test case for the generator	2024-02-12 23:00:02 +05:30
Seo Sanghyeon	df7fb95cbc	Index multiline f-strings (#9837 ) Fix #9777.	2024-02-05 21:25:33 -05:00
Dhruv Manilawala	cdac90ef68	New AST nodes for f-string elements (#8835 ) Rebase of #6365 authored by @davidszotten. ## Summary This PR updates the AST structure for an f-string elements. The main motivation behind this change is to have a dedicated node for the string part of an f-string. Previously, the existing `ExprStringLiteral` node was used for this purpose which isn't exactly correct. The `ExprStringLiteral` node should include the quotes as well in the range but the f-string literal element doesn't include the quote as it's a specific part within an f-string. For example, ```python f"foo {x}" # ^^^^ # This is the literal part of an f-string ``` The introduction of `FStringElement` enum is helpful which represent either the literal part or the expression part of an f-string. ### Rule Updates This means that there'll be two nodes representing a string depending on the context. One for a normal string literal while the other is a string literal within an f-string. The AST checker is updated to accommodate this change. The rules which work on string literal are updated to check on the literal part of f-string as well. #### Notes 1. The `Expr::is_literal_expr` method would check for `ExprStringLiteral` and return true if so. But now that we don't represent the literal part of an f-string using that node, this improves the method's behavior and confines to the actual expression. We do have the `FStringElement::is_literal` method. 2. We avoid checking if we're in a f-string context before adding to `string_type_definitions` because the f-string literal is now a dedicated node and not part of `Expr`. 3. Annotations cannot use f-string so we avoid changing any rules which work on annotation and checks for `ExprStringLiteral`. ## Test Plan - All references of `Expr::StringLiteral` were checked to see if any of the rules require updating to account for the f-string literal element node. - New test cases are added for rules which check against the literal part of an f-string. - Check the ecosystem results and ensure it remains unchanged. ## Performance There's a performance penalty in the parser. The reason for this remains unknown as it seems that the generated assembly code is now different for the `__reduce154` function. The reduce function body is just popping the `ParenthesizedExpr` on top of the stack and pushing it with the new location. - The size of `FStringElement` enum is the same as `Expr` which is what it replaces in `FString::format_spec` - The size of `FStringExpressionElement` is the same as `ExprFormattedValue` which is what it replaces I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly resulted in any performance gain. The difference can be seen here: - Original profile: https://share.firefox.dev/3Taa7ES - Profile after boxing some node fields: https://share.firefox.dev/3GsNXpD ### Backtracking I tried backtracking the changes to see if any of the isolated change produced this regression. The problem here is that the overall change is so small that there's only a single checkpoint where I can backtrack and that checkpoint results in the same regression. This checkpoint is to revert using `Expr` to the `FString::format_spec` field. After this point, the change would revert back to the original implementation. ## Review process The review process is similar to #7927. The first set of commits update the node structure, parser, and related AST files. Then, further commits update the linter and formatter part to account for the AST change. --------- Co-authored-by: David Szotten <davidszotten@gmail.com>	2023-12-07 10:28:05 -06:00
Charlie Marsh	20782ab02c	Support type alias statements in simple statement positions (#8916 ) <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Our `SoftKeywordTokenizer` only respected soft keywords in compound statement positions -- for example, at the start of a logical line: ```python type X = int ``` However, type aliases can also appear in simple statement positions, like: ```python class Class: type X = int ``` (Note that `match` and `case` are _not_ valid keywords in such positions.) This PR upgrades the tokenizer to track both kinds of valid positions. Closes https://github.com/astral-sh/ruff/issues/8900. Closes https://github.com/astral-sh/ruff/issues/8899. ## Test Plan `cargo test`	2023-11-30 19:15:19 +00:00
Charlie Marsh	774c77adae	Avoid off-by-one error in with-item named expressions (#8915 ) ## Summary Given `with (a := b): pass`, we truncate the `WithItem` range by one on both sides such that the parentheses are part of the statement, rather than the item. However, for `with (a := b) as x: pass`, we want to avoid this trick. Closes https://github.com/astral-sh/ruff/issues/8913.	2023-11-30 00:11:04 +00:00
Dhruv Manilawala	47d80f29a7	Lexer start of line is false only for `Mode::Expression` (#8880 ) ## Summary This PR fixes the bug in the lexer where the `Mode::Ipython` wasn't being considered when initializing the soft keyword transformer which wraps the lexer. This means that if the source code starts with either `match` or `type` keyword, then the keywords were being considered as name tokens instead. For example, ```python match foo: case bar: pass ``` This would transform the `match` keyword into an identifier if the mode is `Ipython`. The fix is to reverse the condition in the soft keyword initializer so that any new modes are by default considered as the lexer being at start of line. ## Test Plan Add a new test case for `Mode::Ipython` and verify the snapshot. fixes: #8870	2023-11-28 20:38:25 +00:00
Dhruv Manilawala	017e829115	Update string nodes for implicit concatenation (#7927 ) ## Summary This PR updates the string nodes (`ExprStringLiteral`, `ExprBytesLiteral`, and `ExprFString`) to account for implicit string concatenation. ### Motivation In Python, implicit string concatenation are joined while parsing because the interpreter doesn't require the information for each part. While that's feasible for an interpreter, it falls short for a static analysis tool where having such information is more useful. Currently, various parts of the code uses the lexer to get the individual string parts. One of the main challenge this solves is that of string formatting. Currently, the formatter relies on the lexer to get the individual string parts, and formats them including the comments accordingly. But, with PEP 701, f-string can also contain comments. Without this change, it becomes very difficult to add support for f-string formatting. ### Implementation The initial proposal was made in this discussion: https://github.com/astral-sh/ruff/discussions/6183#discussioncomment-6591993. There were various AST designs which were explored for this task which are available in the linked internal document[^1]. The selected variant was the one where the nodes were kept as it is except that the `implicit_concatenated` field was removed and instead a new struct was added to the `Expr*` struct. This would be a private struct would contain the actual implementation of how the AST is designed for both single and implicitly concatenated strings. This implementation is achieved through an enum with two variants: `Single` and `Concatenated` to avoid allocating a vector even for single strings. There are various public methods available on the value struct to query certain information regarding the node. The nodes are structured in the following way: ``` ExprStringLiteral - "foo" "bar" \|- StringLiteral - "foo" \|- StringLiteral - "bar" ExprBytesLiteral - b"foo" b"bar" \|- BytesLiteral - b"foo" \|- BytesLiteral - b"bar" ExprFString - "foo" f"bar {x}" \|- FStringPart::Literal - "foo" \|- FStringPart::FString - f"bar {x}" \|- StringLiteral - "bar " \|- FormattedValue - "x" ``` [^1]: Internal document: https://www.notion.so/astral-sh/Implicit-String-Concatenation-e036345dc48943f89e416c087bf6f6d9?pvs=4 #### Visitor The way the nodes are structured is that the entire string, including all the parts that are implicitly concatenation, is a single node containing individual nodes for the parts. The previous section has a representation of that tree for all the string nodes. This means that new visitor methods are added to visit the individual parts of string, bytes, and f-strings for `Visitor`, `PreorderVisitor`, and `Transformer`. ## Test Plan - `cargo insta test --workspace --all-features --unreferenced reject` - Verify that the ecosystem results are unchanged	2023-11-24 17:55:41 -06:00
Andrew Gallant	6a1fa4778f	Reject more syntactically invalid Python programs (#8524 ) ## Summary This commit adds some additional error checking to the parser such that assignments that are invalid syntax are rejected. This covers the obvious cases like `5 = 3` and some not so obvious cases like `x + y = 42`. This does add an additional recursive call to the parser for the cases handling assignments. I had initially been concerned about doing this, but `set_context` is already doing recursion during assignments, so I didn't feel as though this was changing any fundamental performance characteristics of the parser. (Also, in practice, I would expect any such recursion here to be quite shallow since the recursion is done on the target of an assignment. Such things are rarely nested much in practice.) Fixes #6895 ## Test Plan I've added unit tests covering every case that is detected as invalid on an `Expr`.	2023-11-07 07:16:06 -05:00
konsti	daea870c3c	Fix panic with 8 in octal escape (#8356 ) Summary The digits for an octal escape are 0 to 7, not 0 to 8, fixing the panic in #8355 Test plan Regression test parser fixture	2023-10-30 14:42:15 +01:00
Dhruv Manilawala	230c9ce236	Split `Constant` to individual literal nodes (#8064 ) ## Summary This PR splits the `Constant` enum as individual literal nodes. It introduces the following new nodes for each variant: * `ExprStringLiteral` * `ExprBytesLiteral` * `ExprNumberLiteral` * `ExprBooleanLiteral` * `ExprNoneLiteral` * `ExprEllipsisLiteral` The main motivation behind this refactor is to introduce the new AST node for implicit string concatenation in the coming PR. The elements of that node will be either a string literal, bytes literal or a f-string which can be implemented using an enum. This means that a string or bytes literal cannot be represented by `Constant::Str` / `Constant::Bytes` which creates an inconsistency. This PR avoids that inconsistency by splitting the constant nodes into it's own literal nodes, literal being the more appropriate naming convention from a static analysis tool perspective. This also makes working with literals in the linter and formatter much more ergonomic like, for example, if one would want to check if this is a string literal, it can be done easily using `Expr::is_string_literal_expr` or matching against `Expr::StringLiteral` as oppose to matching against the `ExprConstant` and enum `Constant`. A few AST helper methods can be simplified as well which will be done in a follow-up PR. This introduces a new `Expr::is_literal_expr` method which is the same as `Expr::is_constant_expr`. There are also intermediary changes related to implicit string concatenation which are quiet less. This is done so as to avoid having a huge PR which this already is. ## Test Plan 1. Verify and update all of the existing snapshots (parser, visitor) 2. Verify that the ecosystem check output remains unchanged for both the linter and formatter ### Formatter ecosystem check #### `main` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \| #### `dhruv/constant-to-literal` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \|	2023-10-30 12:13:23 +05:30
Dhruv Manilawala	78bbf6d403	New `Singleton` enum for `PatternMatchSingleton` node (#8063 ) ## Summary This PR adds a new `Singleton` enum for the `PatternMatchSingleton` node. Earlier the node was using the `Constant` enum but the value for this pattern can only be either `None`, `True` or `False`. With the coming PR to remove the `Constant`, this node required a new type to fill in. This also has the benefit of narrowing the type down to only the possible values for the node as evident by the removal of `unreachable`. ## Test Plan Update the AST snapshots and run `cargo test`.	2023-10-30 05:48:53 +00:00
Charlie Marsh	d6a4283003	Fix range of unparenthesized tuple subject in match statement (#8101 ) ## Summary This was just a bug in the parser ranges, probably since it was initially implemented. Given `match n % 3, n % 5: ...`, the "subject" (i.e., the tuple of two binary operators) was using the entire range of the `match` statement. Closes https://github.com/astral-sh/ruff/issues/8091. ## Test Plan `cargo test`	2023-10-22 19:58:33 -04:00
Dhruv Manilawala	43883b7a15	Disallow f-strings in match pattern literal (#7857 ) ## Summary This PR fixes a bug to disallow f-strings in match pattern literal. ``` literal_pattern ::= signed_number \| signed_number "+" NUMBER \| signed_number "-" NUMBER \| strings \| "None" \| "True" \| "False" \| signed_number: NUMBER \| "-" NUMBER ``` Source: https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-literal_pattern Also, ```console $ python /tmp/t.py File "/tmp/t.py", line 4 case "hello " f"{name}": ^^^^^^^^^^^^^^^^^^ SyntaxError: patterns may only match literals and attribute lookups ``` ## Test Plan Update existing test case and accordingly the snapshots. Also, add a new test case to verify that the parser does raise an error.	2023-10-09 10:11:08 +00:00
Dhruv Manilawala	709abd534a	Fix lexing single-quoted f-string with multi-line format spec (#7787 ) ## Summary Reported at https://github.com/python/cpython/issues/110259 ## Test Plan Add test cases for the fix and update the snapshots	2023-10-05 23:12:09 +05:30
Dhruv Manilawala	69b8136463	Avoid curly brace escape in f-string format spec (#7780 ) ## Summary This PR fixes a bug in the lexer for f-string format spec where it would consider the `{{` (double curly braces) as an escape pattern. This is not the case as evident by the [PEP](https://peps.python.org/pep-0701/#how-to-produce-these-new-tokens) as well but I missed the part: > [..] > * If in “format specifier mode” (see step 3), an opening brace ({) or a closing brace (}). > * If not in “format specifier mode” (see step 3), an opening brace ({) or a closing brace (}) that is not immediately followed by another opening/closing brace. ## Test Plan Add a test case to verify the fix and update the snapshot. fixes: #7778	2023-10-03 19:38:03 +05:30
konsti	3ccd1d580d	Use crates.io unicode_names2 0.6.0 (#6478 ) Update `unicode_names2` to the crates.io release 0.6.0, removing a git dependency.	2023-10-02 18:17:38 -04:00
Dhruv Manilawala	e91ffe3e93	Consume the escaped Windows newline (`\r\n`) for `FStringMiddle` (#7722 ) ## Summary This PR fixes a bug where if a Windows newline (`\r\n`) character was escaped, then only the `\r` was consumed and not `\n` leading to an unterminated string error. ## Test Plan Add new test cases to check the newline escapes. fixes: #7632	2023-10-01 07:58:20 +05:30
Dhruv Manilawala	e72d617f4b	Remove escaped mac/windows eol from AST string value (#7724 ) ## Summary This PR fixes the bug where the value of a string node type includes the escaped mac/windows newline character. Note that the token value still includes them, it's only removed when parsing the string content. ## Test Plan Add new test cases for the string node type to check that the escapes aren't being included in the string value. fixes: #7723	2023-10-01 07:37:59 +05:30
Dhruv Manilawala	e62e245c61	Add support for PEP 701 (#7376 ) ## Summary This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the other individual PRs. The separate PRs were created for logic separation and code reviews. Refer to each pull request for a detail description on the change. Refer to the PR description for the list of pull requests within this PR. ## Test Plan ### Formatter ecosystem checks Explanation for the change in ecosystem check: https://github.com/astral-sh/ruff/pull/7597#issue-1908878183 #### `main` ``` \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76083 \| 1789 \| 1631 \| \| django \| 0.99983 \| 2760 \| 36 \| \| transformers \| 0.99963 \| 2587 \| 319 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99967 \| 648 \| 15 \| \| zulip \| 0.99972 \| 1437 \| 21 \| ``` #### `dhruv/pep-701` ``` \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76051 \| 1789 \| 1632 \| \| django \| 0.99983 \| 2760 \| 36 \| \| transformers \| 0.99963 \| 2587 \| 319 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99967 \| 648 \| 15 \| \| zulip \| 0.99972 \| 1437 \| 21 \| ```	2023-09-29 02:55:39 +00:00
Charlie Marsh	f45281345d	Include radix base prefix in large number representation (#7700 ) ## Summary When lexing a number like `0x995DC9BBDF1939FA` that exceeds our small number representation, we were only storing the portion after the base (in this case, `995DC9BBDF1939FA`). When using that representation in code generation, this could lead to invalid syntax, since `995DC9BBDF1939FA)` on its own is not a valid integer. This PR modifies the code to store the full span, including the radix prefix. See: https://github.com/astral-sh/ruff/issues/7455#issuecomment-1739802958. ## Test Plan `cargo test`	2023-09-28 20:38:06 +00:00
Charlie Marsh	93b5d8a0fb	Implement our own small-integer optimization (#7584 ) ## Summary This is a follow-up to #7469 that attempts to achieve similar gains, but without introducing malachite. Instead, this PR removes the `BigInt` type altogether, instead opting for a simple enum that allows us to store small integers directly and only allocate for values greater than `i64`: ```rust /// A Python integer literal. Represents both small (fits in an `i64`) and large integers. #[derive(Clone, PartialEq, Eq, Hash)] pub struct Int(Number); #[derive(Debug, Clone, PartialEq, Eq, Hash)] pub enum Number { /// A "small" number that can be represented as an `i64`. Small(i64), /// A "large" number that cannot be represented as an `i64`. Big(Box<str>), } impl std::fmt::Display for Number { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Number::Small(value) => write!(f, "{value}"), Number::Big(value) => write!(f, "{value}"), } } } ``` We typically don't care about numbers greater than `isize` -- our only uses are comparisons against small constants (like `1`, `2`, `3`, etc.), so there's no real loss of information, except in one or two rules where we're now a little more conservative (with the worst-case being that we don't flag, e.g., an `itertools.pairwise` that uses an extremely large value for the slice start constant). For simplicity, a few diagnostics now show a dedicated message when they see integers that are out of the supported range (e.g., `outdated-version-block`). An additional benefit here is that we get to remove a few dependencies, especially `num-bigint`. ## Test Plan `cargo test`	2023-09-25 15:13:21 +00:00
Micha Reiser	8ce138760a	Emit `LexError` for dedent to incorrect level (#7638 )	2023-09-25 11:45:44 +01:00
Dhruv Manilawala	a41bb2733f	Add range to lexer test snapshots (#7265 ) ## Summary This PR updates the lexer test snapshots to include the range value as well. This is mainly a mechanical refactor. ### Motivation The main motivation is so that we can verify that the ranges are valid and do not overlap. ## Test Plan `cargo test`	2023-09-11 19:12:46 +00:00
Dhruv Manilawala	f5701fcc63	Use snapshots for remaining lexer tests (#7264 ) ## Summary This PR updates the remaining lexer test cases to use the snapshots. This is mainly a mechanical refactor. ## Motivation The main motivation is so that when we add the token range values to the test case output, it's easier to update the test cases. The reason they were not using the snapshots before was because of the usage of `test_case` macro. The macros is mainly used for different EOL test cases. If we just generate the snapshots directly, then the snapshot name would be suffixed with `-1`, `-2`, etc. as the test function is still the same. So, we'll create the snapshot ourselves with the platform name for the respective EOL test cases. ## Test Plan `cargo test`	2023-09-12 00:16:38 +05:30
Dhruv Manilawala	04f2842e4f	Move `ExprConstant::kind` to `StringConstant::unicode` (#7180 )	2023-09-06 07:39:25 +00:00
Charlie Marsh	68f605e80a	Fix `WithItem` ranges for parenthesized, non-`as` items (#6782 ) ## Summary This PR attempts to address a problem in the parser related to the range's of `WithItem` nodes in certain contexts -- specifically, `WithItem` nodes in parentheses that do not have an `as` token after them. For example, [here](https://play.ruff.rs/71be2d0b-2a04-4c7e-9082-e72bff152679): ```python with (a, b): pass ``` The range of the `WithItem` `a` is set to the range of `(a, b)`, as is the range of the `WithItem` `b`. In other words, when we have this kind of sequence, we use the range of the entire parenthesized context, rather than the ranges of the items themselves. Note that this also applies to cases [like](https://play.ruff.rs/c551e8e9-c3db-4b74-8cc6-7c4e3bf3713a): ```python with (a, b, c as d): pass ``` You can see the issue in the parser here: ```rust #[inline] WithItemsNoAs: Vec<ast::WithItem> = { <location:@L> <all:OneOrMore<Test<"all">>> <end_location:@R> => { all.into_iter().map(\|context_expr\| ast::WithItem { context_expr, optional_vars: None, range: (location..end_location).into() }).collect() }, } ``` Fixing this issue is... very tricky. The naive approach is to use the range of the `context_expr` as the range for the `WithItem`, but that range will be incorrect when the `context_expr` is itself parenthesized. For example, _that_ solution would fail here, since the range of the first `WithItem` would be that of `a`, rather than `(a)`: ```python with ((a), b): pass ``` The `with` parsing in general is highly precarious due to ambiguities in the grammar. Changing it in _any_ way seems to lead to an ambiguous grammar that LALRPOP fails to translate. Consensus seems to be that we don't really understand _why_ the current grammar works (i.e., _how_ it avoids these ambiguities as-is). The solution implemented here is to avoid changing the grammar itself, and instead change the shape of the nodes returned by various rules in the grammar. Specifically, everywhere that we return `Expr`, we instead return `ParenthesizedExpr`, which includes a parenthesized range and the underlying `Expr` itself. (If an `Expr` isn't parenthesized, the ranges will be equivalent.) In `WithItemsNoAs`, we can then use the parenthesized range as the range for the `WithItem`.	2023-08-31 16:21:29 +01:00
Charlie Marsh	15b73bdb8a	Introduce AST nodes for `PatternMatchClass` arguments (#6881 ) ## Summary This PR introduces two new AST nodes to improve the representation of `PatternMatchClass`. As a reminder, `PatternMatchClass` looks like this: ```python case Point2D(0, 0, x=1, y=2): ... ``` Historically, this was represented as a vector of patterns (for the `0, 0` portion) and parallel vectors of keyword names (for `x` and `y`) and values (for `1` and `2`). This introduces a bunch of challenges for the formatter, but importantly, it's also really different from how we represent similar nodes, like arguments (`func(0, 0, x=1, y=2)`) or parameters (`def func(x, y)`). So, firstly, we now use a single node (`PatternArguments`) for the entire parenthesized region, making it much more consistent with our other nodes. So, above, `PatternArguments` would be `(0, 0, x=1, y=2)`. Secondly, we now have a `PatternKeyword` node for `x=1` and `y=2`. This is much more similar to the how `Keyword` is represented within `Arguments` for call expressions. Closes https://github.com/astral-sh/ruff/issues/6866. Closes https://github.com/astral-sh/ruff/issues/6880.	2023-08-26 14:45:44 +00:00
Dhruv Manilawala	fb7caf43c8	Update lexer tests to use snapshots (#6658 ) ## Summary This PR updates the lexer tests to use the snapshot testing framework. It also makes the following changes: * Remove the use of macros in the lexer tests * Use `test_case` for EOL tests ## Test Plan ``` cargo test --package ruff_python_parser --lib --all-features -- lexer::tests --no-capture ```	2023-08-22 18:23:19 +00:00
Charlie Marsh	6a5acde226	Make `Parameters` an optional field on `ExprLambda` (#6669 ) ## Summary If a lambda doesn't contain any parameters, or any parameter _tokens_ (like `*`), we can use `None` for the parameters. This feels like a better representation to me, since, e.g., what should the `TextRange` be for a non-existent set of parameters? It also allows us to remove several sites where we check if the `Parameters` is empty by seeing if it contains any arguments, so semantically, we're already trying to detect and model around this elsewhere. Changing this also fixes a number of issues with dangling comments in parameter-less lambdas, since those comments are now automatically marked as dangling on the lambda. (As-is, we were also doing something not-great whereby the lambda was responsible for formatting dangling comments on the parameters, which has been removed.) Closes https://github.com/astral-sh/ruff/issues/6646. Closes https://github.com/astral-sh/ruff/issues/6647. ## Test Plan `cargo test`	2023-08-18 15:34:54 +00:00
Charlie Marsh	a70807e1e1	Expand `NamedExpr` range to include full range of parenthesized value (#6632 ) ## Summary Given: ```python if ( x := ( # 4 y # 5 ) # 6 ): pass ``` It turns out the parser ended the range of the `NamedExpr` at the end of `y`, rather than the end of the parenthesis that encloses `y`. This just seems like a bug -- the range should be from the start of the name on the left, to the end of the parenthesized node on the right. ## Test Plan `cargo test`	2023-08-17 14:34:05 +00:00
Charlie Marsh	96d310fbab	Remove `Stmt::TryStar` (#6566 ) ## Summary Instead, we set an `is_star` flag on `Stmt::Try`. This is similar to the pattern we've migrated towards for `Stmt::For` (removing `Stmt::AsyncFor`) and friends. While these are significant differences for an interpreter, we tend to handle these cases identically or nearly identically. ## Test Plan `cargo test`	2023-08-14 13:39:44 -04:00
Charlie Marsh	f16e780e0a	Add an implicit concatenation flag to string and bytes constants (#6512 ) ## Summary Per the discussion in https://github.com/astral-sh/ruff/discussions/6183, this PR adds an `implicit_concatenated` flag to the string and bytes constant variants. It's not actually _used_ anywhere as of this PR, but it is covered by the tests. Specifically, we now use a struct for the string and bytes cases, along with the `Expr::FString` node. That struct holds the value, plus the flag: ```rust #[derive(Clone, Debug, PartialEq, is_macro::Is)] pub enum Constant { Str(StringConstant), Bytes(BytesConstant), ... } #[derive(Clone, Debug, PartialEq, Eq)] pub struct StringConstant { /// The string value as resolved by the parser (i.e., without quotes, or escape sequences, or /// implicit concatenations). pub value: String, /// Whether the string contains multiple string tokens that were implicitly concatenated. pub implicit_concatenated: bool, } impl Deref for StringConstant { type Target = str; fn deref(&self) -> &Self::Target { self.value.as_str() } } #[derive(Clone, Debug, PartialEq, Eq)] pub struct BytesConstant { /// The bytes value as resolved by the parser (i.e., without quotes, or escape sequences, or /// implicit concatenations). pub value: Vec<u8>, /// Whether the string contains multiple string tokens that were implicitly concatenated. pub implicit_concatenated: bool, } impl Deref for BytesConstant { type Target = [u8]; fn deref(&self) -> &Self::Target { self.value.as_slice() } } ``` ## Test Plan `cargo test`	2023-08-14 13:46:54 +00:00
Dhruv Manilawala	6a64f2289b	Rename `Magic` to `IpyEscape` (#6395 ) ## Summary This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and `MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the token and type. Similarly, it renames the AST nodes from `LineMagic` to `IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary. It also makes renames from using `jupyter_magic` to `ipython_escape_commands` in various function names. The mode value is still `Mode::Jupyter` because the escape commands are part of the IPython syntax but the lexing/parsing is done for a Jupyter notebook. ### Motivation behind the rename: * IPython codebase defines it as "EscapeCommand" / "Escape Sequences": * Escape Sequences: `292e3a2345/IPython/core/inputtransformer2.py (L329-L333)` * Escape command: `292e3a2345/IPython/core/inputtransformer2.py (L410-L411)` * The word "magic" is used mainly for the actual magic commands i.e., the ones starting with `%`/`%%` (https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system). So, this avoids any confusion between the Magic token (`%`, `%%`) and the escape command itself. ## Test Plan * `cargo test` to make sure all renames are done correctly. * `grep` for `jupyter_escape`/`magic` to make sure all renames are done correctly.	2023-08-09 13:28:18 +00:00
Dhruv Manilawala	e257c5af32	Add support for help end IPython escape commands (#6358 ) ## Summary This PR adds support for a stricter version of help end escape commands[^1] in the parser. By stricter, I mean that the escape tokens are only at the end of the command and there are no tokens at the start. This makes it difficult to implement it in the lexer without having to do a lot of look aheads or keeping track of previous tokens. Now, as we're adding this in the parser, the lexer needs to recognize and emit a new token for `?`. So, `Question` token is added which will be recognized only in `Jupyter` mode. The conditions applied are the same as the ones in the original implementation in IPython codebase (which is a regex): * There can only be either 1 or 2 question mark(s) at the end * The node before the question mark can be a `Name`, `Attribute`, `Subscript` (only with integer constants in slice position), or any combination of the 3 nodes. ## Test Plan Added test cases for various combination of the possible nodes in the command value position and update the snapshots. fixes: #6359 fixes: #5030 (This is the final piece) [^1]: https://github.com/astral-sh/ruff/pull/6272#issue-1833094281	2023-08-09 10:28:52 +05:30
Charlie Marsh	3f0eea6d87	Rename `JoinedStr` to `FString` in the AST (#6379 ) ## Summary Per the proposal in https://github.com/astral-sh/ruff/discussions/6183, this PR renames the `JoinedStr` node to `FString`.	2023-08-07 17:33:17 +00:00
Charlie Marsh	daefa74e9a	Remove async AST node variants for `with`, `for`, and `def` (#6369 ) ## Summary Per the suggestion in https://github.com/astral-sh/ruff/discussions/6183, this PR removes `AsyncWith`, `AsyncFor`, and `AsyncFunctionDef`, replacing them with an `is_async` field on the non-async variants of those structs. Unlike an interpreter, we _generally_ have identical handling for these nodes, so separating them into distinct variants adds complexity from which we don't really benefit. This can be seen below, where we get to remove a _ton_ of code related to adding generic `Any*` wrappers, and a ton of duplicate branches for these cases. ## Test Plan `cargo test` is unchanged, apart from parser snapshots.	2023-08-07 16:36:02 +00:00
Dhruv Manilawala	e4a4660925	Support help end escape command with priority (#6272 ) ## Summary This PR adds support for help end escape command in the lexer. ### What are "help end escape commands"? First, the escape commands are special IPython syntax which enhances the functionality for the IPython REPL. There are 9 types of escape kinds which are recognized by the tokens which are present at the start of the command (`?`, `??`, `!`, `!!`, etc.). Here, the help command is using either the `?` or `??` token at the start (`?str.replace` for example). Those 2 tokens are also supported when they're at the end of the command (`str.replace?`), but the other tokens aren't supported in that position. There are mainly two types of help end escape commands: 1. Ending with either `?` or `??`, but it also starts with one of the escape tokens (`%matplotlib?`) 2. On the other hand, there's a stricter version for (1) which doesn't start with any escape tokens (`str.replace?`) This PR adds support for (1) while (2) will be supported in the parser. ### Priority Now, if the command starts and ends with an escape token, how do we decide the kind of this command? This is where priority comes into picture. This is simple as there's only one priority where `?`/`??` at the end takes priority over any other escape token and all of the other tokens are at the same priority. Remember that only `?`/`??` at the end is considered valid. This is mainly useful in the case where someone would want to invoke the help command on the magic command itself. For example, in `%matplotlib?` the help command takes priority which means that we want help for the `matplotlib` magic function instead of calling the magic function itself. ### Specification Here's where things get a bit tricky. What if there are question mark tokens at both ends. How do we decide if it's `Help` (`?`) kind or `Help2` (`??`) kind? \| \| Magic \| Value \| Kind \| \| --- \| --- \| --- \| --- \| \| 1 \| `?foo?` \| `foo` \| `Help` \| \| 2 \| `??foo?` \| `foo` \| `Help` \| \| 3 \| `?foo??` \| `foo` \| `Help2` \| \| 4 \| `??foo??` \| `foo` \| `Help2` \| \| 5 \| `???foo??` \| `foo` \| `Help2` \| \| 6 \| `??foo???` \| `foo???` \| `Help2` \| \| 7 \| `???foo???` \| `?foo???` \| `Help2` \| Looking at the above table: - The question mark tokens on the right takes priority over the ones on the left but only if the number of question mark on the right is 1 or 2. - If there are more than 2 question mark tokens on the right side, then the left side is used to determine the same. - If the right side is used to determine the kind, then all of the question marks and whitespaces on the left side are ignored in the `value`, but if it’s the other way around, then all of the extra question marks are part of the `value`. ### References - IPython implementation using the regex: `292e3a2345/IPython/core/inputtransformer2.py (L454-L462)` - Priorities: `292e3a2345/IPython/core/inputtransformer2.py (L466-L469)` ## Test Plan Add a bunch of test cases for the lexer and verify that it matches the behavior of IPython transformer. resolves: #6357	2023-08-07 21:01:02 +05:30
Charlie Marsh	b095b7204b	Add a `TypeParams` node to the AST (#6261 ) ## Summary Similar to #6259, this PR adds a `TypeParams` node to the AST, to capture the list of type parameters with their surrounding brackets. If a statement lacks type parameters, the `type_params` field will be `None`.	2023-08-02 14:12:45 +00:00
Charlie Marsh	981e64f82b	Introduce an `Arguments` AST node for function calls and class definitions (#6259 ) ## Summary This PR adds a new `Arguments` AST node, which we can use for function calls and class definitions. The `Arguments` node spans from the left (open) to right (close) parentheses inclusive. In the case of classes, the `Arguments` is an option, to differentiate between: ```python # None class C: ... # Some, with empty vectors class C(): ... ``` In this PR, we don't really leverage this change (except that a few rules get much simpler, since we don't need to lex to find the start and end ranges of the parentheses, e.g., `crates/ruff/src/rules/pyupgrade/rules/lru_cache_without_parameters.rs`, `crates/ruff/src/rules/pyupgrade/rules/unnecessary_class_parentheses.rs`). In future PRs, this will be especially helpful for the formatter, since we can track comments enclosed on the node itself. ## Test Plan `cargo test`	2023-08-02 10:01:13 -04:00
Charlie Marsh	9c708d8fc1	Rename `Parameter#arg` and `ParameterWithDefault#def` fields (#6255 ) ## Summary This PR renames... - `Parameter#arg` to `Parameter#name` - `ParameterWithDefault#def` to `ParameterWithDefault#parameter` (such that `ParameterWithDefault` has a `default` and a `parameter`) ## Test Plan `cargo test`	2023-08-01 14:28:34 -04:00
Charlie Marsh	adc8bb7821	Rename `Arguments` to `Parameters` in the AST (#6253 ) ## Summary This PR renames a few AST nodes for clarity: - `Arguments` is now `Parameters` - `Arg` is now `Parameter` - `ArgWithDefault` is now `ParameterWithDefault` For now, the attribute names that reference `Parameters` directly are changed (e.g., on `StmtFunctionDef`), but the attributes on `Parameters` itself are not (e.g., `vararg`). We may revisit that decision in the future. For context, the AST node formerly known as `Arguments` is used in function definitions. Formally (outside of the Python context), "arguments" typically refers to "the values passed to a function", while "parameters" typically refers to "the variables used in a function definition". E.g., if you Google "arguments vs parameters", you'll get some explanation like: > A parameter is a variable in a function definition. It is a placeholder and hence does not have a concrete value. An argument is a value passed during function invocation. We're thus deviating from Python's nomenclature in favor of a scheme that we find to be more precise.	2023-08-01 13:53:28 -04:00
Micha Reiser	7c7231db2e	Remove unsupported `type_comment` field <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary This PR removes the `type_comment` field which our parser doesn't support. <!-- What's the purpose of the change? What does it do, and why? --> ## Test Plan `cargo test` <!-- How was it tested? -->	2023-08-01 12:53:13 +02:00
Micha Reiser	4ad5903ef6	Delete type-ignore node <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary This PR removes the type ignore node from the AST because our parser doesn't support it, and just having it around is confusing. <!-- What's the purpose of the change? What does it do, and why? --> ## Test Plan `cargo build` <!-- How was it tested? -->	2023-08-01 12:34:50 +02:00
David Szotten	ba990b676f	add `DebugText` for self-documenting f-strings (#6167 )	2023-08-01 07:55:03 +02:00
Micha Reiser	40f54375cb	Pull in RustPython parser (#6099 )	2023-07-27 09:29:11 +00:00

45 commits