mirrors/ruff - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/astral-sh/ruff.git synced 2025-07-09 22:25:09 +00:00

Author	SHA1	Message	Date
Charlie Marsh	97e9d3c54f	Use `Display` for formatter parse errors (#9316 ) ## Summary This helps a bit with (but does not close) the issues described in https://github.com/astral-sh/ruff/issues/9311. E.g., now, we at least see: `error: Failed to format main.py: source contains syntax errors: invalid syntax. Got unexpected token '=' at byte offset 20`.	2023-12-29 22:26:57 +00:00
Charlie Marsh	9d6444138b	Remove lexing and parsing from the linter benchmark (#9264 ) ## Summary This PR adds some helper structs to the linter paths to enable passing in the pre-computed tokens and parsed source code during benchmarking, to remove lexing and parsing from the overall linter benchmark measurement. We already remove parsing for the formatter, and we have separate benchmarks for the lexer and the parser, so this should make it much easier to measure linter performance changes.	2023-12-23 16:43:11 -05:00
Andrew Gallant	3ce145c476	release: switch to Cargo's default (#9031 ) This sets `lto = "thin"` instead of using "fat" LTO, and sets `codegen-units = 16`. These are the defaults for Cargo's `release` profile, and I think it may give us faster iteration times, especially when benchmarking. The point of this PR is to see what kind of impact this has on benchmarks. It is expected that benchmarks may regress to some extent. I did some quick ad hoc experiments to quantify this change in compile times. Namely, I ran: cargo build --profile release -p ruff_cli Then I ran touch crates/ruff_python_formatter/src/expression/string/docstring.rs (because that's where i've been working lately) and re-ran cargo build --profile release -p ruff_cli This last command is what I timed, since it reflects how much time one has to wait between making a change and getting a compiled artifact. Here are my results: * With status quo `release` profile, build takes 77s * with `release` but `lto = "thin"`, build takes 41s * with `release`, but `lto = false`, build takes 19s * with `release`, but `lto = false` and `codegen-units = 16`, build takes 7s * with `release`, but `lto = "thin"` and `codegen-units = 16`, build takes 16s (i believe this is the default `release` configuration) This PR represents the last option. It's not the fastest to compile, but it's nearly a whole minute faster! The idea is that with `codegen-units = 16`, we still make use of parallelism, but keep _some_ level of LTO on to try and re-gain what we lose by increasing the number of codegen units.	2023-12-15 08:19:35 -05:00
Dhruv Manilawala	cdac90ef68	New AST nodes for f-string elements (#8835 ) Rebase of #6365 authored by @davidszotten. ## Summary This PR updates the AST structure for f-string elements. The main motivation behind this change is to have a dedicated node for the string part of an f-string. Previously, the existing `ExprStringLiteral` node was used for this purpose, which isn't exactly correct. The `ExprStringLiteral` node should include the quotes in its range, but the f-string literal element doesn't include the quotes as it's a specific part within an f-string. For example, ```python f"foo {x}" # ^^^^ # This is the literal part of an f-string ``` The new `FStringElement` enum represents either the literal part or the expression part of an f-string. ### Rule Updates This means that there'll be two nodes representing a string depending on the context: one for a normal string literal and the other for a string literal within an f-string. The AST checker is updated to accommodate this change. The rules which work on string literals are updated to check the literal part of an f-string as well. #### Notes 1. The `Expr::is_literal_expr` method would check for `ExprStringLiteral` and return true if so. But now that we don't represent the literal part of an f-string using that node, this improves the method's behavior and confines it to the actual expression. We do have the `FStringElement::is_literal` method. 2. We avoid checking if we're in an f-string context before adding to `string_type_definitions` because the f-string literal is now a dedicated node and not part of `Expr`. 3. Annotations cannot use f-strings, so we avoid changing any rules which work on annotations and check for `ExprStringLiteral`. ## Test Plan - All references of `Expr::StringLiteral` were checked to see if any of the rules require updating to account for the f-string literal element node. 
- New test cases are added for rules which check against the literal part of an f-string. - Check the ecosystem results and ensure it remains unchanged. ## Performance There's a performance penalty in the parser. The reason for this remains unknown as it seems that the generated assembly code is now different for the `__reduce154` function. The reduce function body is just popping the `ParenthesizedExpr` on top of the stack and pushing it with the new location. - The size of `FStringElement` enum is the same as `Expr` which is what it replaces in `FString::format_spec` - The size of `FStringExpressionElement` is the same as `ExprFormattedValue` which is what it replaces I tried reducing the `Expr` enum from 80 bytes to 72 bytes but it hardly resulted in any performance gain. The difference can be seen here: - Original profile: https://share.firefox.dev/3Taa7ES - Profile after boxing some node fields: https://share.firefox.dev/3GsNXpD ### Backtracking I tried backtracking the changes to see if any of the isolated change produced this regression. The problem here is that the overall change is so small that there's only a single checkpoint where I can backtrack and that checkpoint results in the same regression. This checkpoint is to revert using `Expr` to the `FString::format_spec` field. After this point, the change would revert back to the original implementation. ## Review process The review process is similar to #7927. The first set of commits update the node structure, parser, and related AST files. Then, further commits update the linter and formatter part to account for the AST change. --------- Co-authored-by: David Szotten <davidszotten@gmail.com>	2023-12-07 10:28:05 -06:00
Micha Reiser	7e390d3772	Move `ParenthesizedExpr` to `ruff_python_parser` (#8987 )	2023-12-04 05:36:28 +00:00
Charlie Marsh	20782ab02c	Support type alias statements in simple statement positions (#8916 ) <!-- Thank you for contributing to Ruff! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Our `SoftKeywordTokenizer` only respected soft keywords in compound statement positions -- for example, at the start of a logical line: ```python type X = int ``` However, type aliases can also appear in simple statement positions, like: ```python class Class: type X = int ``` (Note that `match` and `case` are _not_ valid keywords in such positions.) This PR upgrades the tokenizer to track both kinds of valid positions. Closes https://github.com/astral-sh/ruff/issues/8900. Closes https://github.com/astral-sh/ruff/issues/8899. ## Test Plan `cargo test`	2023-11-30 19:15:19 +00:00
Charlie Marsh	774c77adae	Avoid off-by-one error in with-item named expressions (#8915 ) ## Summary Given `with (a := b): pass`, we truncate the `WithItem` range by one on both sides such that the parentheses are part of the statement, rather than the item. However, for `with (a := b) as x: pass`, we want to avoid this trick. Closes https://github.com/astral-sh/ruff/issues/8913.	2023-11-30 00:11:04 +00:00
Charlie Marsh	6435e4e4aa	Enable auto-return-type involving `Optional` and `Union` annotations (#8885 ) ## Summary Previously, this was only supported for Python 3.10 and later, since we always use the PEP 604-style unions.	2023-11-28 18:35:55 -08:00
Dhruv Manilawala	47d80f29a7	Lexer start of line is false only for `Mode::Expression` (#8880 ) ## Summary This PR fixes the bug in the lexer where the `Mode::Ipython` wasn't being considered when initializing the soft keyword transformer which wraps the lexer. This means that if the source code starts with either `match` or `type` keyword, then the keywords were being considered as name tokens instead. For example, ```python match foo: case bar: pass ``` This would transform the `match` keyword into an identifier if the mode is `Ipython`. The fix is to reverse the condition in the soft keyword initializer so that any new modes are by default considered as the lexer being at start of line. ## Test Plan Add a new test case for `Mode::Ipython` and verify the snapshot. fixes: #8870	2023-11-28 20:38:25 +00:00
Andrew Gallant	f585e3e2dc	remove several uses of `unsafe` (#8600 ) This PR removes several uses of `unsafe`. I generally limited myself to low hanging fruit that I could see. There are still a few remaining uses of `unsafe` that looked a bit more difficult to remove (if possible at all). But this gets rid of a good chunk of them. I put each `unsafe` removal into its own commit with a justification for why I did it. So I would encourage reviewing this PR commit-by-commit. That way, we can legislate them independently. It's no problem to drop a commit if we feel the `unsafe` should stay in that case.	2023-11-28 09:50:03 -05:00
Dhruv Manilawala	017e829115	Update string nodes for implicit concatenation (#7927 ) ## Summary This PR updates the string nodes (`ExprStringLiteral`, `ExprBytesLiteral`, and `ExprFString`) to account for implicit string concatenation. ### Motivation In Python, implicitly concatenated strings are joined while parsing because the interpreter doesn't require the information for each part. While that's feasible for an interpreter, it falls short for a static analysis tool where having such information is more useful. Currently, various parts of the code use the lexer to get the individual string parts. One of the main challenges this solves is that of string formatting. Currently, the formatter relies on the lexer to get the individual string parts, and formats them including the comments accordingly. But, with PEP 701, f-strings can also contain comments. Without this change, it becomes very difficult to add support for f-string formatting. ### Implementation The initial proposal was made in this discussion: https://github.com/astral-sh/ruff/discussions/6183#discussioncomment-6591993. Various AST designs were explored for this task; they are available in the linked internal document[^1]. The selected variant was the one where the nodes were kept as they are, except that the `implicit_concatenated` field was removed and a new struct was added to the `Expr*` struct instead. This is a private struct that contains the actual implementation of how the AST is designed for both single and implicitly concatenated strings. This implementation is achieved through an enum with two variants, `Single` and `Concatenated`, to avoid allocating a vector even for single strings. There are various public methods available on the value struct to query certain information regarding the node. 
The nodes are structured in the following way: ``` ExprStringLiteral - "foo" "bar" \|- StringLiteral - "foo" \|- StringLiteral - "bar" ExprBytesLiteral - b"foo" b"bar" \|- BytesLiteral - b"foo" \|- BytesLiteral - b"bar" ExprFString - "foo" f"bar {x}" \|- FStringPart::Literal - "foo" \|- FStringPart::FString - f"bar {x}" \|- StringLiteral - "bar " \|- FormattedValue - "x" ``` [^1]: Internal document: https://www.notion.so/astral-sh/Implicit-String-Concatenation-e036345dc48943f89e416c087bf6f6d9?pvs=4 #### Visitor The way the nodes are structured is that the entire string, including all the parts that are implicitly concatenation, is a single node containing individual nodes for the parts. The previous section has a representation of that tree for all the string nodes. This means that new visitor methods are added to visit the individual parts of string, bytes, and f-strings for `Visitor`, `PreorderVisitor`, and `Transformer`. ## Test Plan - `cargo insta test --workspace --all-features --unreferenced reject` - Verify that the ecosystem results are unchanged	2023-11-24 17:55:41 -06:00
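The motivation in #7927 (an interpreter joins implicitly concatenated strings, while a static analysis tool wants the individual parts) can be observed with CPython's own standard library. The sketch below is only an illustration using `ast` and `tokenize`; it does not use Ruff's parser or its node types, and the helper names `string_tokens` and `joined_value` are hypothetical.

```python
import ast
import io
import tokenize

SOURCE = '"foo" "bar"\n'


def string_tokens(source):
    """Return the source text of every STRING token, one entry per part."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]


def joined_value(source):
    """Return the single value CPython's parser produces for the expression."""
    # CPython's AST keeps only one constant: the parts are already merged.
    return ast.parse(source, mode="eval").body.value
```

The token stream still has one entry per part, whereas the AST exposes only the merged value; that lost per-part information is what the commit message says a formatter needs.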
konsti	14e65afdc6	Update to Rust 1.74 and use new clippy lints table (#8722 ) Update to [Rust 1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use the new clippy lints table. The update itself introduced a new clippy lint about superfluous hashes in raw strings, which got removed. I moved our lint config from `rustflags` to the newly stabilized [workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table). One consequence is that we have to `unsafe_code = "warn"` instead of "forbid" because the latter now actually bans unsafe code: ``` error[E0453]: allow(unsafe_code) incompatible with previous forbid --> crates/ruff_source_file/src/newlines.rs:62:17 \| 62 \| #[allow(unsafe_code)] \| ^^^^^^^^^^^ overruled by previous forbid \| = note: `forbid` lint level was set on command line ``` --------- Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>	2023-11-16 18:12:46 -05:00
Andrew Gallant	6a1fa4778f	Reject more syntactically invalid Python programs (#8524 ) ## Summary This commit adds some additional error checking to the parser such that assignments that are invalid syntax are rejected. This covers the obvious cases like `5 = 3` and some not so obvious cases like `x + y = 42`. This does add an additional recursive call to the parser for the cases handling assignments. I had initially been concerned about doing this, but `set_context` is already doing recursion during assignments, so I didn't feel as though this was changing any fundamental performance characteristics of the parser. (Also, in practice, I would expect any such recursion here to be quite shallow since the recursion is done on the target of an assignment. Such things are rarely nested much in practice.) Fixes #6895 ## Test Plan I've added unit tests covering every case that is detected as invalid on an `Expr`.	2023-11-07 07:16:06 -05:00
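As a reference point for the invalid assignments named in #8524, CPython itself rejects `5 = 3` and `x + y = 42`. The following is a minimal sketch that asks CPython's parser (via `ast.parse`) whether a program is syntactically valid; it says nothing about how Ruff's parser implements the check, and `is_valid_syntax` is a hypothetical helper name.

```python
import ast


def is_valid_syntax(source):
    """Return True if CPython's parser accepts `source`, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Invalid assignment targets from the commit message, plus valid controls.
CASES = ["5 = 3", "x + y = 42", "x = 42", "x, y = 1, 2"]
RESULTS = {source: is_valid_syntax(source) for source in CASES}
```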
konsti	daea870c3c	Fix panic with 8 in octal escape (#8356 ) Summary The digits for an octal escape are 0 to 7, not 0 to 8, fixing the panic in #8355 Test plan Regression test parser fixture	2023-10-30 14:42:15 +01:00
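The fix in #8356 rests on the fact that octal digits run from 0 to 7. A small sketch of reading an octal escape body under that rule follows; it is an illustrative Python stand-in, not Ruff's Rust code, and `read_octal_escape` is a hypothetical helper. It assumes the usual Python rule of at most three octal digits per escape.

```python
OCTAL_DIGITS = "01234567"  # note: "8" and "9" are not octal digits


def read_octal_escape(text, start):
    """Read up to three octal digits starting at `start`.

    Returns a `(value, end)` pair, where `end` is the index just past the
    last digit consumed. Raises ValueError if no octal digit is present.
    """
    end = start
    while end < len(text) and end - start < 3 and text[end] in OCTAL_DIGITS:
        end += 1
    if end == start:
        raise ValueError(f"no octal digit at offset {start}")
    return int(text[start:end], 8), end
```

With this rule, an `8` simply ends the escape instead of being handed to the octal conversion.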
Dhruv Manilawala	230c9ce236	Split `Constant` to individual literal nodes (#8064 ) ## Summary This PR splits the `Constant` enum into individual literal nodes. It introduces the following new nodes for each variant: * `ExprStringLiteral` * `ExprBytesLiteral` * `ExprNumberLiteral` * `ExprBooleanLiteral` * `ExprNoneLiteral` * `ExprEllipsisLiteral` The main motivation behind this refactor is to introduce the new AST node for implicit string concatenation in the coming PR. The elements of that node will be either a string literal, a bytes literal, or an f-string, which can be implemented using an enum. This means that a string or bytes literal cannot be represented by `Constant::Str` / `Constant::Bytes`, which creates an inconsistency. This PR avoids that inconsistency by splitting the constant nodes into their own literal nodes, literal being the more appropriate naming convention from a static analysis tool perspective. This also makes working with literals in the linter and formatter much more ergonomic. For example, checking whether an expression is a string literal can be done easily using `Expr::is_string_literal_expr` or by matching against `Expr::StringLiteral`, as opposed to matching against `ExprConstant` and the `Constant` enum. A few AST helper methods can be simplified as well, which will be done in a follow-up PR. This introduces a new `Expr::is_literal_expr` method which is the same as `Expr::is_constant_expr`. There are also a small number of intermediary changes related to implicit string concatenation. This is done so as to avoid having a huge PR, which this already is. ## Test Plan 1. Verify and update all of the existing snapshots (parser, visitor) 2. 
Verify that the ecosystem check output remains unchanged for both the linter and formatter ### Formatter ecosystem check #### `main` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \| #### `dhruv/constant-to-literal` \| project \| similarity index \| total files \| changed files \| \|----------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.75803 \| 1799 \| 1647 \| \| django \| 0.99983 \| 2772 \| 34 \| \| home-assistant \| 0.99953 \| 10596 \| 186 \| \| poetry \| 0.99891 \| 317 \| 17 \| \| transformers \| 0.99966 \| 2657 \| 330 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99978 \| 3669 \| 20 \| \| warehouse \| 0.99977 \| 654 \| 13 \| \| zulip \| 0.99970 \| 1459 \| 22 \|	2023-10-30 12:13:23 +05:30
Dhruv Manilawala	78bbf6d403	New `Singleton` enum for `PatternMatchSingleton` node (#8063 ) ## Summary This PR adds a new `Singleton` enum for the `PatternMatchSingleton` node. Earlier the node was using the `Constant` enum but the value for this pattern can only be either `None`, `True` or `False`. With the coming PR to remove the `Constant`, this node required a new type to fill in. This also has the benefit of narrowing the type down to only the possible values for the node as evident by the removal of `unreachable`. ## Test Plan Update the AST snapshots and run `cargo test`.	2023-10-30 05:48:53 +00:00
Carter Snook	2f5734d1ac	perf(parser): use faster string parser methods (#8227 ) ## Summary This makes use of memchr and other methods to parse the strings (hopefully) faster. It might also be worth converting the `parse_fstring_middle` helper to use similar techniques, but I did not implement it in this PR. ## Test Plan This was tested using the existing tests and passed all of them.	2023-10-28 18:50:54 -04:00
Carter Snook	e2b5c6ac5f	perf(parser): use memchr for lexing comments (#8193 )	2023-10-27 02:07:43 +01:00
Charlie Marsh	d6a4283003	Fix range of unparenthesized tuple subject in match statement (#8101 ) ## Summary This was just a bug in the parser ranges, probably since it was initially implemented. Given `match n % 3, n % 5: ...`, the "subject" (i.e., the tuple of two binary operators) was using the entire range of the `match` statement. Closes https://github.com/astral-sh/ruff/issues/8091. ## Test Plan `cargo test`	2023-10-22 19:58:33 -04:00
Dhruv Manilawala	43883b7a15	Disallow f-strings in match pattern literal (#7857 ) ## Summary This PR fixes a bug to disallow f-strings in match pattern literal. ``` literal_pattern ::= signed_number \| signed_number "+" NUMBER \| signed_number "-" NUMBER \| strings \| "None" \| "True" \| "False" \| signed_number: NUMBER \| "-" NUMBER ``` Source: https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-literal_pattern Also, ```console $ python /tmp/t.py File "/tmp/t.py", line 4 case "hello " f"{name}": ^^^^^^^^^^^^^^^^^^ SyntaxError: patterns may only match literals and attribute lookups ``` ## Test Plan Update existing test case and accordingly the snapshots. Also, add a new test case to verify that the parser does raise an error.	2023-10-09 10:11:08 +00:00
Dhruv Manilawala	709abd534a	Fix lexing single-quoted f-string with multi-line format spec (#7787 ) ## Summary Reported at https://github.com/python/cpython/issues/110259 ## Test Plan Add test cases for the fix and update the snapshots	2023-10-05 23:12:09 +05:30
Dhruv Manilawala	17fba99ed4	Report precise location for invalid conversion flag (#7809 ) ## Summary This PR updates the parser definition to use the precise location when reporting an invalid f-string conversion flag error. Taking the following example code: ```python f"{foo!x}" ``` On earlier versions, ``` Error: f-string: invalid conversion character at byte offset 6 ``` Now, ``` Error: f-string: invalid conversion character at byte offset 7 ``` This becomes more useful when there's whitespace between `!` and the flag value, which is not valid, although we can't detect that yet. ## Test Plan As mentioned above.	2023-10-05 17:46:14 +05:30
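The two offsets quoted in #7809 can be checked by counting characters in the example source: offset 6 is the `!`, and offset 7 is the flag character itself. The sketch below is plain string arithmetic under the assumption of ASCII input (so byte offsets equal character offsets); `conversion_offsets` is a hypothetical helper, not part of Ruff.

```python
SOURCE = 'f"{foo!x}"'


def conversion_offsets(source):
    """Return `(bang_offset, flag_offset)` for the first `!` in `source`.

    Assumes ASCII-only input, so character offsets equal byte offsets.
    """
    bang = source.index("!")
    return bang, bang + 1


BANG, FLAG = conversion_offsets(SOURCE)
```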
Dhruv Manilawala	69b8136463	Avoid curly brace escape in f-string format spec (#7780 ) ## Summary This PR fixes a bug in the lexer for f-string format spec where it would consider the `{{` (double curly braces) as an escape pattern. This is not the case as evident by the [PEP](https://peps.python.org/pep-0701/#how-to-produce-these-new-tokens) as well but I missed the part: > [..] > * If in “format specifier mode” (see step 3), an opening brace ({) or a closing brace (}). > * If not in “format specifier mode” (see step 3), an opening brace ({) or a closing brace (}) that is not immediately followed by another opening/closing brace. ## Test Plan Add a test case to verify the fix and update the snapshot. fixes: #7778	2023-10-03 19:38:03 +05:30
konsti	3ccd1d580d	Use crates.io unicode_names2 0.6.0 (#6478 ) Update `unicode_names2` to the crates.io release 0.6.0, removing a git dependency.	2023-10-02 18:17:38 -04:00
Charlie Marsh	2838f7af98	Skip all bracketed expressions when locating comparison ops (#7740 ) Closes https://github.com/astral-sh/ruff/issues/7737.	2023-10-01 14:57:40 +00:00
Dhruv Manilawala	e91ffe3e93	Consume the escaped Windows newline (`\r\n`) for `FStringMiddle` (#7722 ) ## Summary This PR fixes a bug where if a Windows newline (`\r\n`) character was escaped, then only the `\r` was consumed and not `\n` leading to an unterminated string error. ## Test Plan Add new test cases to check the newline escapes. fixes: #7632	2023-10-01 07:58:20 +05:30
Dhruv Manilawala	e72d617f4b	Remove escaped mac/windows eol from AST string value (#7724 ) ## Summary This PR fixes the bug where the value of a string node type includes the escaped mac/windows newline character. Note that the token value still includes them, it's only removed when parsing the string content. ## Test Plan Add new test cases for the string node type to check that the escapes aren't being included in the string value. fixes: #7723	2023-10-01 07:37:59 +05:30
Dhruv Manilawala	e62e245c61	Add support for PEP 701 (#7376 ) ## Summary This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the other individual PRs. The separate PRs were created for logic separation and code reviews. Refer to each pull request for a detail description on the change. Refer to the PR description for the list of pull requests within this PR. ## Test Plan ### Formatter ecosystem checks Explanation for the change in ecosystem check: https://github.com/astral-sh/ruff/pull/7597#issue-1908878183 #### `main` ``` \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76083 \| 1789 \| 1631 \| \| django \| 0.99983 \| 2760 \| 36 \| \| transformers \| 0.99963 \| 2587 \| 319 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99967 \| 648 \| 15 \| \| zulip \| 0.99972 \| 1437 \| 21 \| ``` #### `dhruv/pep-701` ``` \| project \| similarity index \| total files \| changed files \| \|--------------\|------------------:\|------------------:\|------------------:\| \| cpython \| 0.76051 \| 1789 \| 1632 \| \| django \| 0.99983 \| 2760 \| 36 \| \| transformers \| 0.99963 \| 2587 \| 319 \| \| twine \| 1.00000 \| 33 \| 0 \| \| typeshed \| 0.99983 \| 3496 \| 18 \| \| warehouse \| 0.99967 \| 648 \| 15 \| \| zulip \| 0.99972 \| 1437 \| 21 \| ```	2023-09-29 02:55:39 +00:00
Charlie Marsh	f45281345d	Include radix base prefix in large number representation (#7700 ) ## Summary When lexing a number like `0x995DC9BBDF1939FA` that exceeds our small number representation, we were only storing the portion after the base (in this case, `995DC9BBDF1939FA`). When using that representation in code generation, this could lead to invalid syntax, since `995DC9BBDF1939FA` on its own is not a valid integer. This PR modifies the code to store the full span, including the radix prefix. See: https://github.com/astral-sh/ruff/issues/7455#issuecomment-1739802958. ## Test Plan `cargo test`	2023-09-28 20:38:06 +00:00
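To see why dropping the radix prefix produces invalid syntax, the literal from #7700 can be fed to CPython's parser with and without its `0x`. This is a sketch using the standard `ast` module as a stand-in for any Python parser; `parses_as_expression` is a hypothetical helper name.

```python
import ast

FULL_LITERAL = "0x995DC9BBDF1939FA"
DIGITS_ONLY = FULL_LITERAL[2:]  # what remains if the radix prefix is dropped


def parses_as_expression(source):
    """Return True if CPython accepts `source` as a single expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


# The value does not fit in a signed 64-bit integer, which is why it
# takes the "large number" path described in the commit message.
EXCEEDS_I64 = int(FULL_LITERAL, 16) > 2**63 - 1
```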
konsti	4d16e2308d	Formatter and parser refactoring (#7569 ) I got confused and refactored a bit; the naming should now be more consistent. This is the basis for the range formatting work. Changes: * `format_module` -> `format_module_source` (format a string) * `format_node` -> `format_module_ast` (format a program parsed into an AST) * Added `parse_ok_tokens` that takes `Token` instead of `Result<Token>` * Call the source code `source` consistently * Added a `tokens_and_ranges` helper * `python_ast` -> `module` (because that's the type)	2023-09-26 15:29:43 +02:00
Charlie Marsh	93b5d8a0fb	Implement our own small-integer optimization (#7584 ) ## Summary This is a follow-up to #7469 that attempts to achieve similar gains, but without introducing malachite. Instead, this PR removes the `BigInt` type altogether, instead opting for a simple enum that allows us to store small integers directly and only allocate for values greater than `i64`: ```rust /// A Python integer literal. Represents both small (fits in an `i64`) and large integers. #[derive(Clone, PartialEq, Eq, Hash)] pub struct Int(Number); #[derive(Debug, Clone, PartialEq, Eq, Hash)] pub enum Number { /// A "small" number that can be represented as an `i64`. Small(i64), /// A "large" number that cannot be represented as an `i64`. Big(Box<str>), } impl std::fmt::Display for Number { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { match self { Number::Small(value) => write!(f, "{value}"), Number::Big(value) => write!(f, "{value}"), } } } ``` We typically don't care about numbers greater than `isize` -- our only uses are comparisons against small constants (like `1`, `2`, `3`, etc.), so there's no real loss of information, except in one or two rules where we're now a little more conservative (with the worst-case being that we don't flag, e.g., an `itertools.pairwise` that uses an extremely large value for the slice start constant). For simplicity, a few diagnostics now show a dedicated message when they see integers that are out of the supported range (e.g., `outdated-version-block`). An additional benefit here is that we get to remove a few dependencies, especially `num-bigint`. ## Test Plan `cargo test`	2023-09-25 15:13:21 +00:00
Micha Reiser	8ce138760a	Emit `LexError` for dedent to incorrect level (#7638 )	2023-09-25 11:45:44 +01:00
Chammika Mannakkara	e41b08f1d0	Fix typo in infinite (#7614 )	2023-09-23 11:19:36 +00:00
Charlie Marsh	4d6f5ff0a7	Remove `Int` wrapper type from parser (#7577 ) ## Summary This is only used for the `level` field in relative imports (e.g., `from ..foo import bar`). It seems unnecessary to use a wrapper here, so this PR changes to a `u32` directly.	2023-09-21 17:01:44 +00:00
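The `level` field mentioned in #7577 counts the leading dots of a relative import. CPython's `ast` module exposes the same concept as a plain integer on `ImportFrom`, which makes for a quick illustration; this sketch is about CPython's AST, not Ruff's, and `relative_import_level` is a hypothetical helper.

```python
import ast


def relative_import_level(source):
    """Return the `level` of the first statement, assumed to be a `from` import."""
    stmt = ast.parse(source).body[0]
    if not isinstance(stmt, ast.ImportFrom):
        raise TypeError("expected a `from ... import ...` statement")
    return stmt.level
```

A small non-negative count like this is naturally represented by an unsigned integer, in line with the `u32` the commit switches to.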
Dhruv Manilawala	a41bb2733f	Add range to lexer test snapshots (#7265 ) ## Summary This PR updates the lexer test snapshots to include the range value as well. This is mainly a mechanical refactor. ### Motivation The main motivation is so that we can verify that the ranges are valid and do not overlap. ## Test Plan `cargo test`	2023-09-11 19:12:46 +00:00
Dhruv Manilawala	f5701fcc63	Use snapshots for remaining lexer tests (#7264 ) ## Summary This PR updates the remaining lexer test cases to use the snapshots. This is mainly a mechanical refactor. ## Motivation The main motivation is so that when we add the token range values to the test case output, it's easier to update the test cases. The reason they were not using the snapshots before was the use of the `test_case` macro. The macro is mainly used for different EOL test cases. If we just generate the snapshots directly, then the snapshot name would be suffixed with `-1`, `-2`, etc. as the test function is still the same. So, we'll create the snapshot ourselves with the platform name for the respective EOL test cases. ## Test Plan `cargo test`	2023-09-12 00:16:38 +05:30
Micha Reiser	7440e54ec6	Avoid allocating in lex_decimal (#7252 )	2023-09-11 06:37:25 +00:00
Micha Reiser	842ff0212e	Add Lexer emoji test case (#7213 )	2023-09-07 10:02:50 +00:00
Micha Reiser	f1a4eb9c28	Use the unicode-ident crate (#7212 )	2023-09-07 08:19:25 +00:00
Victor Hugo Gomes	041cdb95e0	Update identifier Unicode character validation to match Python spec (#7209 ) Co-authored-by: Micha Reiser <micha@reiser.io>	2023-09-07 07:08:42 +00:00
Micha Reiser	171b66cb43	Lexer: Add skip whitespace fastpath (#7184 )	2023-09-06 16:14:01 +02:00
Dhruv Manilawala	04f2842e4f	Move `ExprConstant::kind` to `StringConstant::unicode` (#7180 )	2023-09-06 07:39:25 +00:00
Dhruv Manilawala	1adde24133	Rename parser mode from `Jupyter` to `Ipython` (#7153 )	2023-09-05 14:12:26 +00:00
Charlie Marsh	68f605e80a	Fix `WithItem` ranges for parenthesized, non-`as` items (#6782 ) ## Summary This PR attempts to address a problem in the parser related to the ranges of `WithItem` nodes in certain contexts -- specifically, `WithItem` nodes in parentheses that do not have an `as` token after them. For example, [here](https://play.ruff.rs/71be2d0b-2a04-4c7e-9082-e72bff152679): ```python with (a, b): pass ``` The range of the `WithItem` `a` is set to the range of `(a, b)`, as is the range of the `WithItem` `b`. In other words, when we have this kind of sequence, we use the range of the entire parenthesized context, rather than the ranges of the items themselves. Note that this also applies to cases [like](https://play.ruff.rs/c551e8e9-c3db-4b74-8cc6-7c4e3bf3713a): ```python with (a, b, c as d): pass ``` You can see the issue in the parser here: ```rust #[inline] WithItemsNoAs: Vec<ast::WithItem> = { <location:@L> <all:OneOrMore<Test<"all">>> <end_location:@R> => { all.into_iter().map(\|context_expr\| ast::WithItem { context_expr, optional_vars: None, range: (location..end_location).into() }).collect() }, } ``` Fixing this issue is... very tricky. The naive approach is to use the range of the `context_expr` as the range for the `WithItem`, but that range will be incorrect when the `context_expr` is itself parenthesized. For example, _that_ solution would fail here, since the range of the first `WithItem` would be that of `a`, rather than `(a)`: ```python with ((a), b): pass ``` The `with` parsing in general is highly precarious due to ambiguities in the grammar. Changing it in _any_ way seems to lead to an ambiguous grammar that LALRPOP fails to translate. Consensus seems to be that we don't really understand _why_ the current grammar works (i.e., _how_ it avoids these ambiguities as-is). The solution implemented here is to avoid changing the grammar itself, and instead change the shape of the nodes returned by various rules in the grammar. 
Specifically, everywhere that we return `Expr`, we instead return `ParenthesizedExpr`, which includes a parenthesized range and the underlying `Expr` itself. (If an `Expr` isn't parenthesized, the ranges will be equivalent.) In `WithItemsNoAs`, we can then use the parenthesized range as the range for the `WithItem`.	2023-08-31 16:21:29 +01:00
Dhruv Manilawala	4d49d5e845	Add `eat_char2` for the lexer (#6968 ) ## Summary This PR adds a new helper method on the `Cursor` called `eat_char2` which is similar to `eat_char` but accepts 2 characters instead of 1. It'll `bump` the cursor twice if both characters are found on lookahead. ## Test Plan `cargo test`	2023-08-29 17:18:02 +05:30
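The behaviour described for `eat_char2` in #6968 (look ahead two characters, and bump the cursor twice only if both match) can be sketched as follows. This is a hypothetical Python model of the idea; the real `Cursor` is a Rust type in Ruff's lexer, and everything here other than the names `eat_char`, `eat_char2`, and `bump` from the commit message is an assumption made for illustration.

```python
class Cursor:
    """A toy character cursor over a source string (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.pos = 0

    def peek(self, offset=0):
        """Return the character `offset` places ahead, or "" at end of input."""
        index = self.pos + offset
        return self.source[index] if index < len(self.source) else ""

    def bump(self):
        """Advance past one character, if any remains."""
        if self.pos < len(self.source):
            self.pos += 1

    def eat_char(self, c):
        """Consume `c` if it is the next character; report whether it was."""
        if self.peek() == c:
            self.bump()
            return True
        return False

    def eat_char2(self, c1, c2):
        """Consume `c1` then `c2` only if both are next; otherwise consume nothing."""
        if self.peek(0) == c1 and self.peek(1) == c2:
            self.bump()
            self.bump()
            return True
        return False
```

The point of the two-character form is that a partial match leaves the cursor untouched, so the caller never has to rewind.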
Dhruv Manilawala	2893a9f6b5	Remove unused f-string error type (#6941 )	2023-08-28 18:34:48 +05:30
Dhruv Manilawala	9c98416b96	Avoid lexer infinite loop on invalid input (#6937 ) ## Summary This PR fixes a bug which sends the lexer into infinite loop for an invalid input. The code in question is `[1` where the nesting is never finished. This means that the lexer will keep emitting the `Err` token forever. ## Test Plan Add a test case which collects all the tokens from the lexer. This just makes sure that it doesn't go into infinite loop.	2023-08-28 17:21:38 +05:30
Charlie Marsh	fc89976c24	Move `Ranged` into `ruff_text_size` (#6919 ) ## Summary The motivation here is that this enables us to implement `Ranged` in crates that don't depend on `ruff_python_ast`. Largely a mechanical refactor with a lot of regex, Clippy help, and manual fixups. ## Test Plan `cargo test`	2023-08-27 14:12:51 -04:00
Charlie Marsh	15b73bdb8a	Introduce AST nodes for `PatternMatchClass` arguments (#6881 ) ## Summary This PR introduces two new AST nodes to improve the representation of `PatternMatchClass`. As a reminder, `PatternMatchClass` looks like this: ```python case Point2D(0, 0, x=1, y=2): ... ``` Historically, this was represented as a vector of patterns (for the `0, 0` portion) and parallel vectors of keyword names (for `x` and `y`) and values (for `1` and `2`). This introduces a bunch of challenges for the formatter, but importantly, it's also really different from how we represent similar nodes, like arguments (`func(0, 0, x=1, y=2)`) or parameters (`def func(x, y)`). So, firstly, we now use a single node (`PatternArguments`) for the entire parenthesized region, making it much more consistent with our other nodes. So, above, `PatternArguments` would be `(0, 0, x=1, y=2)`. Secondly, we now have a `PatternKeyword` node for `x=1` and `y=2`. This is much more similar to how `Keyword` is represented within `Arguments` for call expressions. Closes https://github.com/astral-sh/ruff/issues/6866. Closes https://github.com/astral-sh/ruff/issues/6880.	2023-08-26 14:45:44 +00:00
Micha Reiser	15b7525464	Rename parser goal 'All' to 'all' (#6867 )	2023-08-25 12:00:57 +00:00

278 commits