Commit graph

46 commits

Author SHA1 Message Date
Ibraheem Ahmed
c9dff5c7d5
[ty] AST garbage collection (#18482)
## Summary

Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.

The primary change of this PR is adding a `node_index` field to every
AST node, that is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule`, instead of storing a pointer directly.

The indices are somewhat hackily (using an atomic integer) assigned by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source-order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means that we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.

Part of https://github.com/astral-sh/ty/issues/214.
2025-06-13 08:40:11 -04:00
Dylan
9bbf4987e8
Implement template strings (#17851)
This PR implements template strings (t-strings) in the parser and
formatter for Ruff.

Minimal changes necessary to compile were made in other parts of the code (e.g. ty, the linter, etc.). These will be covered properly in follow-up PRs.
2025-05-30 15:00:56 -05:00
Andrew Gallant
a827b16ebd ruff_python_parser: add Tokens::before method
This is analogous to the existing `Tokens::after` method. Its
implementation is almost identical.

We plan to use this for looking at the tokens immediately before the
cursor when fetching completions.
2025-05-29 10:31:30 -04:00
Micha Reiser
a4ba10ff0a
[red-knot] Add basic on-hover to playground and LSP (#17057)
## Summary

Implement a very basic hover in the playground and LSP.

It's basic, because it only shows the type on-hover. Most other LSPs
also show:

* The signature of the symbol beneath the cursor. E.g. `class
Test(a:int, b:int)` (we want something like
54f7da25f9/packages/pyright-internal/src/analyzer/typeEvaluator.ts (L21929-L22129))
* The symbols' documentation
* Do more fancy markdown rendering

I decided to defer these features for now because it requires new
semantic APIs (similar to *goto definition*), and investing in fancy
rendering only makes sense once we have the relevant data.

Closes [#16826](https://github.com/astral-sh/ruff/issues/16826)

## Test Plan



https://github.com/user-attachments/assets/044aeee4-58ad-4d4e-9e26-ac2a712026be


https://github.com/user-attachments/assets/4a1f4004-2982-4cf2-9dfd-cb8b84ff2ecb
2025-04-04 08:13:43 +02:00
Micha Reiser
2ae39edccf
[red-knot] Goto type definition (#16901)
## Summary

Implement basic *Goto type definition* support for Red Knot's LSP.

This PR also builds the foundation for other LSP operations. E.g., Goto
definition, hover, etc., should be able to reuse some, if not most,
logic introduced in this PR.

The basic steps of resolving the type definitions are:

1. Find the closest token for the cursor offset. This is a bit more
subtle than I first anticipated because the cursor could be positioned
right between the callee and the `(` in `call(test)`, in which case we
want to resolve the type for `call`.
2. Find the node with the minimal range that fully encloses the token
found in 1. I somewhat suspect that 1 and 2 could be done at the same
time but it complicated things because we also need to compute the spine
(ancestor chain) for the node and there's no guarantee that the found
nodes have the same ancestors
3. Reduce the node found in 2. to a node that is a valid goto target.
This may require traversing upwards to e.g. find the closest expression.
4. Resolve the type for the goto target
5. Resolve the location for the type, return it to the LSP

## Design decisions

The current implementation navigates to the inferred type. I think this
is what we want because it means that it correctly accounts for
narrowing (in which case we want to go to the narrowed type because
that's the value's type at the given position). However, it does have
the downside that Goto type definition doesn't work whenever we infer `T
& Unknown` because intersection types aren't supported. I'm not sure
what to do about this specific case, other than maybe ignoring `Unkown`
in Goto type definition if the type is an intersection?

## Known limitations

* Types defined in the vendored typeshed aren't supported because the
client can't open files from the red knot binary (we can either
implement our own file protocol and handler OR extract the typeshed
files and point there). See
https://github.com/astral-sh/ruff/issues/17041
* Red Knot only exposes an API to get types for expressions and
definitions. However, there are many other nodes with identifiers that
can have a type (e.g. go to type of a globals statement, match patterns,
...). We can add support for those in separate PRs (after we figure out
how to query the types from the semantic model). See
https://github.com/astral-sh/ruff/issues/17113
* We should have a higher-level API for the LSP that doesn't directly
call semantic queries. I intentionally decided not to design that API
just yet.


## Test plan


https://github.com/user-attachments/assets/fa077297-a42d-4ec8-b71f-90c0802b4edb

Goto type definition on a union

<img width="1215" alt="Screenshot 2025-04-01 at 13 02 55"
src="https://github.com/user-attachments/assets/689cabcc-4a86-4a18-b14a-c56f56868085"
/>



Note: I recorded this using a custom typeshed path so that navigating to
builtins works.
2025-04-02 12:12:48 +00:00
Brent Westbrook
2baaedda6c
[syntax-errors] Start detecting compile-time syntax errors (#16106)
## Summary

This PR implements the "greeter" approach for checking the AST for
syntax errors emitted by the CPython compiler. It introduces two main
infrastructural changes to support all of the compile-time errors:
1. Adds a new `semantic_errors` module to the parser crate with public
`SemanticSyntaxChecker` and `SemanticSyntaxError` types
2. Embeds a `SemanticSyntaxChecker` in the `ruff_linter::Checker` for
checking these errors in ruff

As a proof of concept, it also implements detection of two syntax
errors:
1. A reimplementation of
[`late-future-import`](https://docs.astral.sh/ruff/rules/late-future-import/)
(`F404`)
2. Detection of rebound comprehension iteration variables
(https://github.com/astral-sh/ruff/issues/14395)

## Test plan
Existing F404 tests, new inline tests in the `ruff_python_parser` crate,
and a linter CLI test showing an example of the `Message` output.

I also tested in VS Code, where `preview = false` and turning off syntax
errors both disable the new errors:


![image](https://github.com/user-attachments/assets/cf453d95-04f7-484b-8440-cb812f29d45e)

And on the playground, where `preview = false` also disables the errors:


![image](https://github.com/user-attachments/assets/a97570c4-1efa-439f-9d99-a54487dd6064)


Fixes #14395

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-03-21 14:45:25 -04:00
Brent Westbrook
37fbe58b13
Document LinterResult::has_syntax_error and add Parsed::has_no_syntax_errors (#16443)
Summary
--

This is a follow up addressing the comments on #16425. As @dhruvmanila
pointed out, the naming is a bit tricky. I went with `has_no_errors` to
try to differentiate it from `is_valid`. It actually ends up negated in
most uses, so it would be more convenient to have `has_any_errors` or
`has_errors`, but I thought it would sound too much like the opposite of
`is_valid` in that case. I'm definitely open to suggestions here.

Test Plan
--

Existing tests.
2025-03-04 08:35:38 -05:00
Brent Westbrook
78806361fd
Start detecting version-related syntax errors in the parser (#16090)
## Summary

This PR builds on the changes in #16220 to pass a target Python version
to the parser. It also adds the `Parser::unsupported_syntax_errors` field, which
collects version-related syntax errors while parsing. These syntax
errors are then turned into `Message`s in ruff (in preview mode).

This PR only detects one syntax error (`match` statement before Python
3.10), but it has been pretty quick to extend to several other simple
errors (see #16308 for example).

## Test Plan

The current tests are CLI tests in the linter crate, but these could be
supplemented with inline parser tests after #16357.

I also tested the display of these syntax errors in VS Code:


![image](https://github.com/user-attachments/assets/062b4441-740e-46c3-887c-a954049ef26e)

![image](https://github.com/user-attachments/assets/101f55b8-146c-4d59-b6b0-922f19bcd0fa)

---------

Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
2025-02-25 23:03:48 -05:00
Brent Westbrook
97d0659ce3
Pass ParserOptions to the parser (#16220)
## Summary

This is part of the preparation for detecting syntax errors in the
parser from https://github.com/astral-sh/ruff/pull/16090/. As suggested
in [this
comment](https://github.com/astral-sh/ruff/pull/16090/#discussion_r1953084509),
I started working on a `ParseOptions` struct that could be stored in the
parser. For this initial refactor, I only made it hold the existing
`Mode` option, but for syntax errors, we will also need it to have a
`PythonVersion`. For that use case, I'm picturing something like a
`ParseOptions::with_python_version` method, so you can extend the
current calls to something like

```rust
ParseOptions::from(mode).with_python_version(settings.target_version)
```

But I thought it was worth adding `ParseOptions` alone without changing
any other behavior first.

Most of the diff is just updating call sites taking `Mode` to take
`ParseOptions::from(Mode)` or those taking `PySourceType`s to take
`ParseOptions::from(PySourceType)`. The interesting changes are in the
new `parser/options.rs` file and smaller parts of `parser/mod.rs` and
`ruff_python_parser/src/lib.rs`.

## Test Plan

Existing tests, this should not change any behavior.
2025-02-19 10:50:50 -05:00
Brent Westbrook
9bf138c45a
Preserve quote style in generated code (#15726)
## Summary

This is a first step toward fixing #7799 by using the quoting style
stored in the `flags` field on `ast::StringLiteral`s to select a quoting
style. This PR does not include support for f-strings or byte strings.

Several rules also needed small updates to pass along existing quoting
styles instead of using `StringLiteralFlags::default()`. The remaining
snapshot changes are intentional and should preserve the quotes from the
input strings.

## Test Plan

Existing tests with some accepted updates, plus a few new RUF055 tests
for raw strings.

---------

Co-authored-by: Alex Waygood <alex.waygood@gmail.com>
2025-01-27 13:41:03 -05:00
Shaygan Hooshyari
cf4ab7cba1
Parse triple quoted string annotations as if parenthesized (#15387)
## Summary

Resolves #9467 

Parse quoted annotations as if the string content is inside parenthesis.
With this logic `x` and `y` in this example are equal:

```python
y: """
   int |
   str
"""

z: """(
    int |
    str
)
"""
```

Also this rule only applies to triple
quotes([link](https://github.com/python/typing-council/issues/9#issuecomment-1890808610)).

This PR is based on the
[comments](https://github.com/astral-sh/ruff/issues/9467#issuecomment-2579180991)
on the issue.

I did one extra change, since we don't want any indentation tokens I am
setting the `State::Other` as the initial state of the Lexer.

Remaining work:

- [x] Add a test case for red-knot.
- [x] Add more tests.

## Test Plan

Added a test which previously failed because quoted annotation contained
indentation.
Added an mdtest for red-knot.
Updated previous test.

Co-authored-by: Dhruv Manilawala <dhruvmanila@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
2025-01-16 11:38:15 +05:30
Junzhuo ZHOU
a354d9ead6
Expose internal types as public access (#13509) 2024-09-26 17:34:30 +02:00
Dhruv Manilawala
8f40928534
Enable token-based rules on source with syntax errors (#11950)
## Summary

This PR updates the linter, specifically the token-based rules, to work
on the tokens that come after a syntax error.

For context, the token-based rules only diagnose the tokens up to the
first lexical error. This PR builds up an error resilience by
introducing a `TokenIterWithContext` which updates the `nesting` level
and tries to reflect it with what the lexer is seeing. This isn't 100%
accurate because if the parser recovered from an unclosed parenthesis in
the middle of the line, the context won't reduce the nesting level until
it sees the newline token at the end of the line.

resolves: #11915

## Test Plan

* Add test cases for a bunch of rules that are affected by this change.
* Run the fuzzer for a long time, making sure to fix any other bugs.
2024-07-02 08:57:46 +00:00
Dhruv Manilawala
96da136e6a
Move token and error structs into related modules (#11957)
## Summary

This PR does some housekeeping into moving certain structs into related
modules. Specifically,
1. Move `LexicalError` from `lexer.rs` to `error.rs` which also contains
the `ParseError`
2. Move `Token`, `TokenFlags` and `TokenValue` from `lexer.rs` to
`token.rs`
2024-06-21 10:07:19 +00:00
Dhruv Manilawala
b617d90651
Update E999 to show all syntax errors (#11900)
## Summary

This PR updates the linter to show all the parse errors as diagnostics
instead of just the first one.

Note that this doesn't affect the parse error displayed as error log
message. This will be removed in a follow-up PR.

### Breaking?

I don't think this is a breaking change even though this might give more
diagnostics. The main reason is that this shouldn't affect any users
because it'll only give additional diagnostics in the case of multiple
syntax errors.

## Test Plan

Add an integration test case which would raise more than one parse
error.
2024-06-19 13:09:54 +05:30
Micha Reiser
d4dd96d1f4
red-knot: source_text, line_index, and parsed_module queries (#11822) 2024-06-13 07:37:02 +00:00
Dhruv Manilawala
549cc1e437
Build CommentRanges outside the parser (#11792)
## Summary

This PR updates the parser to remove building the `CommentRanges` and
instead it'll be built by the linter and the formatter when it's
required.

For the linter, it'll be built and owned by the `Indexer` while for the
formatter it'll be built from the `Tokens` struct and passed as an
argument.

## Test Plan

`cargo insta test`
2024-06-09 09:55:17 +00:00
Dhruv Manilawala
eed6d784df
Update type annotation parsing API to return Parsed (#11739)
## Summary

This PR updates the return type of `parse_type_annotation` from `Expr`
to `Parsed<ModExpression>`. This is to allow accessing the tokens for
the parsed sub-expression in the follow-up PR.

## Test Plan

`cargo insta test`
2024-06-05 12:59:43 +05:30
Micha Reiser
64165bee43
red-knot: Use parse_unchecked to get all parse errors (#11725) 2024-06-04 06:04:48 +00:00
Dhruv Manilawala
a58bde6958
Remove less used parser dependencies (#11718)
## Summary

This PR removes the following dependencies from the `ruff_python_parser`
crate:
* `anyhow` (moved to dev dependencies)
* `is-macro`
* `itertools`

The main motivation is that they aren't used much.

Additionally, it updates the return type of `parse_type_annotation` to
use a more specific `ParseError` instead of the generic `anyhow::Error`.

## Test Plan

`cargo insta test`
2024-06-03 13:08:24 +00:00
Dhruv Manilawala
bf5b62edac
Maintain synchronicity between the lexer and the parser (#11457)
## Summary

This PR updates the entire parser stack in multiple ways:

### Make the lexer lazy

* https://github.com/astral-sh/ruff/pull/11244
* https://github.com/astral-sh/ruff/pull/11473

Previously, Ruff's lexer would act as an iterator. The parser would
collect all the tokens in a vector first and then process the tokens to
create the syntax tree.

The first task in this project is to update the entire parsing flow to
make the lexer lazy. This includes the `Lexer`, `TokenSource`, and
`Parser`. For context, the `TokenSource` is a wrapper around the `Lexer`
to filter out the trivia tokens[^1]. Now, the parser will ask the token
source to get the next token and only then the lexer will continue and
emit the token. This means that the lexer needs to be aware of the
"current" token. When the `next_token` is called, the current token will
be updated with the newly lexed token.

The main motivation to make the lexer lazy is to allow re-lexing a token
in a different context. This is going to be really useful to make the
parser error resilience. For example, currently the emitted tokens
remains the same even if the parser can recover from an unclosed
parenthesis. This is important because the lexer emits a
`NonLogicalNewline` in parenthesized context while a normal `Newline` in
non-parenthesized context. This different kinds of newline is also used
to emit the indentation tokens which is important for the parser as it's
used to determine the start and end of a block.

Additionally, this allows us to implement the following functionalities:
1. Checkpoint - rewind infrastructure: The idea here is to create a
checkpoint and continue lexing. At a later point, this checkpoint can be
used to rewind the lexer back to the provided checkpoint.
2. Remove the `SoftKeywordTransformer` and instead use lookahead or
speculative parsing to determine whether a soft keyword is a keyword or
an identifier
3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted
by the lexer but it contains owned data which makes it expensive to
clone. The new `TokenKind` enum just represents the type of token which
is very cheap.

This brings up a question as to how will the parser get the owned value
which was stored on `Tok`. This will be solved by introducing a new
`TokenValue` enum which only contains a subset of token kinds which has
the owned value. This is stored on the lexer and is requested by the
parser when it wants to process the data. For example:
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L1260-L1262)

[^1]: Trivia tokens are `NonLogicalNewline` and `Comment`

### Remove `SoftKeywordTransformer`

* https://github.com/astral-sh/ruff/pull/11441
* https://github.com/astral-sh/ruff/pull/11459
* https://github.com/astral-sh/ruff/pull/11442
* https://github.com/astral-sh/ruff/pull/11443
* https://github.com/astral-sh/ruff/pull/11474

For context,
https://github.com/RustPython/RustPython/pull/4519/files#diff-5de40045e78e794aa5ab0b8aacf531aa477daf826d31ca129467703855408220
added support for soft keywords in the parser which uses infinite
lookahead to classify a soft keyword as a keyword or an identifier. This
is a brilliant idea as it basically wraps the existing Lexer and works
on top of it which means that the logic for lexing and re-lexing a soft
keyword remains separate. The change here is to remove
`SoftKeywordTransformer` and let the parser determine this based on
context, lookahead and speculative parsing.

* **Context:** The transformer needs to know the position of the lexer
between it being at a statement position or a simple statement position.
This is because a `match` token starts a compound statement while a
`type` token starts a simple statement. **The parser already knows
this.**
* **Lookahead:** Now that the parser knows the context it can perform
lookahead of up to two tokens to classify the soft keyword. The logic
for this is mentioned in the PR implementing it for `type` and `match
soft keyword.
* **Speculative parsing:** This is where the checkpoint - rewind
infrastructure helps. For `match` soft keyword, there are certain cases
for which we can't classify based on lookahead. The idea here is to
create a checkpoint and keep parsing. Based on whether the parsing was
successful and what tokens are ahead we can classify the remaining
cases. Refer to #11443 for more details.

If the soft keyword is being parsed in an identifier context, it'll be
converted to an identifier and the emitted token will be updated as
well. Refer
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L487-L491).

The `case` soft keyword doesn't require any special handling because
it'll be a keyword only in the context of a match statement.

### Update the parser API

* https://github.com/astral-sh/ruff/pull/11494
* https://github.com/astral-sh/ruff/pull/11505

Now that the lexer is in sync with the parser, and the parser helps to
determine whether a soft keyword is a keyword or an identifier, the
lexer cannot be used on its own. The reason being that it's not
sensitive to the context (which is correct). This means that the parser
API needs to be updated to not allow any access to the lexer.

Previously, there were multiple ways to parse the source code:
1. Passing the source code itself
2. Or, passing the tokens

Now that the lexer and parser are working together, the API
corresponding to (2) cannot exists. The final API is mentioned in this
PR description: https://github.com/astral-sh/ruff/pull/11494.

### Refactor the downstream tools (linter and formatter)

* https://github.com/astral-sh/ruff/pull/11511
* https://github.com/astral-sh/ruff/pull/11515
* https://github.com/astral-sh/ruff/pull/11529
* https://github.com/astral-sh/ruff/pull/11562
* https://github.com/astral-sh/ruff/pull/11592

And, the final set of changes involves updating all references of the
lexer and `Tok` enum. This was done in two-parts:
1. Update all the references in a way that doesn't require any changes
from this PR i.e., it can be done independently
	* https://github.com/astral-sh/ruff/pull/11402
	* https://github.com/astral-sh/ruff/pull/11406
	* https://github.com/astral-sh/ruff/pull/11418
	* https://github.com/astral-sh/ruff/pull/11419
	* https://github.com/astral-sh/ruff/pull/11420
	* https://github.com/astral-sh/ruff/pull/11424
2. Update all the remaining references to use the changes made in this
PR

For (2), there were various strategies used:
1. Introduce a new `Tokens` struct which wraps the token vector and add
methods to query a certain subset of tokens. These includes:
	1. `up_to_first_unknown` which replaces the `tokenize` function
2. `in_range` and `after` which replaces the `lex_starts_at` function
where the former returns the tokens within the given range while the
latter returns all the tokens after the given offset
2. Introduce a new `TokenFlags` which is a set of flags to query certain
information from a token. Currently, this information is only limited to
any string type token but can be expanded to include other information
in the future as needed. https://github.com/astral-sh/ruff/pull/11578
3. Move the `CommentRanges` to the parsed output because this
information is common to both the linter and the formatter. This removes
the need for `tokens_and_ranges` function.

## Test Plan

- [x] Update and verify the test snapshots
- [x] Make sure the entire test suite is passing
- [x] Make sure there are no changes in the ecosystem checks
- [x] Run the fuzzer on the parser
- [x] Run this change on dozens of open-source projects

### Running this change on dozens of open-source projects

Refer to the PR description to get the list of open source projects used
for testing.

Now, the following tests were done between `main` and this branch:
1. Compare the output of `--select=E999` (syntax errors)
2. Compare the output of default rule selection
3. Compare the output of `--select=ALL`

**Conclusion: all output were same**

## What's next?

The next step is to introduce re-lexing logic and update the parser to
feed the recovery information to the lexer so that it can emit the
correct token. This moves us one step closer to having error resilience
in the parser and provides Ruff the possibility to lint even if the
source code contains syntax errors.
2024-06-03 18:23:50 +05:30
Charlie Marsh
16acd4913f
Remove some unused pub functions (#11576)
## Summary

I left anything in `red-knot`, any `with_` methods, etc.
2024-05-28 09:56:51 -04:00
Dhruv Manilawala
025768d303
Add Tokens newtype wrapper, TokenKind iterator (#11361)
## Summary

Alternative to #11237 

This PR adds a new `Tokens` struct which is a newtype wrapper around a
vector of lexer output. This allows us to add a `kinds` method which
returns an iterator over the corresponding `TokenKind`. This iterator is
implemented as a separate `TokenKindIter` struct to allow using the type
and provide additional methods like `peek` directly on the iterator.

This exposes the linter to access the stream of `TokenKind` instead of
`Tok`.

Edit: I've made the necessary downstream changes and plan to merge the
entire stack at once.
2024-05-14 16:45:04 +00:00
Dhruv Manilawala
04a922866a
Add basic docs for the parser crate (#11199)
## Summary

This PR adds a basic README for the `ruff_python_parser` crate and
updates the CONTRIBUTING docs with the fuzzer and benchmark section.

Additionally, it also updates some inline documentation within the
parser crate and splits the `parse_program` function into
`parse_single_expression` and `parse_module` which will be called by
matching against the `Mode`.

This PR doesn't go into too much internal detail around the parser logic
due to the following reasons:
1. Where should the docs go? Should it be as a module docs in `lib.rs`
or in README?
2. The parser is still evolving and could include a lot of refactors
with the future work (feedback loop and improved error recovery and
resilience)

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-04-29 17:08:07 +00:00
Dhruv Manilawala
13ffb5bc19
Replace LALRPOP parser with hand-written parser (#10036)
(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP to a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.

For more details about the change here, refer to the PR corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](https://github.com/astral-sh/ruff/pull/10903)
[a](https://github.com/astral-sh/ruff/pull/10910)
[lot](https://github.com/astral-sh/ruff/pull/10966)
[of](https://github.com/astral-sh/ruff/pull/10896)
[bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-04-18 17:57:39 +05:30
Alex Waygood
7caf0d064a
Simplify formatting of strings by using flags from the AST nodes (#10489) 2024-03-20 16:16:54 +00:00
Alex Waygood
1d97f27335
Start tracking quoting style in the AST (#10298)
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
- The quoting style used (double or single quotes)
- Whether the string is triple-quoted or not
- Whether the string is raw or not

This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for https://github.com/astral-sh/ruff/issues/7799.

The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.

Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
2024-03-08 19:11:47 +00:00
Alex Waygood
c504d7ab11
Track quoting style in the tokenizer (#10256) 2024-03-08 08:40:06 +00:00
Charlie Marsh
6f0e4ad332
Remove unnecessary string cloning from the parser (#9884)
Closes https://github.com/astral-sh/ruff/issues/9869.
2024-02-09 16:03:27 -05:00
Micha Reiser
47ad7b4500
Approximate tokens len (#9546) 2024-01-19 17:39:37 +01:00
Micha Reiser
f192c72596
Remove type parameter from parse_* methods (#9466) 2024-01-11 19:41:19 +01:00
Micha Reiser
b1a5df8694
Move locate_cmp_ops to invalid_literal_comparisons (#9438) 2024-01-08 13:15:36 +01:00
Charlie Marsh
e80260a3c5
Remove source path from parser errors (#9322)
## Summary

I always found it odd that we had to pass this in, since it's really
higher-level context for the error. The awkwardness is further evidenced
by the fact that we pass in fake values everywhere (even outside of
tests). The source path isn't actually used to display the error; it's
only accessed elsewhere to _re-display_ the error in certain cases. This
PR modifies to instead pass the path directly in those cases.
2023-12-30 20:33:05 +00:00
Andrew Gallant
6a1fa4778f
Reject more syntactically invalid Python programs (#8524)
## Summary

This commit adds some additional error checking to the parser such that
assignments that are invalid syntax are rejected. This covers the
obvious cases like `5 = 3` and some not so obvious cases like `x + y =
42`.

This does add an additional recursive call to the parser for the cases
handling assignments. I had initially been concerned about doing this,
but `set_context` is already doing recursion during assignments, so I
didn't feel as though this was changing any fundamental performance
characteristics of the parser. (Also, in practice, I would expect any
such recursion here to be quite shallow since the recursion is done on
the target of an assignment. Such things are rarely nested much in
practice.)

Fixes #6895

## Test Plan

I've added unit tests covering every case that is detected as invalid on
an `Expr`.
2023-11-07 07:16:06 -05:00
Charlie Marsh
2838f7af98
Skip all bracketed expressions when locating comparison ops (#7740)
Closes https://github.com/astral-sh/ruff/issues/7737.
2023-10-01 14:57:40 +00:00
Dhruv Manilawala
e62e245c61
Add support for PEP 701 (#7376)
## Summary

This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the
other individual PRs. The separate PRs were created for logic separation
and code reviews. Refer to each pull request for a detail description on
the change.

Refer to the PR description for the list of pull requests within this PR.

## Test Plan

### Formatter ecosystem checks

Explanation for the change in ecosystem check:
https://github.com/astral-sh/ruff/pull/7597#issue-1908878183

#### `main`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1631 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```

#### `dhruv/pep-701`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76051 |              1789 |              1632 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```
2023-09-29 02:55:39 +00:00
konsti
4d16e2308d
Formatter and parser refactoring (#7569)
I got confused and refactored a bit, now the naming should be more
consistent. This is the basis for the range formatting work.

Chages:
* `format_module` -> `format_module_source` (format a string)
* `format_node` -> `format_module_ast` (format a program parsed into an
AST)
* Added `parse_ok_tokens` that takes `Token` instead of `Result<Token>`
* Call the source code `source` consistently
* Added a `tokens_and_ranges` helper
* `python_ast` -> `module` (because that's the type)
2023-09-26 15:29:43 +02:00
Dhruv Manilawala
1adde24133
Rename parser mode from Jupyter to Ipython (#7153) 2023-09-05 14:12:26 +00:00
Charlie Marsh
fc89976c24
Move Ranged into ruff_text_size (#6919)
## Summary

The motivation here is that this enables us to implement `Ranged` in
crates that don't depend on `ruff_python_ast`.

Largely a mechanical refactor with a lot of regex, Clippy help, and
manual fixups.

## Test Plan

`cargo test`
2023-08-27 14:12:51 -04:00
Dhruv Manilawala
d1f07008f7
Rename Notebook related symbols (#6862)
This PR renames the following symbols:

* `PySourceType::Jupyter` -> `PySourceType::Ipynb`
* `SourceKind::Jupyter` -> `SourceKind::IpyNotebook`
* `JupyterIndex` -> `NotebookIndex`
2023-08-25 11:40:54 +05:30
Charlie Marsh
d08f697a04
Remove lexing for colon-matching use cases (#6803)
It's much simpler to just search ahead for the first colon.
2023-08-23 04:44:51 +00:00
Dhruv Manilawala
32fa05765a
Use Jupyter mode while parsing Notebook files (#5552)
## Summary

Enable using the new `Mode::Jupyter` for the tokenizer/parser to parse
Jupyter line magic tokens.

The individual call to the lexer i.e., `lex_starts_at` done by various
rules should consider the context of the source code (is this content
from a Jupyter Notebook?). Thus, a new field `source_type` (of type
`PySourceType`) is added to `Checker` which is being passed around as an
argument to the relevant functions. This is then used to determine the
`Mode` for the lexer.

## Test Plan

Add new test cases to make sure that the magic statement is considered
while generating the diagnostic and autofix:
* For `I001`, if there's a magic statement in between two import blocks,
they should be sorted independently

fixes: #6090
2023-08-05 00:32:07 +00:00
Micha Reiser
debfca3a11
Remove Parse trait (#6235) 2023-08-01 18:35:03 +02:00
Micha Reiser
f45e8645d7
Remove unused parser modes
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR removes the `Interactive` and `FunctionType` parser modes that are unused by ruff

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo test`

<!-- How was it tested? -->
2023-08-01 13:10:07 +02:00
Micha Reiser
40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00
Micha Reiser
2cf00fee96
Remove parser dependency from ruff-python-ast (#6096) 2023-07-26 17:47:22 +02:00