Commit graph

15 commits

Author SHA1 Message Date
Dhruv Manilawala
38d2562f41
Refactor unary expression parsing (#11088)
## Summary

This PR refactors unary expression parsing with the following changes:
* Ability to get `OperatorPrecedence` from a unary operator (`UnaryOp`)
* Implement methods on `TokenKind`
	* Add `as_unary_operator` which returns an `Option<UnaryOp>`
* Add `as_unary_arithmetic_operator` which returns an `Option<UnaryOp>`
(used for pattern parsing)
* Rename `is_unary` to `is_unary_arithmetic_operator` (used in the
linter)

resolves: #10752 

## Test Plan

Verify that the existing test cases pass, no ecosystem changes, run the
Python based fuzzer on 3000 random inputs and run it on dozens of
open-source repositories.
2024-04-23 04:55:02 +00:00
Dhruv Manilawala
7eba967e16
Refactor binary expression parsing (#11073)
## Summary

This PR refactors the binary expression parsing in a way to make it
readable and easy to understand. It draws inspiration from the suggested
edits in the linked messages in #10752.

### Changes

* Ability to get the precedence of an operator
	* From a boolean operator (`BinOp`) to `OperatorPrecedence`
	* From a binary operator (`Operator`) to `OperatorPrecedence`
	* No comparison operator because all of them have the same precedence
* Implement methods on `TokenKind` to convert it to an appropriate
operator enum
	* Add `as_boolean_operator` which returns an `Option<BoolOp>`
	* Add `as_binary_operator` which returns an `Option<Operator>`
* No `as_comparison_operator` because it requires lookahead and I'm not
sure if `token.as_comparison_operator(peek)` is a good way to implement
it
* Introduce `BinaryLikeOperator`
	* Constructed from two tokens using the methods from the second point
* Add `precedence` method using the conversion methods mentioned in the
first point
* Make most of the functions in `TokenKind` private to the module
* Use `self` instead of `&self` for `TokenKind` 

fixes: #11072

## Test Plan

Refer #11088
2024-04-23 04:42:40 +00:00
Dhruv Manilawala
13ffb5bc19
Replace LALRPOP parser with hand-written parser (#10036)
(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP to a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.

For more details about the change here, refer to the PR corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](https://github.com/astral-sh/ruff/pull/10903)
[a](https://github.com/astral-sh/ruff/pull/10910)
[lot](https://github.com/astral-sh/ruff/pull/10966)
[of](https://github.com/astral-sh/ruff/pull/10896)
[bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
2024-04-18 17:57:39 +05:30
Alex Waygood
7caf0d064a
Simplify formatting of strings by using flags from the AST nodes (#10489) 2024-03-20 16:16:54 +00:00
Alex Waygood
92e6026446
Apply NFKC normalization to unicode identifiers in the lexer (#10412) 2024-03-18 11:56:56 +00:00
Alex Waygood
c504d7ab11
Track quoting style in the tokenizer (#10256) 2024-03-08 08:40:06 +00:00
Micha Reiser
fe7d965334
Reduce Result<Tok, LexicalError> size by using Box<str> instead of String (#9885) 2024-02-08 20:36:22 +00:00
Seo Sanghyeon
df7fb95cbc
Index multiline f-strings (#9837)
Fix #9777.
2024-02-05 21:25:33 -05:00
Dhruv Manilawala
e62e245c61
Add support for PEP 701 (#7376)
## Summary

This PR adds support for PEP 701 in Ruff. This is a rollup PR of all the
other individual PRs. The separate PRs were created for logic separation
and code reviews. Refer to each pull request for a detail description on
the change.

Refer to the PR description for the list of pull requests within this PR.

## Test Plan

### Formatter ecosystem checks

Explanation for the change in ecosystem check:
https://github.com/astral-sh/ruff/pull/7597#issue-1908878183

#### `main`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1631 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```

#### `dhruv/pep-701`

```
| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76051 |              1789 |              1632 |
| django       |           0.99983 |              2760 |                36 |
| transformers |           0.99963 |              2587 |               319 |
| twine        |           1.00000 |                33 |                 0 |
| typeshed     |           0.99983 |              3496 |                18 |
| warehouse    |           0.99967 |               648 |                15 |
| zulip        |           0.99972 |              1437 |                21 |
```
2023-09-29 02:55:39 +00:00
Charlie Marsh
93b5d8a0fb
Implement our own small-integer optimization (#7584)
## Summary

This is a follow-up to #7469 that attempts to achieve similar gains, but
without introducing malachite. Instead, this PR removes the `BigInt`
type altogether, instead opting for a simple enum that allows us to
store small integers directly and only allocate for values greater than
`i64`:

```rust
/// A Python integer literal. Represents both small (fits in an `i64`) and large integers.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct Int(Number);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Number {
    /// A "small" number that can be represented as an `i64`.
    Small(i64),
    /// A "large" number that cannot be represented as an `i64`.
    Big(Box<str>),
}

impl std::fmt::Display for Number {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Number::Small(value) => write!(f, "{value}"),
            Number::Big(value) => write!(f, "{value}"),
        }
    }
}
```

We typically don't care about numbers greater than `isize` -- our only
uses are comparisons against small constants (like `1`, `2`, `3`, etc.),
so there's no real loss of information, except in one or two rules where
we're now a little more conservative (with the worst-case being that we
don't flag, e.g., an `itertools.pairwise` that uses an extremely large
value for the slice start constant). For simplicity, a few diagnostics
now show a dedicated message when they see integers that are out of the
supported range (e.g., `outdated-version-block`).

An additional benefit here is that we get to remove a few dependencies,
especially `num-bigint`.

## Test Plan

`cargo test`
2023-09-25 15:13:21 +00:00
Dhruv Manilawala
1adde24133
Rename parser mode from Jupyter to Ipython (#7153) 2023-09-05 14:12:26 +00:00
Dhruv Manilawala
6a64f2289b
Rename Magic* to IpyEscape* (#6395)
## Summary

This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and
`MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the
token and type. Similarly, it renames the AST nodes from `LineMagic` to
`IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary.

It also makes renames from using `jupyter_magic` to
`ipython_escape_commands` in various function names.

The mode value is still `Mode::Jupyter` because the escape commands are
part of the IPython syntax but the lexing/parsing is done for a Jupyter
notebook.

### Motivation behind the rename:
* IPython codebase defines it as "EscapeCommand" / "Escape Sequences":
* Escape Sequences:
292e3a2345/IPython/core/inputtransformer2.py (L329-L333)
* Escape command:
292e3a2345/IPython/core/inputtransformer2.py (L410-L411)
* The word "magic" is used mainly for the actual magic commands i.e.,
the ones starting with `%`/`%%`
(https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system).
So, this avoids any confusion between the Magic token (`%`, `%%`) and
the escape command itself.
## Test Plan

* `cargo test` to make sure all renames are done correctly.
* `grep` for `jupyter_escape`/`magic` to make sure all renames are done
correctly.
2023-08-09 13:28:18 +00:00
Dhruv Manilawala
e257c5af32
Add support for help end IPython escape commands (#6358)
## Summary

This PR adds support for a stricter version of help end escape
commands[^1] in the parser. By stricter, I mean that the escape tokens
are only at the end of the command and there are no tokens at the start.
This makes it difficult to implement it in the lexer without having to
do a lot of look aheads or keeping track of previous tokens.

Now, as we're adding this in the parser, the lexer needs to recognize
and emit a new token for `?`. So, `Question` token is added which will
be recognized only in `Jupyter` mode.

The conditions applied are the same as the ones in the original
implementation in IPython codebase (which is a regex):
* There can only be either 1 or 2 question mark(s) at the end
* The node before the question mark can be a `Name`, `Attribute`,
`Subscript` (only with integer constants in slice position), or any
combination of the 3 nodes.

## Test Plan

Added test cases for various combination of the possible nodes in the
command value position and update the snapshots.

fixes: #6359
fixes: #5030 (This is the final piece)

[^1]: https://github.com/astral-sh/ruff/pull/6272#issue-1833094281
2023-08-09 10:28:52 +05:30
Micha Reiser
f45e8645d7
Remove unused parser modes
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR removes the `Interactive` and `FunctionType` parser modes that are unused by ruff

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

`cargo test`

<!-- How was it tested? -->
2023-08-01 13:10:07 +02:00
Micha Reiser
40f54375cb
Pull in RustPython parser (#6099) 2023-07-27 09:29:11 +00:00