## Summary
This PR updates the lexer test snapshots to include the range value as
well. This is mainly a mechanical refactor.
### Motivation
The main motivation is so that we can verify that the ranges are valid
and do not overlap.
## Test Plan
`cargo test`
## Summary
This PR updates the remaining lexer test cases to use the snapshots.
This is mainly a mechanical refactor.
## Motivation
The main motivation is so that when we add the token range values to the
test case output, it's easier to update the test cases.
The reason they were not using the snapshots before was because of the usage of
`test_case` macro. The macros is mainly used for different EOL test cases. If we
just generate the snapshots directly, then the snapshot name would be suffixed
with `-1`, `-2`, etc. as the test function is still the same. So, we'll create
the snapshot ourselves with the platform name for the respective EOL
test cases.
## Test Plan
`cargo test`
## Summary
This PR adds a new helper method on the `Cursor` called `eat_char2`
which is similar to `eat_char` but accepts 2 characters instead of 1. It'll
`bump` the cursor twice if both characters are found on lookahead.
## Test Plan
`cargo test`
## Summary
This PR fixes a bug which sends the lexer into infinite loop for an invalid input.
The code in question is `[1` where the nesting is never finished. This means
that the lexer will keep emitting the `Err` token forever.
## Test Plan
Add a test case which collects all the tokens from the lexer. This just
makes sure that it doesn't go into infinite loop.
## Summary
This PR updates the lexer tests to use the snapshot testing framework.
It also
makes the following changes:
* Remove the use of macros in the lexer tests
* Use `test_case` for EOL tests
## Test Plan
```
cargo test --package ruff_python_parser --lib --all-features -- lexer::tests --no-capture
```
## Summary
This PR renames the `MagicCommand` token to `IpyEscapeCommand` token and
`MagicKind` to `IpyEscapeKind` type to better reflect the purpose of the
token and type. Similarly, it renames the AST nodes from `LineMagic` to
`IpyEscapeCommand` prefixed with `Stmt`/`Expr` wherever necessary.
It also makes renames from using `jupyter_magic` to
`ipython_escape_commands` in various function names.
The mode value is still `Mode::Jupyter` because the escape commands are
part of the IPython syntax but the lexing/parsing is done for a Jupyter
notebook.
### Motivation behind the rename:
* IPython codebase defines it as "EscapeCommand" / "Escape Sequences":
* Escape Sequences:
292e3a2345/IPython/core/inputtransformer2.py (L329-L333)
* Escape command:
292e3a2345/IPython/core/inputtransformer2.py (L410-L411)
* The word "magic" is used mainly for the actual magic commands i.e.,
the ones starting with `%`/`%%`
(https://ipython.readthedocs.io/en/stable/interactive/reference.html#magic-command-system).
So, this avoids any confusion between the Magic token (`%`, `%%`) and
the escape command itself.
## Test Plan
* `cargo test` to make sure all renames are done correctly.
* `grep` for `jupyter_escape`/`magic` to make sure all renames are done
correctly.
## Summary
This PR adds support for a stricter version of help end escape
commands[^1] in the parser. By stricter, I mean that the escape tokens
are only at the end of the command and there are no tokens at the start.
This makes it difficult to implement it in the lexer without having to
do a lot of look aheads or keeping track of previous tokens.
Now, as we're adding this in the parser, the lexer needs to recognize
and emit a new token for `?`. So, `Question` token is added which will
be recognized only in `Jupyter` mode.
The conditions applied are the same as the ones in the original
implementation in IPython codebase (which is a regex):
* There can only be either 1 or 2 question mark(s) at the end
* The node before the question mark can be a `Name`, `Attribute`,
`Subscript` (only with integer constants in slice position), or any
combination of the 3 nodes.
## Test Plan
Added test cases for various combination of the possible nodes in the
command value position and update the snapshots.
fixes: #6359fixes: #5030 (This is the final piece)
[^1]: https://github.com/astral-sh/ruff/pull/6272#issue-1833094281
## Summary
This PR adds support for help end escape command in the lexer.
### What are "help end escape commands"?
First, the escape commands are special IPython syntax which enhances the
functionality for the IPython REPL. There are 9 types of escape kinds
which are recognized by the tokens which are present at the start of the
command (`?`, `??`, `!`, `!!`, etc.).
Here, the help command is using either the `?` or `??` token at the
start (`?str.replace` for example). Those 2 tokens are also supported
when they're at the end of the command (`str.replace?`), but the other
tokens aren't supported in that position.
There are mainly two types of help end escape commands:
1. Ending with either `?` or `??`, but it also starts with one of the
escape tokens (`%matplotlib?`)
2. On the other hand, there's a stricter version for (1) which doesn't
start with any escape tokens (`str.replace?`)
This PR adds support for (1) while (2) will be supported in the parser.
### Priority
Now, if the command starts and ends with an escape token, how do we
decide the kind of this command? This is where priority comes into
picture. This is simple as there's only one priority where `?`/`??` at
the end takes priority over any other escape token and all of the other
tokens are at the same priority. Remember that only `?`/`??` at the end
is considered valid.
This is mainly useful in the case where someone would want to invoke the
help command on the magic command itself. For example, in `%matplotlib?`
the help command takes priority which means that we want help for the
`matplotlib` magic function instead of calling the magic function
itself.
### Specification
Here's where things get a bit tricky. What if there are question mark
tokens at both ends. How do we decide if it's `Help` (`?`) kind or
`Help2` (`??`) kind?
| | Magic | Value | Kind |
| --- | --- | --- | --- |
| 1 | `?foo?` | `foo` | `Help` |
| 2 | `??foo?` | `foo` | `Help` |
| 3 | `?foo??` | `foo` | `Help2` |
| 4 | `??foo??` | `foo` | `Help2` |
| 5 | `???foo??` | `foo` | `Help2` |
| 6 | `??foo???` | `foo???` | `Help2` |
| 7 | `???foo???` | `?foo???` | `Help2` |
Looking at the above table:
- The question mark tokens on the right takes priority over the ones on
the left but only if the number of question mark on the right is 1 or 2.
- If there are more than 2 question mark tokens on the right side, then
the left side is used to determine the same.
- If the right side is used to determine the kind, then all of the
question marks and whitespaces on the left side are ignored in the
`value`, but if it’s the other way around, then all of the extra
question marks are part of the `value`.
### References
- IPython implementation using the regex:
292e3a2345/IPython/core/inputtransformer2.py (L454-L462)
- Priorities:
292e3a2345/IPython/core/inputtransformer2.py (L466-L469)
## Test Plan
Add a bunch of test cases for the lexer and verify that it matches the
behavior of
IPython transformer.
resolves: #6357