mirror of
https://github.com/astral-sh/ruff.git
synced 2025-08-04 02:38:25 +00:00
Support help end escape command with priority (#6272)
## Summary

This PR adds support for help end escape commands in the lexer.

### What are "help end escape commands"?

Escape commands are special IPython syntax that extends the functionality of the IPython REPL. There are 9 escape kinds, each recognized by the token at the start of the command (`?`, `??`, `!`, `!!`, etc.). A help command uses either the `?` or `??` token at the start (`?str.replace`, for example). These 2 tokens are also supported at the end of the command (`str.replace?`); the other tokens aren't supported in that position.

There are mainly two types of help end escape commands:

1. Commands that end with either `?` or `??` and also start with one of the escape tokens (`%matplotlib?`)
2. A stricter version of (1), which doesn't start with any escape token (`str.replace?`)

This PR adds support for (1), while (2) will be supported in the parser.

### Priority

If the command starts and ends with an escape token, how do we decide the kind of the command? This is where priority comes into the picture. There is only one rule: `?`/`??` at the end takes priority over any other escape token, and all of the other tokens share the same priority. Remember that only `?`/`??` is considered valid at the end.

This is mainly useful when someone wants to invoke the help command on the magic command itself. For example, in `%matplotlib?` the help command takes priority, which means that we want help for the `matplotlib` magic function instead of calling the magic function itself.

### Specification

Here's where things get a bit tricky. What if there are question mark tokens at both ends? How do we decide whether it's the `Help` (`?`) kind or the `Help2` (`??`) kind?

| | Magic | Value | Kind |
| --- | --- | --- | --- |
| 1 | `?foo?` | `foo` | `Help` |
| 2 | `??foo?` | `foo` | `Help` |
| 3 | `?foo??` | `foo` | `Help2` |
| 4 | `??foo??` | `foo` | `Help2` |
| 5 | `???foo??` | `foo` | `Help2` |
| 6 | `??foo???` | `foo???` | `Help2` |
| 7 | `???foo???` | `?foo???` | `Help2` |

Looking at the above table:

- The question mark tokens on the right take priority over the ones on the left, but only if the number of question marks on the right is 1 or 2.
- If there are more than 2 question mark tokens on the right side, then the left side is used to determine the kind.
- If the right side is used to determine the kind, then all of the question marks and whitespace on the left side are ignored in the `value`. If it's the other way around, then all of the extra question marks are part of the `value`.

### References

- IPython implementation using the regex: 292e3a2345/IPython/core/inputtransformer2.py (L454-L462)
- Priorities: 292e3a2345/IPython/core/inputtransformer2.py (L466-L469)

## Test Plan

Add a bunch of test cases for the lexer and verify that it matches the behavior of the IPython transformer.

resolves: #6357
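The rows of the specification table can be reproduced with a small standalone function. This is only an illustrative sketch, not the lexer code from this PR: `Kind` and `classify` are hypothetical names, the function assumes a single-line command that starts with `?` or `??`, and it ignores the other escape tokens and line continuations.

```rust
/// Hypothetical stand-in for the lexer's `MagicKind`, reduced to the two help kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Help,
    Help2,
}

/// Classify a single-line command that starts with `?` or `??` (assumption:
/// no other escape tokens and no line continuations are handled here).
fn classify(line: &str) -> (Kind, String) {
    // The left side: `??` starts as `Help2`, a single `?` starts as `Help`.
    let (prefix_kind, rest) = match line.strip_prefix("??") {
        Some(rest) => (Kind::Help2, rest),
        None => (Kind::Help, line.strip_prefix('?').unwrap_or(line)),
    };

    // The right side: count the trailing question marks (`?` is one byte long).
    let body = rest.trim_end_matches('?');
    let trailing = rest.len() - body.len();

    // The text before the trailing question marks must be non-empty and must
    // not end with a space, tab, or form feed.
    let body_ok = body
        .chars()
        .last()
        .map_or(false, |c| !matches!(c, ' ' | '\t' | '\x0C'));

    match trailing {
        // 1 or 2 trailing question marks decide the kind; the question marks
        // and spaces on the left are dropped from the value.
        1 | 2 if body_ok => {
            let kind = if trailing == 1 { Kind::Help } else { Kind::Help2 };
            (kind, body.trim_start_matches([' ', '?']).to_string())
        }
        // Otherwise the left side decides and everything else stays in the value.
        _ => (prefix_kind, rest.to_string()),
    }
}
```

For instance, `classify("??foo?")` yields `Help` with value `foo` (row 2), while `classify("??foo???")` falls back to the left side and keeps `foo???` as the value (row 6).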
parent b21abe0a57
commit e4a4660925
4 changed files with 219 additions and 14 deletions
@@ -14,7 +14,7 @@ build = "build.rs"
 [lib]

 [dependencies]
-ruff_python_ast = { path = "../ruff_python_ast"}
+ruff_python_ast = { path = "../ruff_python_ast" }
 ruff_text_size = { path = "../ruff_text_size" }

 anyhow = { workspace = true }
@@ -428,6 +428,72 @@ impl<'source> Lexer<'source> {
                     self.cursor.bump();
                     value.push('\\');
                 }
+                // Help end escape commands are those that end with 1 or 2 question marks.
+                // Here, we're only looking for a subset of help end escape commands which
+                // are the ones that have the escape token at the start of the line as well.
+                // On the other hand, we're not looking for help end escape commands that
+                // are strict in the sense that the escape token is only at the end. For example,
+                //
+                //   * `%foo?` is recognized as a help end escape command but not as a strict one.
+                //   * `foo?` is recognized as a strict help end escape command which is not
+                //     lexed here but is identified at the parser level.
+                //
+                // Help end escape commands implemented in the IPython codebase using regex:
+                // https://github.com/ipython/ipython/blob/292e3a23459ca965b8c1bfe2c3707044c510209a/IPython/core/inputtransformer2.py#L454-L462
+                '?' => {
+                    self.cursor.bump();
+                    let mut question_count = 1u32;
+                    while self.cursor.eat_char('?') {
+                        question_count += 1;
+                    }
+
+                    // The original implementation in the IPython codebase is based on regex which
+                    // means that it's strict in the sense that it won't recognize a help end escape:
+                    //   * If there's any whitespace before the escape token (e.g. `%foo ?`)
+                    //   * If there are more than 2 question mark tokens (e.g. `%foo???`)
+                    // which is what we're doing here as well. In that case, we'll continue with
+                    // the prefixed escape token.
+                    //
+                    // Now, the whitespace and empty value check also makes sure that an empty
+                    // command (e.g. `%?` or `? ??`, no value after/between the escape tokens)
+                    // is not recognized as a help end escape command. So, `%?` and `? ??` are
+                    // `MagicKind::Magic` and `MagicKind::Help` respectively, because of the
+                    // initial `%` and `?` tokens.
+                    if question_count > 2
+                        || value.chars().last().map_or(true, is_python_whitespace)
+                        || !matches!(self.cursor.first(), '\n' | '\r' | EOF_CHAR)
+                    {
+                        // Not a help end escape command, so continue with the lexing.
+                        value.reserve(question_count as usize);
+                        for _ in 0..question_count {
+                            value.push('?');
+                        }
+                        continue;
+                    }
+
+                    if kind.is_help() {
+                        // If we've recognized this as a help end escape command, then
+                        // any question mark tokens / whitespace at the start are not
+                        // considered as part of the value.
+                        //
+                        // For example, `??foo?` is recognized as `MagicKind::Help` and
+                        // `value` is `foo` instead of `??foo`.
+                        value = value.trim_start_matches([' ', '?']).to_string();
+                    } else if kind.is_magic() {
+                        // Between `%` and `?` (at the end), the `?` takes priority
+                        // over the `%` so `%foo?` is recognized as `MagicKind::Help`
+                        // and `value` is `%foo` instead of `foo`. So, we need to
+                        // insert the magic escape token at the start.
+                        value.insert_str(0, kind.as_str());
+                    }
+
+                    let kind = match question_count {
+                        1 => MagicKind::Help,
+                        2 => MagicKind::Help2,
+                        _ => unreachable!("`question_count` is always 1 or 2"),
+                    };
+                    return Tok::MagicCommand { kind, value };
+                }
                 '\n' | '\r' | EOF_CHAR => {
                     return Tok::MagicCommand { kind, value };
                 }
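The value normalization in the hunk above comes down to two standard-library string operations. The sketch below isolates them in a hypothetical helper, `help_end_value`, which takes the leading escape token as a plain string instead of the lexer's `MagicKind`. It assumes the command has already been accepted as a help end escape command.

```rust
/// Hypothetical helper: compute the `value` of an accepted help end escape
/// command from the escape token that started the line and the text lexed so far.
fn help_end_value(prefix: &str, value: &str) -> String {
    match prefix {
        // `?`/`??` prefix: leading question marks and spaces are not part of the value.
        "?" | "??" => value.trim_start_matches([' ', '?']).to_string(),
        // `%` prefix: help takes priority over the magic, so the `%` is put back
        // at the start of the value.
        "%" => format!("{}{}", prefix, value),
        // Any other escape token (e.g. `!`) leaves the value as lexed.
        _ => value.to_string(),
    }
}
```

With this helper, `help_end_value("%", "matplotlib")` gives `%matplotlib`, which matches the `%foo?` case in the lexer tests.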
@@ -1122,6 +1188,20 @@ fn is_identifier_continuation(c: char) -> bool {
     }
 }

+/// Returns `true` for [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens)
+/// characters.
+///
+/// This is the same as `ruff_python_trivia::is_python_whitespace` and is copied
+/// here to avoid a circular dependency as `ruff_python_trivia` has a dev-dependency
+/// on `ruff_python_lexer`.
+const fn is_python_whitespace(c: char) -> bool {
+    matches!(
+        c,
+        // Space, tab, or form-feed
+        ' ' | '\t' | '\x0C'
+    )
+}
+
 #[cfg(test)]
 mod tests {
     use num_bigint::BigInt;
@@ -1355,6 +1435,117 @@ mod tests {
             ]
         );
     }

+    #[test]
+    fn test_jupyter_magic_help_end() {
+        let source = r"
+?foo?
+?? foo?
+?? foo ?
+?foo??
+??foo??
+???foo?
+???foo??
+??foo???
+???foo???
+?? \
+foo?
+?? \
+?
+????
+%foo?
+%foo??
+%%foo???
+!pwd?"
+            .trim();
+        let tokens = lex_jupyter_source(source);
+        assert_eq!(
+            tokens,
+            [
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: " foo ?".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo???".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "?foo???".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: " ?".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "??".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "%foo".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "%foo".to_string(),
+                    kind: MagicKind::Help2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "foo???".to_string(),
+                    kind: MagicKind::Magic2,
+                },
+                Tok::Newline,
+                Tok::MagicCommand {
+                    value: "pwd".to_string(),
+                    kind: MagicKind::Help,
+                },
+                Tok::Newline,
+            ]
+        );
+    }
+
     #[test]
     fn test_jupyter_magic_indentation() {
         let source = r"
@@ -49,14 +49,14 @@ Module(
                 StmtLineMagic {
                     range: 81..88,
                     kind: Help,
-                    value: "a.foo?",
+                    value: "a.foo",
                 },
             ),
             LineMagic(
                 StmtLineMagic {
                     range: 89..100,
                     kind: Help2,
-                    value: "a.foo()??",
+                    value: "a.foo()",
                 },
             ),
             LineMagic(