Maintain synchronicity between the lexer and the parser (#11457)

## Summary

This PR updates the entire parser stack in multiple ways:

### Make the lexer lazy

* https://github.com/astral-sh/ruff/pull/11244
* https://github.com/astral-sh/ruff/pull/11473

Previously, Ruff's lexer would act as an iterator. The parser would
collect all the tokens in a vector first and then process the tokens to
create the syntax tree.

The first task in this project is to update the entire parsing flow to
make the lexer lazy. This includes the `Lexer`, `TokenSource`, and
`Parser`. For context, the `TokenSource` is a wrapper around the `Lexer`
to filter out the trivia tokens[^1]. Now, the parser asks the token
source for the next token, and only then does the lexer continue and
emit that token. This means that the lexer needs to be aware of the
"current" token. When `next_token` is called, the current token is
updated with the newly lexed token.
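
As a rough illustration, here is a minimal sketch of that pull-based flow. The `Lexer`, `TokenKind`, and `next_token` names mirror the description above, but the types and the lexing logic are simplified stand-ins rather than Ruff's actual implementation:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Newline,
    EndOfFile,
}

struct Lexer<'src> {
    source: &'src str,
    offset: usize,
    current: TokenKind,
}

impl<'src> Lexer<'src> {
    fn new(source: &'src str) -> Self {
        Self {
            source,
            offset: 0,
            current: TokenKind::EndOfFile,
        }
    }

    /// Lex exactly one more token and remember it as the current token.
    fn next_token(&mut self) -> TokenKind {
        let bytes = self.source.as_bytes();
        self.current = if self.offset >= bytes.len() {
            TokenKind::EndOfFile
        } else if bytes[self.offset] == b'\n' {
            self.offset += 1;
            TokenKind::Newline
        } else {
            // Real lexing elided: treat everything up to the next newline as one token.
            while self.offset < bytes.len() && bytes[self.offset] != b'\n' {
                self.offset += 1;
            }
            TokenKind::Name
        };
        self.current
    }

    fn current(&self) -> TokenKind {
        self.current
    }
}

fn main() {
    // The parser-side driving loop: a token is produced only when requested.
    let mut lexer = Lexer::new("x = 1\ny = 2\n");
    while lexer.next_token() != TokenKind::EndOfFile {
        println!("{:?}", lexer.current());
    }
}
```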

The main motivation for making the lexer lazy is to allow re-lexing a token
in a different context. This is going to be really useful for making the
parser error resilient. For example, currently the emitted tokens
remain the same even if the parser can recover from an unclosed
parenthesis. This is important because the lexer emits a
`NonLogicalNewline` in a parenthesized context but a normal `Newline` in a
non-parenthesized context. These different kinds of newlines are also used
to emit the indentation tokens, which are important for the parser as
they're used to determine the start and end of a block.

Additionally, this allows us to implement the following functionalities:
1. Checkpoint-rewind infrastructure: The idea here is to create a
checkpoint and continue lexing. At a later point, this checkpoint can be
used to rewind the lexer back to the provided checkpoint (a minimal sketch
of this appears in the soft-keyword section below).
2. Remove the `SoftKeywordTransformer` and instead use lookahead or
speculative parsing to determine whether a soft keyword is a keyword or
an identifier.
3. Remove the `Tok` enum. The `Tok` enum represents the tokens emitted
by the lexer, but it contains owned data which makes it expensive to
clone. The new `TokenKind` enum just represents the type of the token,
which is very cheap.

This raises the question of how the parser will get the owned value
that was stored on `Tok`. This is solved by introducing a new
`TokenValue` enum which contains only the subset of token kinds that
carry an owned value. This is stored on the lexer and is requested by the
parser when it wants to process the data. For example:
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L1260-L1262)
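
To make the split concrete, here is a hedged sketch of the idea under illustrative names; the actual `TokenKind`/`TokenValue` definitions and the method the parser calls live in `ruff_python_parser` and differ in detail:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Int,
    String,
    Newline,
}

/// Owned payloads for the subset of tokens that carry data. Kept separate so
/// that `TokenKind` stays `Copy` and cheap to pass around.
#[derive(Debug, Default)]
enum TokenValue {
    #[default]
    None,
    Name(Box<str>),
    Int(i64),
    String(Box<str>),
}

struct Lexer {
    current_kind: TokenKind,
    current_value: TokenValue,
}

impl Lexer {
    /// Hand the owned value for the current token to the parser, leaving a
    /// cheap placeholder behind so nothing ever needs to be cloned.
    fn take_value(&mut self) -> TokenValue {
        std::mem::take(&mut self.current_value)
    }
}

fn main() {
    let mut lexer = Lexer {
        current_kind: TokenKind::Name,
        current_value: TokenValue::Name("foo".into()),
    };
    // The parser matches on the cheap kind and only then requests the value.
    if lexer.current_kind == TokenKind::Name {
        if let TokenValue::Name(name) = lexer.take_value() {
            println!("identifier: {name}");
        }
    }
}
```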

[^1]: Trivia tokens are `NonLogicalNewline` and `Comment`

### Remove `SoftKeywordTransformer`

* https://github.com/astral-sh/ruff/pull/11441
* https://github.com/astral-sh/ruff/pull/11459
* https://github.com/astral-sh/ruff/pull/11442
* https://github.com/astral-sh/ruff/pull/11443
* https://github.com/astral-sh/ruff/pull/11474

For context,
https://github.com/RustPython/RustPython/pull/4519/files#diff-5de40045e78e794aa5ab0b8aacf531aa477daf826d31ca129467703855408220
added support for soft keywords in the parser which uses infinite
lookahead to classify a soft keyword as a keyword or an identifier. This
is a brilliant idea as it basically wraps the existing Lexer and works
on top of it, which means that the logic for lexing and re-lexing a soft
keyword remains separate. The change here is to remove the
`SoftKeywordTransformer` and let the parser determine this based on
context, lookahead, and speculative parsing.

* **Context:** The transformer needs to know whether the lexer is at a
statement position or a simple statement position. This is because a
`match` token starts a compound statement while a `type` token starts a
simple statement. **The parser already knows this.**
* **Lookahead:** Now that the parser knows the context, it can perform a
lookahead of up to two tokens to classify the soft keyword. The logic
for this is described in the PRs implementing it for the `type` and
`match` soft keywords.
* **Speculative parsing:** This is where the checkpoint-rewind
infrastructure helps. For the `match` soft keyword, there are certain
cases which we can't classify based on lookahead alone. The idea here is
to create a checkpoint and keep parsing. Based on whether the parsing was
successful and what tokens are ahead, we can classify the remaining
cases (see the sketch after this list). Refer to #11443 for more details.
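
The sketch below illustrates the checkpoint-rewind plus speculative-classification idea in miniature. All names here (`TokenSource`, `Checkpoint`, `match_is_keyword`) and the classification rule itself are simplified stand-ins, not Ruff's actual API or the real disambiguation rules for `match`:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Number,
    Equal,
    Colon,
    EndOfFile,
}

#[derive(Clone, Copy)]
struct Checkpoint(usize);

struct TokenSource {
    tokens: Vec<TokenKind>,
    position: usize,
}

impl TokenSource {
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.position)
    }

    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.position = checkpoint.0;
    }

    fn bump(&mut self) -> TokenKind {
        let token = self
            .tokens
            .get(self.position)
            .copied()
            .unwrap_or(TokenKind::EndOfFile);
        self.position += 1;
        token
    }
}

/// Speculatively check whether the tokens after `match` look like a subject
/// expression followed by a colon; rewind either way so the caller can then
/// parse a match statement or a plain identifier.
fn match_is_keyword(source: &mut TokenSource) -> bool {
    let checkpoint = source.checkpoint();
    // Wildly simplified "subject expression": a single name or number.
    let looks_like_subject = matches!(source.bump(), TokenKind::Name | TokenKind::Number);
    let followed_by_colon = source.bump() == TokenKind::Colon;
    source.rewind(checkpoint);
    looks_like_subject && followed_by_colon
}

fn main() {
    // `match x:` -> keyword; `match = 1` -> identifier.
    let mut keyword = TokenSource {
        tokens: vec![TokenKind::Name, TokenKind::Colon],
        position: 0,
    };
    let mut identifier = TokenSource {
        tokens: vec![TokenKind::Equal, TokenKind::Number],
        position: 0,
    };
    assert!(match_is_keyword(&mut keyword));
    assert!(!match_is_keyword(&mut identifier));
}
```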

If the soft keyword is being parsed in an identifier context, it'll be
converted to an identifier and the emitted token will be updated as
well. Refer to
8196720f80/crates/ruff_python_parser/src/parser/expression.rs (L487-L491).

The `case` soft keyword doesn't require any special handling because
it'll be a keyword only in the context of a match statement.

### Update the parser API

* https://github.com/astral-sh/ruff/pull/11494
* https://github.com/astral-sh/ruff/pull/11505

Now that the lexer is in sync with the parser, and the parser helps
determine whether a soft keyword is a keyword or an identifier, the
lexer cannot be used on its own. The reason is that it's not
sensitive to the context (which is correct). This means that the parser
API needs to be updated to not allow any access to the lexer.

Previously, there were multiple ways to parse the source code:
1. Passing the source code itself
2. Or, passing the tokens

Now that the lexer and parser are working together, the API
corresponding to (2) cannot exist. The final API is described in this
PR description: https://github.com/astral-sh/ruff/pull/11494.
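
For reference, here is a small usage sketch of the new entry point, based on the call sites updated later in this diff: `parse_module` returns a `Parsed` value bundling the AST together with the tokens and comment ranges. Exact return types are elided here:

```rust
use ruff_python_parser::parse_module;

fn inspect(source: &str) {
    let parsed = parse_module(source).expect("Input should be valid Python code");

    // The parsed output bundles everything downstream tools previously got
    // from separate tokenize/parse calls.
    let suite = parsed.suite(); // the module body (AST)
    let tokens = parsed.tokens(); // the lexed tokens
    let comment_ranges = parsed.comment_ranges(); // ranges of all comments

    println!(
        "{} top-level statements, {} comments",
        suite.len(),
        comment_ranges.iter().count()
    );
    let _ = tokens;
}
```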

### Refactor the downstream tools (linter and formatter)

* https://github.com/astral-sh/ruff/pull/11511
* https://github.com/astral-sh/ruff/pull/11515
* https://github.com/astral-sh/ruff/pull/11529
* https://github.com/astral-sh/ruff/pull/11562
* https://github.com/astral-sh/ruff/pull/11592

And the final set of changes involves updating all references to the
lexer and the `Tok` enum. This was done in two parts:
1. Update all the references in a way that doesn't require any changes
from this PR, i.e., it can be done independently
	* https://github.com/astral-sh/ruff/pull/11402
	* https://github.com/astral-sh/ruff/pull/11406
	* https://github.com/astral-sh/ruff/pull/11418
	* https://github.com/astral-sh/ruff/pull/11419
	* https://github.com/astral-sh/ruff/pull/11420
	* https://github.com/astral-sh/ruff/pull/11424
2. Update all the remaining references to use the changes made in this
PR

For (2), various strategies were used:
1. Introduce a new `Tokens` struct which wraps the token vector and adds
methods to query certain subsets of tokens (see the usage sketch after
this list). These include:
	1. `up_to_first_unknown`, which replaces the `tokenize` function
	2. `in_range` and `after`, which replace the `lex_starts_at` function,
where the former returns the tokens within the given range while the
latter returns all the tokens after the given offset
2. Introduce a new `TokenFlags` which is a set of flags to query certain
information from a token. Currently, this information is limited to
string-type tokens, but it can be expanded to include other information
in the future as needed. https://github.com/astral-sh/ruff/pull/11578
3. Move the `CommentRanges` to the parsed output because this
information is common to both the linter and the formatter. This removes
the need for the `tokens_and_ranges` function.
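
Putting these together, here is a sketch of how the new queries compose at a call site, based on the usages visible in the diff below; the method names are the ones introduced by this PR, while the surrounding glue is illustrative:

```rust
use ruff_python_parser::{parse_module, TokenKind};
use ruff_text_size::{Ranged, TextSize};

fn example(source: &str) {
    let parsed = parse_module(source).expect("Input should be valid Python code");
    let tokens = parsed.tokens();

    // Replaces the old `tokenize` output: every token up to the first unknown one.
    for token in tokens.up_to_first_unknown() {
        if token.kind() == TokenKind::String && token.is_triple_quoted_string() {
            println!("triple-quoted string at {:?}", token.range());
        }
    }

    // Replaces `lex_starts_at`: only the tokens after the given offset.
    for token in tokens.after(TextSize::new(0)) {
        let _ = (token.kind(), token.start(), token.end());
    }

    // `CommentRanges` now lives on the parsed output as well.
    for range in parsed.comment_ranges() {
        let _ = range;
    }
}
```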

## Test Plan

- [x] Update and verify the test snapshots
- [x] Make sure the entire test suite is passing
- [x] Make sure there are no changes in the ecosystem checks
- [x] Run the fuzzer on the parser
- [x] Run this change on dozens of open-source projects

### Running this change on dozens of open-source projects

Refer to the PR description to get the list of open source projects used
for testing.

Now, the following tests were done between `main` and this branch:
1. Compare the output of `--select=E999` (syntax errors)
2. Compare the output of default rule selection
3. Compare the output of `--select=ALL`

**Conclusion: all outputs were the same.**

## What's next?

The next step is to introduce the re-lexing logic and update the parser to
feed the recovery information to the lexer so that it can emit the
correct tokens. This moves us one step closer to having error resilience
in the parser and gives Ruff the ability to lint source code even when it
contains syntax errors.
Dhruv Manilawala 2024-06-03 18:23:50 +05:30 committed by GitHub
parent c69a789aa5
commit bf5b62edac
262 changed files with 8174 additions and 6132 deletions

Cargo.lock generated

@ -1960,7 +1960,6 @@ dependencies = [
"ruff_linter",
"ruff_python_ast",
"ruff_python_formatter",
"ruff_python_index",
"ruff_python_parser",
"serde",
"serde_json",
@ -2008,6 +2007,7 @@ dependencies = [
"ruff_python_parser",
"ruff_python_stdlib",
"ruff_python_trivia",
"ruff_text_size",
"ruff_workspace",
"schemars",
"serde",
@ -2184,6 +2184,7 @@ dependencies = [
"ruff_python_literal",
"ruff_python_parser",
"ruff_source_file",
"ruff_text_size",
]
[[package]]
@ -2202,7 +2203,6 @@ dependencies = [
"ruff_formatter",
"ruff_macros",
"ruff_python_ast",
"ruff_python_index",
"ruff_python_parser",
"ruff_python_trivia",
"ruff_source_file",
@ -2253,6 +2253,7 @@ dependencies = [
"itertools 0.13.0",
"memchr",
"ruff_python_ast",
"ruff_python_trivia",
"ruff_source_file",
"ruff_text_size",
"rustc-hash",
@ -2310,7 +2311,6 @@ name = "ruff_python_trivia_integration_tests"
version = "0.0.0"
dependencies = [
"insta",
"ruff_python_index",
"ruff_python_parser",
"ruff_python_trivia",
"ruff_source_file",
@ -2385,7 +2385,6 @@ dependencies = [
"ruff_python_formatter",
"ruff_python_index",
"ruff_python_parser",
"ruff_python_trivia",
"ruff_source_file",
"ruff_text_size",
"ruff_workspace",


@ -32,8 +32,9 @@ impl Parsed {
let result = ruff_python_parser::parse(text, Mode::Module);
let (module, errors) = match result {
Ok(ast::Mod::Module(module)) => (module, vec![]),
Ok(ast::Mod::Expression(expression)) => (
Ok(parsed) => match parsed.into_syntax() {
ast::Mod::Module(module) => (module, vec![]),
ast::Mod::Expression(expression) => (
ast::ModModule {
range: expression.range(),
body: vec![ast::Stmt::Expr(ast::StmtExpr {
@ -43,6 +44,7 @@ impl Parsed {
},
vec![],
),
},
Err(errors) => (
ast::ModModule {
range: TextRange::default(),


@ -44,7 +44,6 @@ codspeed-criterion-compat = { workspace = true, default-features = false, option
ruff_linter = { workspace = true }
ruff_python_ast = { workspace = true }
ruff_python_formatter = { workspace = true }
ruff_python_index = { workspace = true }
ruff_python_parser = { workspace = true }
[lints]


@ -5,9 +5,7 @@ use ruff_benchmark::criterion::{
};
use ruff_benchmark::{TestCase, TestFile, TestFileDownloadError};
use ruff_python_formatter::{format_module_ast, PreviewMode, PyFormatOptions};
use ruff_python_index::CommentRangesBuilder;
use ruff_python_parser::lexer::lex;
use ruff_python_parser::{allocate_tokens_vec, parse_tokens, Mode};
use ruff_python_parser::{parse, Mode};
#[cfg(target_os = "windows")]
#[global_allocator]
@ -52,27 +50,14 @@ fn benchmark_formatter(criterion: &mut Criterion) {
BenchmarkId::from_parameter(case.name()),
&case,
|b, case| {
let mut tokens = allocate_tokens_vec(case.code());
let mut comment_ranges = CommentRangesBuilder::default();
for result in lex(case.code(), Mode::Module) {
let (token, range) = result.expect("Input to be a valid python program.");
comment_ranges.visit_token(&token, range);
tokens.push(Ok((token, range)));
}
let comment_ranges = comment_ranges.finish();
// Parse the AST.
let module = parse_tokens(tokens, case.code(), Mode::Module)
.expect("Input to be a valid python program");
// Parse the source.
let parsed =
parse(case.code(), Mode::Module).expect("Input should be a valid Python code");
b.iter(|| {
let options = PyFormatOptions::from_extension(Path::new(case.name()))
.with_preview(PreviewMode::Enabled);
let formatted =
format_module_ast(&module, &comment_ranges, case.code(), options)
let formatted = format_module_ast(&parsed, case.code(), options)
.expect("Formatting to succeed");
formatted.print().expect("Printing to succeed")


@ -2,7 +2,7 @@ use ruff_benchmark::criterion::{
criterion_group, criterion_main, measurement::WallTime, BenchmarkId, Criterion, Throughput,
};
use ruff_benchmark::{TestCase, TestFile, TestFileDownloadError};
use ruff_python_parser::{lexer, Mode};
use ruff_python_parser::{lexer, Mode, TokenKind};
#[cfg(target_os = "windows")]
#[global_allocator]
@ -47,9 +47,15 @@ fn benchmark_lexer(criterion: &mut Criterion<WallTime>) {
&case,
|b, case| {
b.iter(|| {
let result =
lexer::lex(case.code(), Mode::Module).find(std::result::Result::is_err);
assert_eq!(result, None, "Input to be a valid Python program");
let mut lexer = lexer::lex(case.code(), Mode::Module);
loop {
let token = lexer.next_token();
match token {
TokenKind::EndOfFile => break,
TokenKind::Unknown => panic!("Input to be a valid Python source code"),
_ => {}
}
}
});
},
);


@ -10,7 +10,7 @@ use ruff_linter::settings::{flags, LinterSettings};
use ruff_linter::source_kind::SourceKind;
use ruff_linter::{registry::Rule, RuleSelector};
use ruff_python_ast::PySourceType;
use ruff_python_parser::{parse_program_tokens, tokenize, Mode};
use ruff_python_parser::parse_module;
#[cfg(target_os = "windows")]
#[global_allocator]
@ -54,15 +54,13 @@ fn benchmark_linter(mut group: BenchmarkGroup, settings: &LinterSettings) {
BenchmarkId::from_parameter(case.name()),
&case,
|b, case| {
// Tokenize the source.
let tokens = tokenize(case.code(), Mode::Module);
// Parse the source.
let ast = parse_program_tokens(tokens.clone(), case.code(), false).unwrap();
let parsed =
parse_module(case.code()).expect("Input should be a valid Python code");
b.iter_batched(
|| (ast.clone(), tokens.clone()),
|(ast, tokens)| {
|| parsed.clone(),
|parsed| {
let path = case.path();
let result = lint_only(
&path,
@ -71,7 +69,7 @@ fn benchmark_linter(mut group: BenchmarkGroup, settings: &LinterSettings) {
flags::Noqa::Enabled,
&SourceKind::Python(case.code().to_string()),
PySourceType::from(path.as_path()),
ParseSource::Precomputed { tokens, ast },
ParseSource::Precomputed(parsed),
);
// Assert that file contains no parse errors


@ -4,7 +4,7 @@ use ruff_benchmark::criterion::{
use ruff_benchmark::{TestCase, TestFile, TestFileDownloadError};
use ruff_python_ast::statement_visitor::{walk_stmt, StatementVisitor};
use ruff_python_ast::Stmt;
use ruff_python_parser::parse_suite;
use ruff_python_parser::parse_module;
#[cfg(target_os = "windows")]
#[global_allocator]
@ -60,7 +60,9 @@ fn benchmark_parser(criterion: &mut Criterion<WallTime>) {
&case,
|b, case| {
b.iter(|| {
let parsed = parse_suite(case.code()).unwrap();
let parsed = parse_module(case.code())
.expect("Input should be a valid Python code")
.into_suite();
let mut visitor = CountVisitor { count: 0 };
visitor.visit_body(&parsed);


@ -22,6 +22,7 @@ ruff_python_formatter = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_python_stdlib = { workspace = true }
ruff_python_trivia = { workspace = true }
ruff_text_size = { workspace = true }
ruff_workspace = { workspace = true, features = ["schemars"] }
anyhow = { workspace = true }


@ -24,7 +24,7 @@ pub(crate) fn main(args: &Args) -> Result<()> {
args.file.display()
)
})?;
let python_ast = parse(source_kind.source_code(), source_type.as_mode())?;
let python_ast = parse(source_kind.source_code(), source_type.as_mode())?.into_syntax();
println!("{python_ast:#?}");
Ok(())
}


@ -7,7 +7,8 @@ use anyhow::Result;
use ruff_linter::source_kind::SourceKind;
use ruff_python_ast::PySourceType;
use ruff_python_parser::{lexer, AsMode};
use ruff_python_parser::parse_unchecked_source;
use ruff_text_size::Ranged;
#[derive(clap::Args)]
pub(crate) struct Args {
@ -24,11 +25,13 @@ pub(crate) fn main(args: &Args) -> Result<()> {
args.file.display()
)
})?;
for (tok, range) in lexer::lex(source_kind.source_code(), source_type.as_mode()).flatten() {
let parsed = parse_unchecked_source(source_kind.source_code(), source_type);
for token in parsed.tokens() {
println!(
"{start:#?} {tok:#?} {end:#?}",
start = range.start(),
end = range.end()
"{start:#?} {kind:#?} {end:#?}",
start = token.start(),
end = token.end(),
kind = token.kind(),
);
}
Ok(())


@ -1160,7 +1160,7 @@ pub(crate) fn expression(expr: &Expr, checker: &mut Checker) {
}
}
if checker.enabled(Rule::PrintfStringFormatting) {
pyupgrade::rules::printf_string_formatting(checker, expr, right);
pyupgrade::rules::printf_string_formatting(checker, bin_op, format_string);
}
if checker.enabled(Rule::BadStringFormatCharacter) {
pylint::rules::bad_string_format_character::percent(


@ -765,7 +765,7 @@ pub(crate) fn statement(stmt: &Stmt, checker: &mut Checker) {
pyupgrade::rules::deprecated_c_element_tree(checker, stmt);
}
if checker.enabled(Rule::DeprecatedImport) {
pyupgrade::rules::deprecated_import(checker, stmt, names, module, level);
pyupgrade::rules::deprecated_import(checker, import_from);
}
if checker.enabled(Rule::UnnecessaryBuiltinImport) {
if let Some(module) = module {


@ -32,8 +32,10 @@ use itertools::Itertools;
use log::debug;
use ruff_python_ast::{
self as ast, AnyParameterRef, Comprehension, ElifElseClause, ExceptHandler, Expr, ExprContext,
FStringElement, Keyword, MatchCase, Parameter, Parameters, Pattern, Stmt, Suite, UnaryOp,
FStringElement, Keyword, MatchCase, ModModule, Parameter, Parameters, Pattern, Stmt, Suite,
UnaryOp,
};
use ruff_python_parser::Parsed;
use ruff_text_size::{Ranged, TextRange, TextSize};
use ruff_diagnostics::{Diagnostic, IsolationLevel};
@ -174,6 +176,8 @@ impl ExpectedDocstringKind {
}
pub(crate) struct Checker<'a> {
/// The parsed [`Parsed`].
parsed: &'a Parsed<ModModule>,
/// The [`Path`] to the file under analysis.
path: &'a Path,
/// The [`Path`] to the package containing the current file.
@ -223,6 +227,7 @@ pub(crate) struct Checker<'a> {
impl<'a> Checker<'a> {
#[allow(clippy::too_many_arguments)]
pub(crate) fn new(
parsed: &'a Parsed<ModModule>,
settings: &'a LinterSettings,
noqa_line_for: &'a NoqaMapping,
noqa: flags::Noqa,
@ -232,12 +237,12 @@ impl<'a> Checker<'a> {
locator: &'a Locator,
stylist: &'a Stylist,
indexer: &'a Indexer,
importer: Importer<'a>,
source_type: PySourceType,
cell_offsets: Option<&'a CellOffsets>,
notebook_index: Option<&'a NotebookIndex>,
) -> Checker<'a> {
Checker {
parsed,
settings,
noqa_line_for,
noqa,
@ -248,7 +253,7 @@ impl<'a> Checker<'a> {
locator,
stylist,
indexer,
importer,
importer: Importer::new(parsed, locator, stylist),
semantic: SemanticModel::new(&settings.typing_modules, path, module),
visit: deferred::Visit::default(),
analyze: deferred::Analyze::default(),
@ -318,6 +323,11 @@ impl<'a> Checker<'a> {
}
}
/// The [`Parsed`] output for the current file, which contains the tokens, AST, and more.
pub(crate) const fn parsed(&self) -> &'a Parsed<ModModule> {
self.parsed
}
/// The [`Locator`] for the current file, which enables extraction of source code from byte
/// offsets.
pub(crate) const fn locator(&self) -> &'a Locator<'a> {
@ -2326,7 +2336,7 @@ impl<'a> Checker<'a> {
#[allow(clippy::too_many_arguments)]
pub(crate) fn check_ast(
python_ast: &Suite,
parsed: &Parsed<ModModule>,
locator: &Locator,
stylist: &Stylist,
indexer: &Indexer,
@ -2356,10 +2366,11 @@ pub(crate) fn check_ast(
} else {
ModuleSource::File(path)
},
python_ast,
python_ast: parsed.suite(),
};
let mut checker = Checker::new(
parsed,
settings,
noqa_line_for,
noqa,
@ -2369,7 +2380,6 @@ pub(crate) fn check_ast(
locator,
stylist,
indexer,
Importer::new(python_ast, locator, stylist),
source_type,
cell_offsets,
notebook_index,
@ -2377,8 +2387,8 @@ pub(crate) fn check_ast(
checker.bind_builtins();
// Iterate over the AST.
checker.visit_module(python_ast);
checker.visit_body(python_ast);
checker.visit_module(parsed.suite());
checker.visit_body(parsed.suite());
// Visit any deferred syntax nodes. Take care to visit in order, such that we avoid adding
// new deferred nodes after visiting nodes of that kind. For example, visiting a deferred


@ -1,7 +1,7 @@
use std::path::Path;
use ruff_diagnostics::Diagnostic;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use crate::registry::Rule;
@ -13,7 +13,7 @@ pub(crate) fn check_file_path(
path: &Path,
package: Option<&Path>,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
settings: &LinterSettings,
) -> Vec<Diagnostic> {
let mut diagnostics: Vec<Diagnostic> = vec![];
@ -24,7 +24,7 @@ pub(crate) fn check_file_path(
path,
package,
locator,
indexer,
comment_ranges,
&settings.project_root,
&settings.src,
) {


@ -4,9 +4,10 @@ use std::path::Path;
use ruff_diagnostics::Diagnostic;
use ruff_notebook::CellOffsets;
use ruff_python_ast::statement_visitor::StatementVisitor;
use ruff_python_ast::{PySourceType, Suite};
use ruff_python_ast::{ModModule, PySourceType};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Parsed;
use ruff_source_file::Locator;
use crate::directives::IsortDirectives;
@ -17,7 +18,7 @@ use crate::settings::LinterSettings;
#[allow(clippy::too_many_arguments)]
pub(crate) fn check_imports(
python_ast: &Suite,
parsed: &Parsed<ModModule>,
locator: &Locator,
indexer: &Indexer,
directives: &IsortDirectives,
@ -31,7 +32,7 @@ pub(crate) fn check_imports(
let tracker = {
let mut tracker =
BlockBuilder::new(locator, directives, source_type.is_stub(), cell_offsets);
tracker.visit_body(python_ast);
tracker.visit_body(parsed.suite());
tracker
};
@ -50,6 +51,7 @@ pub(crate) fn check_imports(
settings,
package,
source_type,
parsed,
) {
diagnostics.push(diagnostic);
}
@ -58,7 +60,7 @@ pub(crate) fn check_imports(
}
if settings.rules.enabled(Rule::MissingRequiredImport) {
diagnostics.extend(isort::rules::add_required_imports(
python_ast,
parsed,
locator,
stylist,
settings,


@ -2,8 +2,7 @@ use crate::line_width::IndentWidth;
use ruff_diagnostics::Diagnostic;
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::lexer::LexResult;
use ruff_python_parser::TokenKind;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -34,7 +33,7 @@ pub(crate) fn expand_indent(line: &str, indent_width: IndentWidth) -> usize {
}
pub(crate) fn check_logical_lines(
tokens: &[LexResult],
tokens: &Tokens,
locator: &Locator,
indexer: &Indexer,
stylist: &Stylist,


@ -3,6 +3,7 @@
use ruff_diagnostics::Diagnostic;
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::{Locator, UniversalNewlines};
use ruff_text_size::TextSize;
@ -19,6 +20,7 @@ pub(crate) fn check_physical_lines(
locator: &Locator,
stylist: &Stylist,
indexer: &Indexer,
comment_ranges: &CommentRanges,
doc_lines: &[TextSize],
settings: &LinterSettings,
) -> Vec<Diagnostic> {
@ -42,7 +44,7 @@ pub(crate) fn check_physical_lines(
.is_some()
{
if enforce_doc_line_too_long {
if let Some(diagnostic) = doc_line_too_long(&line, indexer, settings) {
if let Some(diagnostic) = doc_line_too_long(&line, comment_ranges, settings) {
diagnostics.push(diagnostic);
}
}
@ -55,7 +57,7 @@ pub(crate) fn check_physical_lines(
}
if enforce_line_too_long {
if let Some(diagnostic) = line_too_long(&line, indexer, settings) {
if let Some(diagnostic) = line_too_long(&line, comment_ranges, settings) {
diagnostics.push(diagnostic);
}
}
@ -90,8 +92,7 @@ pub(crate) fn check_physical_lines(
mod tests {
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::lexer::lex;
use ruff_python_parser::Mode;
use ruff_python_parser::parse_module;
use ruff_source_file::Locator;
use crate::line_width::LineLength;
@ -105,15 +106,16 @@ mod tests {
fn e501_non_ascii_char() {
let line = "'\u{4e9c}' * 2"; // 7 in UTF-32, 9 in UTF-8.
let locator = Locator::new(line);
let tokens: Vec<_> = lex(line, Mode::Module).collect();
let indexer = Indexer::from_tokens(&tokens, &locator);
let stylist = Stylist::from_tokens(&tokens, &locator);
let parsed = parse_module(line).unwrap();
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
let check_with_max_line_length = |line_length: LineLength| {
check_physical_lines(
&locator,
&stylist,
&indexer,
parsed.comment_ranges(),
&[],
&LinterSettings {
pycodestyle: pycodestyle::settings::Settings {


@ -3,15 +3,16 @@
use std::path::Path;
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_ast::{ModModule, PySourceType};
use ruff_python_codegen::Stylist;
use ruff_diagnostics::Diagnostic;
use ruff_python_index::Indexer;
use ruff_python_parser::Parsed;
use ruff_source_file::Locator;
use ruff_text_size::Ranged;
use crate::directives::TodoComment;
use crate::linter::TokenSource;
use crate::registry::{AsRule, Rule};
use crate::rules::pycodestyle::rules::BlankLinesChecker;
use crate::rules::{
@ -22,7 +23,7 @@ use crate::settings::LinterSettings;
#[allow(clippy::too_many_arguments)]
pub(crate) fn check_tokens(
tokens: &TokenSource,
parsed: &Parsed<ModModule>,
path: &Path,
locator: &Locator,
indexer: &Indexer,
@ -33,6 +34,9 @@ pub(crate) fn check_tokens(
) -> Vec<Diagnostic> {
let mut diagnostics: Vec<Diagnostic> = vec![];
let tokens = parsed.tokens();
let comment_ranges = parsed.comment_ranges();
if settings.rules.any_enabled(&[
Rule::BlankLineBetweenMethods,
Rule::BlankLinesTopLevel,
@ -42,22 +46,22 @@ pub(crate) fn check_tokens(
Rule::BlankLinesBeforeNestedDefinition,
]) {
BlankLinesChecker::new(locator, stylist, settings, source_type, cell_offsets)
.check_lines(tokens.kinds(), &mut diagnostics);
.check_lines(tokens, &mut diagnostics);
}
if settings.rules.enabled(Rule::BlanketTypeIgnore) {
pygrep_hooks::rules::blanket_type_ignore(&mut diagnostics, indexer, locator);
pygrep_hooks::rules::blanket_type_ignore(&mut diagnostics, comment_ranges, locator);
}
if settings.rules.enabled(Rule::EmptyComment) {
pylint::rules::empty_comments(&mut diagnostics, indexer, locator);
pylint::rules::empty_comments(&mut diagnostics, comment_ranges, locator);
}
if settings
.rules
.enabled(Rule::AmbiguousUnicodeCharacterComment)
{
for range in indexer.comment_ranges() {
for range in comment_ranges {
ruff::rules::ambiguous_unicode_character_comment(
&mut diagnostics,
locator,
@ -68,11 +72,16 @@ pub(crate) fn check_tokens(
}
if settings.rules.enabled(Rule::CommentedOutCode) {
eradicate::rules::commented_out_code(&mut diagnostics, locator, indexer, settings);
eradicate::rules::commented_out_code(&mut diagnostics, locator, comment_ranges, settings);
}
if settings.rules.enabled(Rule::UTF8EncodingDeclaration) {
pyupgrade::rules::unnecessary_coding_comment(&mut diagnostics, locator, indexer);
pyupgrade::rules::unnecessary_coding_comment(
&mut diagnostics,
locator,
indexer,
comment_ranges,
);
}
if settings.rules.enabled(Rule::TabIndentation) {
@ -86,8 +95,13 @@ pub(crate) fn check_tokens(
Rule::InvalidCharacterNul,
Rule::InvalidCharacterZeroWidthSpace,
]) {
for (token, range) in tokens.kinds() {
pylint::rules::invalid_string_characters(&mut diagnostics, token, range, locator);
for token in tokens.up_to_first_unknown() {
pylint::rules::invalid_string_characters(
&mut diagnostics,
token.kind(),
token.range(),
locator,
);
}
}
@ -98,7 +112,7 @@ pub(crate) fn check_tokens(
]) {
pycodestyle::rules::compound_statements(
&mut diagnostics,
tokens.kinds(),
tokens,
locator,
indexer,
source_type,
@ -112,7 +126,7 @@ pub(crate) fn check_tokens(
]) {
flake8_implicit_str_concat::rules::implicit(
&mut diagnostics,
tokens.kinds(),
tokens,
settings,
locator,
indexer,
@ -124,15 +138,15 @@ pub(crate) fn check_tokens(
Rule::TrailingCommaOnBareTuple,
Rule::ProhibitedTrailingComma,
]) {
flake8_commas::rules::trailing_commas(&mut diagnostics, tokens.kinds(), locator, indexer);
flake8_commas::rules::trailing_commas(&mut diagnostics, tokens, locator, indexer);
}
if settings.rules.enabled(Rule::ExtraneousParentheses) {
pyupgrade::rules::extraneous_parentheses(&mut diagnostics, tokens.kinds(), locator);
pyupgrade::rules::extraneous_parentheses(&mut diagnostics, tokens, locator);
}
if source_type.is_stub() && settings.rules.enabled(Rule::TypeCommentInStub) {
flake8_pyi::rules::type_comment_in_stub(&mut diagnostics, locator, indexer);
flake8_pyi::rules::type_comment_in_stub(&mut diagnostics, locator, comment_ranges);
}
if settings.rules.any_enabled(&[
@ -142,7 +156,7 @@ pub(crate) fn check_tokens(
Rule::ShebangNotFirstLine,
Rule::ShebangMissingPython,
]) {
flake8_executable::rules::from_tokens(&mut diagnostics, path, locator, indexer);
flake8_executable::rules::from_tokens(&mut diagnostics, path, locator, comment_ranges);
}
if settings.rules.any_enabled(&[
@ -158,8 +172,7 @@ pub(crate) fn check_tokens(
Rule::LineContainsTodo,
Rule::LineContainsHack,
]) {
let todo_comments: Vec<TodoComment> = indexer
.comment_ranges()
let todo_comments: Vec<TodoComment> = comment_ranges
.iter()
.enumerate()
.filter_map(|(i, comment_range)| {
@ -167,12 +180,12 @@ pub(crate) fn check_tokens(
TodoComment::from_comment(comment, *comment_range, i)
})
.collect();
flake8_todos::rules::todos(&mut diagnostics, &todo_comments, locator, indexer);
flake8_todos::rules::todos(&mut diagnostics, &todo_comments, locator, comment_ranges);
flake8_fixme::rules::todos(&mut diagnostics, &todo_comments);
}
if settings.rules.enabled(Rule::TooManyNewlinesAtEndOfFile) {
pycodestyle::rules::too_many_newlines_at_end_of_file(&mut diagnostics, tokens.kinds());
pycodestyle::rules::too_many_newlines_at_end_of_file(&mut diagnostics, tokens);
}
diagnostics.retain(|diagnostic| settings.rules.enabled(diagnostic.kind.rule()));


@ -4,9 +4,9 @@ use std::iter::Peekable;
use std::str::FromStr;
use bitflags::bitflags;
use ruff_python_ast::StringFlags;
use ruff_python_parser::lexer::LexResult;
use ruff_python_parser::Tok;
use ruff_python_ast::ModModule;
use ruff_python_parser::{Parsed, TokenKind, Tokens};
use ruff_python_trivia::CommentRanges;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
use ruff_python_index::Indexer;
@ -52,19 +52,19 @@ pub struct Directives {
}
pub fn extract_directives(
lxr: &[LexResult],
parsed: &Parsed<ModModule>,
flags: Flags,
locator: &Locator,
indexer: &Indexer,
) -> Directives {
Directives {
noqa_line_for: if flags.intersects(Flags::NOQA) {
extract_noqa_line_for(lxr, locator, indexer)
extract_noqa_line_for(parsed.tokens(), locator, indexer)
} else {
NoqaMapping::default()
},
isort: if flags.intersects(Flags::ISORT) {
extract_isort_directives(locator, indexer)
extract_isort_directives(locator, parsed.comment_ranges())
} else {
IsortDirectives::default()
},
@ -105,22 +105,22 @@ where
}
/// Extract a mapping from logical line to noqa line.
fn extract_noqa_line_for(lxr: &[LexResult], locator: &Locator, indexer: &Indexer) -> NoqaMapping {
fn extract_noqa_line_for(tokens: &Tokens, locator: &Locator, indexer: &Indexer) -> NoqaMapping {
let mut string_mappings = Vec::new();
for (tok, range) in lxr.iter().flatten() {
match tok {
Tok::EndOfFile => {
for token in tokens.up_to_first_unknown() {
match token.kind() {
TokenKind::EndOfFile => {
break;
}
// For multi-line strings, we expect `noqa` directives on the last line of the
// string.
Tok::String { flags, .. } if flags.is_triple_quoted() => {
if locator.contains_line_break(*range) {
TokenKind::String if token.is_triple_quoted_string() => {
if locator.contains_line_break(token.range()) {
string_mappings.push(TextRange::new(
locator.line_start(range.start()),
range.end(),
locator.line_start(token.start()),
token.end(),
));
}
}
@ -197,12 +197,12 @@ fn extract_noqa_line_for(lxr: &[LexResult], locator: &Locator, indexer: &Indexer
}
/// Extract a set of ranges over which to disable isort.
fn extract_isort_directives(locator: &Locator, indexer: &Indexer) -> IsortDirectives {
fn extract_isort_directives(locator: &Locator, comment_ranges: &CommentRanges) -> IsortDirectives {
let mut exclusions: Vec<TextRange> = Vec::default();
let mut splits: Vec<TextSize> = Vec::default();
let mut off: Option<TextSize> = None;
for range in indexer.comment_ranges() {
for range in comment_ranges {
let comment_text = locator.slice(range);
// `isort` allows for `# isort: skip` and `# isort: skip_file` to include or
@ -379,8 +379,7 @@ impl TodoDirectiveKind {
#[cfg(test)]
mod tests {
use ruff_python_parser::lexer::LexResult;
use ruff_python_parser::{lexer, Mode};
use ruff_python_parser::parse_module;
use ruff_text_size::{TextLen, TextRange, TextSize};
use ruff_python_index::Indexer;
@ -391,12 +390,14 @@ mod tests {
};
use crate::noqa::NoqaMapping;
fn noqa_mappings(contents: &str) -> NoqaMapping {
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
use super::IsortDirectives;
extract_noqa_line_for(&lxr, &locator, &indexer)
fn noqa_mappings(contents: &str) -> NoqaMapping {
let parsed = parse_module(contents).unwrap();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
extract_noqa_line_for(parsed.tokens(), &locator, &indexer)
}
#[test]
@ -566,29 +567,26 @@ assert foo, \
);
}
fn isort_directives(contents: &str) -> IsortDirectives {
let parsed = parse_module(contents).unwrap();
let locator = Locator::new(contents);
extract_isort_directives(&locator, parsed.comment_ranges())
}
#[test]
fn isort_exclusions() {
let contents = "x = 1
y = 2
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
Vec::default()
);
assert_eq!(isort_directives(contents).exclusions, Vec::default());
let contents = "# isort: off
x = 1
y = 2
# isort: on
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
isort_directives(contents).exclusions,
Vec::from_iter([TextRange::new(TextSize::from(0), TextSize::from(25))])
);
@ -599,11 +597,8 @@ y = 2
# isort: on
z = x + 1
# isort: on";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
isort_directives(contents).exclusions,
Vec::from_iter([TextRange::new(TextSize::from(0), TextSize::from(38))])
);
@ -611,11 +606,8 @@ z = x + 1
x = 1
y = 2
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
isort_directives(contents).exclusions,
Vec::from_iter([TextRange::at(TextSize::from(0), contents.text_len())])
);
@ -623,13 +615,7 @@ z = x + 1";
x = 1
y = 2
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
Vec::default()
);
assert_eq!(isort_directives(contents).exclusions, Vec::default());
let contents = "# isort: off
x = 1
@ -637,13 +623,7 @@ x = 1
y = 2
# isort: skip_file
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).exclusions,
Vec::default()
);
assert_eq!(isort_directives(contents).exclusions, Vec::default());
}
#[test]
@ -651,36 +631,18 @@ z = x + 1";
let contents = "x = 1
y = 2
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).splits,
Vec::new()
);
assert_eq!(isort_directives(contents).splits, Vec::new());
let contents = "x = 1
y = 2
# isort: split
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).splits,
vec![TextSize::from(12)]
);
assert_eq!(isort_directives(contents).splits, vec![TextSize::from(12)]);
let contents = "x = 1
y = 2 # isort: split
z = x + 1";
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let locator = Locator::new(contents);
let indexer = Indexer::from_tokens(&lxr, &locator);
assert_eq!(
extract_isort_directives(&locator, &indexer).splits,
vec![TextSize::from(13)]
);
assert_eq!(isort_directives(contents).splits, vec![TextSize::from(13)]);
}
#[test]


@ -2,28 +2,29 @@
//! standalone comment or a constant string statement.
use std::iter::FusedIterator;
use std::slice::Iter;
use ruff_python_ast::{self as ast, Stmt, Suite};
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use ruff_python_ast::statement_visitor::{walk_stmt, StatementVisitor};
use ruff_source_file::{Locator, UniversalNewlineIterator};
/// Extract doc lines (standalone comments) from a token sequence.
pub(crate) fn doc_lines_from_tokens(tokens: TokenKindIter) -> DocLines {
pub(crate) fn doc_lines_from_tokens(tokens: &Tokens) -> DocLines {
DocLines::new(tokens)
}
pub(crate) struct DocLines<'a> {
inner: TokenKindIter<'a>,
inner: Iter<'a, Token>,
prev: TextSize,
}
impl<'a> DocLines<'a> {
fn new(tokens: TokenKindIter<'a>) -> Self {
fn new(tokens: &'a Tokens) -> Self {
Self {
inner: tokens,
inner: tokens.up_to_first_unknown().iter(),
prev: TextSize::default(),
}
}
@ -35,12 +36,12 @@ impl Iterator for DocLines<'_> {
fn next(&mut self) -> Option<Self::Item> {
let mut at_start_of_line = true;
loop {
let (tok, range) = self.inner.next()?;
let token = self.inner.next()?;
match tok {
match token.kind() {
TokenKind::Comment => {
if at_start_of_line {
break Some(range.start());
break Some(token.start());
}
}
TokenKind::Newline | TokenKind::NonLogicalNewline => {
@ -54,7 +55,7 @@ impl Iterator for DocLines<'_> {
}
}
self.prev = range.end();
self.prev = token.end();
}
}
}


@ -531,8 +531,9 @@ mod tests {
use test_case::test_case;
use ruff_diagnostics::{Diagnostic, Edit, Fix};
use ruff_python_ast::Stmt;
use ruff_python_codegen::Stylist;
use ruff_python_parser::{lexer, parse_expression, parse_suite, Mode};
use ruff_python_parser::{parse_expression, parse_module};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange, TextSize};
@ -541,17 +542,21 @@ mod tests {
add_to_dunder_all, make_redundant_alias, next_stmt_break, trailing_semicolon,
};
/// Parse the given source using [`Mode::Module`] and return the first statement.
fn parse_first_stmt(source: &str) -> Result<Stmt> {
let suite = parse_module(source)?.into_suite();
Ok(suite.into_iter().next().unwrap())
}
#[test]
fn find_semicolon() -> Result<()> {
let contents = "x = 1";
let program = parse_suite(contents)?;
let stmt = program.first().unwrap();
let stmt = parse_first_stmt(contents)?;
let locator = Locator::new(contents);
assert_eq!(trailing_semicolon(stmt.end(), &locator), None);
let contents = "x = 1; y = 1";
let program = parse_suite(contents)?;
let stmt = program.first().unwrap();
let stmt = parse_first_stmt(contents)?;
let locator = Locator::new(contents);
assert_eq!(
trailing_semicolon(stmt.end(), &locator),
@ -559,8 +564,7 @@ mod tests {
);
let contents = "x = 1 ; y = 1";
let program = parse_suite(contents)?;
let stmt = program.first().unwrap();
let stmt = parse_first_stmt(contents)?;
let locator = Locator::new(contents);
assert_eq!(
trailing_semicolon(stmt.end(), &locator),
@ -572,8 +576,7 @@ x = 1 \
; y = 1
"
.trim();
let program = parse_suite(contents)?;
let stmt = program.first().unwrap();
let stmt = parse_first_stmt(contents)?;
let locator = Locator::new(contents);
assert_eq!(
trailing_semicolon(stmt.end(), &locator),
@ -612,12 +615,11 @@ x = 1 \
}
#[test]
fn redundant_alias() {
fn redundant_alias() -> Result<()> {
let contents = "import x, y as y, z as bees";
let program = parse_suite(contents).unwrap();
let stmt = program.first().unwrap();
let stmt = parse_first_stmt(contents)?;
assert_eq!(
make_redundant_alias(["x"].into_iter().map(Cow::from), stmt),
make_redundant_alias(["x"].into_iter().map(Cow::from), &stmt),
vec![Edit::range_replacement(
String::from("x as x"),
TextRange::new(TextSize::new(7), TextSize::new(8)),
@ -625,7 +627,7 @@ x = 1 \
"make just one item redundant"
);
assert_eq!(
make_redundant_alias(vec!["x", "y"].into_iter().map(Cow::from), stmt),
make_redundant_alias(vec!["x", "y"].into_iter().map(Cow::from), &stmt),
vec![Edit::range_replacement(
String::from("x as x"),
TextRange::new(TextSize::new(7), TextSize::new(8)),
@ -633,13 +635,14 @@ x = 1 \
"the second item is already a redundant alias"
);
assert_eq!(
make_redundant_alias(vec!["x", "z"].into_iter().map(Cow::from), stmt),
make_redundant_alias(vec!["x", "z"].into_iter().map(Cow::from), &stmt),
vec![Edit::range_replacement(
String::from("x as x"),
TextRange::new(TextSize::new(7), TextSize::new(8)),
)],
"the third item is already aliased to something else"
);
Ok(())
}
#[test_case("()", &["x", "y"], r#"("x", "y")"# ; "2 into empty tuple")]
@ -661,13 +664,9 @@ x = 1 \
fn add_to_dunder_all_test(raw: &str, names: &[&str], expect: &str) -> Result<()> {
let locator = Locator::new(raw);
let edits = {
let expr = parse_expression(raw)?;
let stylist = Stylist::from_tokens(
&lexer::lex(raw, Mode::Expression).collect::<Vec<_>>(),
&locator,
);
// SUT
add_to_dunder_all(names.iter().copied(), &expr, &stylist)
let parsed = parse_expression(raw)?;
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
add_to_dunder_all(names.iter().copied(), parsed.expr(), &stylist)
};
let diag = {
use crate::rules::pycodestyle::rules::MissingNewlineAtEndOfFile;


@ -1,8 +1,8 @@
//! Insert statements into Python code.
use std::ops::Add;
use ruff_python_ast::{PySourceType, Stmt};
use ruff_python_parser::{lexer, AsMode, Tok};
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use ruff_diagnostics::Edit;
@ -145,7 +145,7 @@ impl<'a> Insertion<'a> {
mut location: TextSize,
locator: &Locator<'a>,
stylist: &Stylist,
source_type: PySourceType,
tokens: &Tokens,
) -> Insertion<'a> {
enum Awaiting {
Colon(u32),
@ -154,40 +154,38 @@ impl<'a> Insertion<'a> {
}
let mut state = Awaiting::Colon(0);
for (tok, range) in
lexer::lex_starts_at(locator.after(location), source_type.as_mode(), location).flatten()
{
for token in tokens.after(location) {
match state {
// Iterate until we find the colon indicating the start of the block body.
Awaiting::Colon(depth) => match tok {
Tok::Colon if depth == 0 => {
Awaiting::Colon(depth) => match token.kind() {
TokenKind::Colon if depth == 0 => {
state = Awaiting::Newline;
}
Tok::Lpar | Tok::Lbrace | Tok::Lsqb => {
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => {
state = Awaiting::Colon(depth.saturating_add(1));
}
Tok::Rpar | Tok::Rbrace | Tok::Rsqb => {
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
state = Awaiting::Colon(depth.saturating_sub(1));
}
_ => {}
},
// Once we've seen the colon, we're looking for a newline; otherwise, there's no
// block body (e.g. `if True: pass`).
Awaiting::Newline => match tok {
Tok::Comment(..) => {}
Tok::Newline => {
Awaiting::Newline => match token.kind() {
TokenKind::Comment => {}
TokenKind::Newline => {
state = Awaiting::Indent;
}
_ => {
location = range.start();
location = token.start();
break;
}
},
// Once we've seen the newline, we're looking for the indentation of the block body.
Awaiting::Indent => match tok {
Tok::Comment(..) => {}
Tok::NonLogicalNewline => {}
Tok::Indent => {
Awaiting::Indent => match token.kind() {
TokenKind::Comment => {}
TokenKind::NonLogicalNewline => {}
TokenKind::Indent => {
// This is like:
// ```python
// if True:
@ -196,13 +194,13 @@ impl<'a> Insertion<'a> {
// Where `range` is the indentation before the `pass` token.
return Insertion::indented(
"",
range.start(),
token.start(),
stylist.line_ending().as_str(),
locator.slice(range),
locator.slice(token),
);
}
_ => {
location = range.start();
location = token.start();
break;
}
},
@ -319,9 +317,8 @@ fn match_continuation(s: &str) -> Option<TextSize> {
mod tests {
use anyhow::Result;
use ruff_python_ast::PySourceType;
use ruff_python_codegen::Stylist;
use ruff_python_parser::{parse_suite, Mode};
use ruff_python_parser::parse_module;
use ruff_source_file::{LineEnding, Locator};
use ruff_text_size::TextSize;
@ -330,11 +327,10 @@ mod tests {
#[test]
fn start_of_file() -> Result<()> {
fn insert(contents: &str) -> Result<Insertion> {
let program = parse_suite(contents)?;
let tokens = ruff_python_parser::tokenize(contents, Mode::Module);
let parsed = parse_module(contents)?;
let locator = Locator::new(contents);
let stylist = Stylist::from_tokens(&tokens, &locator);
Ok(Insertion::start_of_file(&program, &locator, &stylist))
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
Ok(Insertion::start_of_file(parsed.suite(), &locator, &stylist))
}
let contents = "";
@ -442,10 +438,10 @@ x = 1
#[test]
fn start_of_block() {
fn insert(contents: &str, offset: TextSize) -> Insertion {
let tokens = ruff_python_parser::tokenize(contents, Mode::Module);
let parsed = parse_module(contents).unwrap();
let locator = Locator::new(contents);
let stylist = Stylist::from_tokens(&tokens, &locator);
Insertion::start_of_block(offset, &locator, &stylist, PySourceType::default())
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
Insertion::start_of_block(offset, &locator, &stylist, parsed.tokens())
}
let contents = "if True: pass";


@ -7,7 +7,8 @@ use std::error::Error;
use anyhow::Result;
use libcst_native::{ImportAlias, Name, NameOrAttribute};
use ruff_python_ast::{self as ast, PySourceType, Stmt};
use ruff_python_ast::{self as ast, ModModule, Stmt};
use ruff_python_parser::{Parsed, Tokens};
use ruff_text_size::{Ranged, TextSize};
use ruff_diagnostics::Edit;
@ -27,6 +28,8 @@ mod insertion;
pub(crate) struct Importer<'a> {
/// The Python AST to which we are adding imports.
python_ast: &'a [Stmt],
/// The tokens representing the Python AST.
tokens: &'a Tokens,
/// The [`Locator`] for the Python AST.
locator: &'a Locator<'a>,
/// The [`Stylist`] for the Python AST.
@ -39,12 +42,13 @@ pub(crate) struct Importer<'a> {
impl<'a> Importer<'a> {
pub(crate) fn new(
python_ast: &'a [Stmt],
parsed: &'a Parsed<ModModule>,
locator: &'a Locator<'a>,
stylist: &'a Stylist<'a>,
) -> Self {
Self {
python_ast,
python_ast: parsed.suite(),
tokens: parsed.tokens(),
locator,
stylist,
runtime_imports: Vec::default(),
@ -121,7 +125,6 @@ impl<'a> Importer<'a> {
import: &ImportedMembers,
at: TextSize,
semantic: &SemanticModel,
source_type: PySourceType,
) -> Result<TypingImportEdit> {
// Generate the modified import statement.
let content = fix::codemods::retain_imports(
@ -178,7 +181,7 @@ impl<'a> Importer<'a> {
// Add the import to a `TYPE_CHECKING` block.
let add_import_edit = if let Some(block) = self.preceding_type_checking_block(at) {
// Add the import to the `TYPE_CHECKING` block.
self.add_to_type_checking_block(&content, block.start(), source_type)
self.add_to_type_checking_block(&content, block.start())
} else {
// Add the import to a new `TYPE_CHECKING` block.
self.add_type_checking_block(
@ -455,13 +458,8 @@ impl<'a> Importer<'a> {
}
/// Add an import statement to an existing `TYPE_CHECKING` block.
fn add_to_type_checking_block(
&self,
content: &str,
at: TextSize,
source_type: PySourceType,
) -> Edit {
Insertion::start_of_block(at, self.locator, self.stylist, source_type).into_edit(content)
fn add_to_type_checking_block(&self, content: &str, at: TextSize) -> Edit {
Insertion::start_of_block(at, self.locator, self.stylist, self.tokens).into_edit(content)
}
/// Return the import statement that precedes the given position, if any.


@ -10,11 +10,10 @@ use rustc_hash::FxHashMap;
use ruff_diagnostics::Diagnostic;
use ruff_notebook::Notebook;
use ruff_python_ast::{PySourceType, Suite};
use ruff_python_ast::{ModModule, PySourceType};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::lexer::LexResult;
use ruff_python_parser::{AsMode, ParseError, TokenKindIter, Tokens};
use ruff_python_parser::{ParseError, Parsed};
use ruff_source_file::{Locator, SourceFileBuilder};
use ruff_text_size::Ranged;
@ -82,18 +81,21 @@ pub fn check_path(
noqa: flags::Noqa,
source_kind: &SourceKind,
source_type: PySourceType,
tokens: TokenSource,
parsed: &Parsed<ModModule>,
) -> LinterResult<Vec<Diagnostic>> {
// Aggregate all diagnostics.
let mut diagnostics = vec![];
let mut error = None;
let tokens = parsed.tokens();
let comment_ranges = parsed.comment_ranges();
// Collect doc lines. This requires a rare mix of tokens (for comments) and AST
// (for docstrings), which demands special-casing at this level.
let use_doc_lines = settings.rules.enabled(Rule::DocLineTooLong);
let mut doc_lines = vec![];
if use_doc_lines {
doc_lines.extend(doc_lines_from_tokens(tokens.kinds()));
doc_lines.extend(doc_lines_from_tokens(tokens));
}
// Run the token-based rules.
@ -103,7 +105,7 @@ pub fn check_path(
.any(|rule_code| rule_code.lint_source().is_tokens())
{
diagnostics.extend(check_tokens(
&tokens,
parsed,
path,
locator,
indexer,
@ -120,7 +122,13 @@ pub fn check_path(
.iter_enabled()
.any(|rule_code| rule_code.lint_source().is_filesystem())
{
diagnostics.extend(check_file_path(path, package, locator, indexer, settings));
diagnostics.extend(check_file_path(
path,
package,
locator,
comment_ranges,
settings,
));
}
// Run the logical line-based rules.
@ -130,7 +138,7 @@ pub fn check_path(
.any(|rule_code| rule_code.lint_source().is_logical_lines())
{
diagnostics.extend(crate::checkers::logical_lines::check_logical_lines(
&tokens, locator, indexer, stylist, settings,
tokens, locator, indexer, stylist, settings,
));
}
@ -145,14 +153,13 @@ pub fn check_path(
.iter_enabled()
.any(|rule_code| rule_code.lint_source().is_imports());
if use_ast || use_imports || use_doc_lines {
// Parse, if the AST wasn't pre-provided provided.
match tokens.into_ast(source_kind, source_type) {
Ok(python_ast) => {
match parsed.as_result() {
Ok(parsed) => {
let cell_offsets = source_kind.as_ipy_notebook().map(Notebook::cell_offsets);
let notebook_index = source_kind.as_ipy_notebook().map(Notebook::index);
if use_ast {
diagnostics.extend(check_ast(
&python_ast,
parsed,
locator,
stylist,
indexer,
@ -168,7 +175,7 @@ pub fn check_path(
}
if use_imports {
let import_diagnostics = check_imports(
&python_ast,
parsed,
locator,
indexer,
&directives.isort,
@ -182,7 +189,7 @@ pub fn check_path(
diagnostics.extend(import_diagnostics);
}
if use_doc_lines {
doc_lines.extend(doc_lines_from_ast(&python_ast, locator));
doc_lines.extend(doc_lines_from_ast(parsed.suite(), locator));
}
}
Err(parse_error) => {
@ -191,8 +198,9 @@ pub fn check_path(
// if it's disabled via any of the usual mechanisms (e.g., `noqa`,
// `per-file-ignores`), and the easiest way to detect that suppression is
// to see if the diagnostic persists to the end of the function.
pycodestyle::rules::syntax_error(&mut diagnostics, &parse_error, locator);
error = Some(parse_error);
pycodestyle::rules::syntax_error(&mut diagnostics, parse_error, locator);
// TODO(dhruvmanila): Remove this clone
error = Some(parse_error.clone());
}
}
}
@ -210,7 +218,12 @@ pub fn check_path(
.any(|rule_code| rule_code.lint_source().is_physical_lines())
{
diagnostics.extend(check_physical_lines(
locator, stylist, indexer, &doc_lines, settings,
locator,
stylist,
indexer,
comment_ranges,
&doc_lines,
settings,
));
}
@ -222,36 +235,44 @@ pub fn check_path(
continue;
}
let diagnostic = match test_rule {
Rule::StableTestRule => test_rules::StableTestRule::diagnostic(locator, indexer),
Rule::StableTestRule => {
test_rules::StableTestRule::diagnostic(locator, comment_ranges)
}
Rule::StableTestRuleSafeFix => {
test_rules::StableTestRuleSafeFix::diagnostic(locator, indexer)
test_rules::StableTestRuleSafeFix::diagnostic(locator, comment_ranges)
}
Rule::StableTestRuleUnsafeFix => {
test_rules::StableTestRuleUnsafeFix::diagnostic(locator, indexer)
test_rules::StableTestRuleUnsafeFix::diagnostic(locator, comment_ranges)
}
Rule::StableTestRuleDisplayOnlyFix => {
test_rules::StableTestRuleDisplayOnlyFix::diagnostic(locator, indexer)
test_rules::StableTestRuleDisplayOnlyFix::diagnostic(locator, comment_ranges)
}
Rule::NurseryTestRule => {
test_rules::NurseryTestRule::diagnostic(locator, comment_ranges)
}
Rule::PreviewTestRule => {
test_rules::PreviewTestRule::diagnostic(locator, comment_ranges)
}
Rule::NurseryTestRule => test_rules::NurseryTestRule::diagnostic(locator, indexer),
Rule::PreviewTestRule => test_rules::PreviewTestRule::diagnostic(locator, indexer),
Rule::DeprecatedTestRule => {
test_rules::DeprecatedTestRule::diagnostic(locator, indexer)
test_rules::DeprecatedTestRule::diagnostic(locator, comment_ranges)
}
Rule::AnotherDeprecatedTestRule => {
test_rules::AnotherDeprecatedTestRule::diagnostic(locator, indexer)
test_rules::AnotherDeprecatedTestRule::diagnostic(locator, comment_ranges)
}
Rule::RemovedTestRule => {
test_rules::RemovedTestRule::diagnostic(locator, comment_ranges)
}
Rule::RemovedTestRule => test_rules::RemovedTestRule::diagnostic(locator, indexer),
Rule::AnotherRemovedTestRule => {
test_rules::AnotherRemovedTestRule::diagnostic(locator, indexer)
test_rules::AnotherRemovedTestRule::diagnostic(locator, comment_ranges)
}
Rule::RedirectedToTestRule => {
test_rules::RedirectedToTestRule::diagnostic(locator, indexer)
test_rules::RedirectedToTestRule::diagnostic(locator, comment_ranges)
}
Rule::RedirectedFromTestRule => {
test_rules::RedirectedFromTestRule::diagnostic(locator, indexer)
test_rules::RedirectedFromTestRule::diagnostic(locator, comment_ranges)
}
Rule::RedirectedFromPrefixTestRule => {
test_rules::RedirectedFromPrefixTestRule::diagnostic(locator, indexer)
test_rules::RedirectedFromPrefixTestRule::diagnostic(locator, comment_ranges)
}
_ => unreachable!("All test rules must have an implementation"),
};
@ -288,7 +309,7 @@ pub fn check_path(
&mut diagnostics,
path,
locator,
indexer.comment_ranges(),
comment_ranges,
&directives.noqa_line_for,
error.is_none(),
&per_file_ignores,
@ -350,23 +371,21 @@ pub fn add_noqa_to_path(
source_type: PySourceType,
settings: &LinterSettings,
) -> Result<usize> {
let contents = source_kind.source_code();
// Tokenize once.
let tokens = ruff_python_parser::tokenize(contents, source_type.as_mode());
// Parse once.
let parsed = ruff_python_parser::parse_unchecked_source(source_kind.source_code(), source_type);
// Map row and column locations to byte slices (lazily).
let locator = Locator::new(contents);
let locator = Locator::new(source_kind.source_code());
// Detect the current code style (lazily).
let stylist = Stylist::from_tokens(&tokens, &locator);
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
// Extra indices from the code.
let indexer = Indexer::from_tokens(&tokens, &locator);
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
// Extract the `# noqa` and `# isort: skip` directives from the source.
let directives = directives::extract_directives(
&tokens,
&parsed,
directives::Flags::from_settings(settings),
&locator,
&indexer,
@ -387,7 +406,7 @@ pub fn add_noqa_to_path(
flags::Noqa::Disabled,
source_kind,
source_type,
TokenSource::Tokens(tokens),
&parsed,
);
// Log any parse errors.
@ -409,7 +428,7 @@ pub fn add_noqa_to_path(
path,
&diagnostics,
&locator,
indexer.comment_ranges(),
parsed.comment_ranges(),
&settings.external,
&directives.noqa_line_for,
stylist.line_ending(),
@ -425,23 +444,22 @@ pub fn lint_only(
noqa: flags::Noqa,
source_kind: &SourceKind,
source_type: PySourceType,
data: ParseSource,
source: ParseSource,
) -> LinterResult<Vec<Message>> {
// Tokenize once.
let tokens = data.into_token_source(source_kind, source_type);
let parsed = source.into_parsed(source_kind, source_type);
// Map row and column locations to byte slices (lazily).
let locator = Locator::new(source_kind.source_code());
// Detect the current code style (lazily).
let stylist = Stylist::from_tokens(&tokens, &locator);
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
// Extra indices from the code.
let indexer = Indexer::from_tokens(&tokens, &locator);
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
// Extract the `# noqa` and `# isort: skip` directives from the source.
let directives = directives::extract_directives(
&tokens,
&parsed,
directives::Flags::from_settings(settings),
&locator,
&indexer,
@ -459,7 +477,7 @@ pub fn lint_only(
noqa,
source_kind,
source_type,
tokens,
&parsed,
);
result.map(|diagnostics| diagnostics_to_messages(diagnostics, path, &locator, &directives))
@ -517,21 +535,22 @@ pub fn lint_fix<'a>(
// Continuously fix until the source code stabilizes.
loop {
// Tokenize once.
let tokens = ruff_python_parser::tokenize(transformed.source_code(), source_type.as_mode());
// Parse once.
let parsed =
ruff_python_parser::parse_unchecked_source(transformed.source_code(), source_type);
// Map row and column locations to byte slices (lazily).
let locator = Locator::new(transformed.source_code());
// Detect the current code style (lazily).
let stylist = Stylist::from_tokens(&tokens, &locator);
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
// Extra indices from the code.
let indexer = Indexer::from_tokens(&tokens, &locator);
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
// Extract the `# noqa` and `# isort: skip` directives from the source.
let directives = directives::extract_directives(
&tokens,
&parsed,
directives::Flags::from_settings(settings),
&locator,
&indexer,
@ -549,7 +568,7 @@ pub fn lint_fix<'a>(
noqa,
&transformed,
source_type,
TokenSource::Tokens(tokens),
&parsed,
);
if iterations == 0 {
@ -685,70 +704,21 @@ This indicates a bug in Ruff. If you could open an issue at:
#[derive(Debug, Clone)]
pub enum ParseSource {
/// Extract the tokens and AST from the given source code.
/// Parse the [`Parsed`] from the given source code.
None,
/// Use the precomputed tokens and AST.
Precomputed { tokens: Tokens, ast: Suite },
/// Use the precomputed [`Parsed`].
Precomputed(Parsed<ModModule>),
}
impl ParseSource {
/// Convert to a [`TokenSource`], tokenizing if necessary.
fn into_token_source(self, source_kind: &SourceKind, source_type: PySourceType) -> TokenSource {
/// Consumes the [`ParseSource`] and returns the parsed [`Parsed`], parsing the source code if
/// necessary.
fn into_parsed(self, source_kind: &SourceKind, source_type: PySourceType) -> Parsed<ModModule> {
match self {
Self::None => TokenSource::Tokens(ruff_python_parser::tokenize(
source_kind.source_code(),
source_type.as_mode(),
)),
Self::Precomputed { tokens, ast } => TokenSource::Precomputed { tokens, ast },
ParseSource::None => {
ruff_python_parser::parse_unchecked_source(source_kind.source_code(), source_type)
}
}
}
#[derive(Debug, Clone)]
pub enum TokenSource {
/// Use the precomputed tokens to generate the AST.
Tokens(Tokens),
/// Use the precomputed tokens and AST.
Precomputed { tokens: Tokens, ast: Suite },
}
impl TokenSource {
/// Returns an iterator over the [`TokenKind`] and the corresponding range.
///
/// [`TokenKind`]: ruff_python_parser::TokenKind
pub fn kinds(&self) -> TokenKindIter {
match self {
TokenSource::Tokens(tokens) => tokens.kinds(),
TokenSource::Precomputed { tokens, .. } => TokenKindIter::new(tokens),
}
}
}
impl Deref for TokenSource {
type Target = [LexResult];
fn deref(&self) -> &Self::Target {
match self {
Self::Tokens(tokens) => tokens,
Self::Precomputed { tokens, .. } => tokens,
}
}
}
impl TokenSource {
/// Convert to an [`AstSource`], parsing if necessary.
fn into_ast(
self,
source_kind: &SourceKind,
source_type: PySourceType,
) -> Result<Suite, ParseError> {
match self {
Self::Tokens(tokens) => Ok(ruff_python_parser::parse_program_tokens(
tokens,
source_kind.source_code(),
source_type.is_ipynb(),
)?),
Self::Precomputed { ast, .. } => Ok(ast),
ParseSource::Precomputed(parsed) => parsed,
}
}
}
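For orientation, here is a minimal sketch of the new flow inside the linter crate (not part of the diff): parse once, then hang everything that previously consumed the token vector off the returned `Parsed<ModModule>`. `ParseSource` and `SourceKind` are the linter-crate types shown above; the function name `parse_and_index` is invented for illustration.

```rust
use ruff_python_ast::{ModModule, PySourceType};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Parsed;
use ruff_source_file::Locator;

// Illustrative only: `ParseSource::None` parses from scratch, while
// `ParseSource::Precomputed(parsed)` reuses an existing parse.
fn parse_and_index(source_kind: &SourceKind, source_type: PySourceType) -> Parsed<ModModule> {
    let parsed = ParseSource::None.into_parsed(source_kind, source_type);

    // Everything that previously needed the token vector now hangs off `parsed`.
    let locator = Locator::new(source_kind.source_code());
    let _stylist = Stylist::from_tokens(parsed.tokens(), &locator);
    let _indexer = Indexer::from_tokens(parsed.tokens(), &locator);
    let _comment_ranges = parsed.comment_ranges();

    parsed
}
```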

View file

@ -4,7 +4,7 @@ use itertools::Itertools;
use once_cell::sync::Lazy;
use regex::{Regex, RegexSet};
use ruff_python_parser::parse_suite;
use ruff_python_parser::parse_module;
use ruff_python_trivia::{SimpleTokenKind, SimpleTokenizer};
use ruff_text_size::TextSize;
@ -84,7 +84,7 @@ pub(crate) fn comment_contains_code(line: &str, task_tags: &[String]) -> bool {
}
// Finally, compile the source code.
parse_suite(line).is_ok()
parse_module(line).is_ok()
}
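As a quick illustration of the new entry point (a sketch, not taken from the diff; the helper names are invented): `parse_module` returns a `Result` wrapping a `Parsed<ModModule>`, so a bare validity check only needs `is_ok()`, and callers that want the statements can take the suite out of the parsed output.

```rust
use ruff_python_parser::parse_module;

// Returns `true` if the snippet parses as a Python module.
fn parses_as_python(snippet: &str) -> bool {
    parse_module(snippet).is_ok()
}

// When the statements themselves are needed, extract the `Suite` (the
// top-level statements), mirroring what `parse_suite` used to return directly.
fn statement_count(snippet: &str) -> Option<usize> {
    Some(parse_module(snippet).ok()?.into_suite().len())
}
```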
#[cfg(test)]

View file

@ -1,6 +1,6 @@
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use crate::settings::LinterSettings;
@ -47,14 +47,14 @@ impl Violation for CommentedOutCode {
pub(crate) fn commented_out_code(
diagnostics: &mut Vec<Diagnostic>,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
settings: &LinterSettings,
) {
// Skip comments within `/// script` tags.
let mut in_script_tag = false;
// Iterate over all comments in the document.
for range in indexer.comment_ranges() {
for range in comment_ranges {
let line = locator.lines(*range);
// Detect `/// script` tags.

View file

@ -68,7 +68,7 @@ pub(crate) fn zip_without_explicit_strict(checker: &mut Checker, call: &ast::Exp
add_argument(
"strict=False",
&call.arguments,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
),
// If the function call contains `**kwargs`, mark the fix as unsafe.

View file

@ -2,7 +2,7 @@ use ruff_diagnostics::{AlwaysFixableViolation, Violation};
use ruff_diagnostics::{Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -27,31 +27,31 @@ enum TokenType {
/// Simplified token specialized for the task.
#[derive(Copy, Clone)]
struct Token {
struct SimpleToken {
ty: TokenType,
range: TextRange,
}
impl Ranged for Token {
impl Ranged for SimpleToken {
fn range(&self) -> TextRange {
self.range
}
}
impl Token {
impl SimpleToken {
fn new(ty: TokenType, range: TextRange) -> Self {
Self { ty, range }
}
fn irrelevant() -> Token {
Token {
fn irrelevant() -> SimpleToken {
SimpleToken {
ty: TokenType::Irrelevant,
range: TextRange::default(),
}
}
}
impl From<(TokenKind, TextRange)> for Token {
impl From<(TokenKind, TextRange)> for SimpleToken {
fn from((tok, range): (TokenKind, TextRange)) -> Self {
let ty = match tok {
TokenKind::Name => TokenType::Named,
@ -226,13 +226,13 @@ impl AlwaysFixableViolation for ProhibitedTrailingComma {
/// COM812, COM818, COM819
pub(crate) fn trailing_commas(
diagnostics: &mut Vec<Diagnostic>,
tokens: TokenKindIter,
tokens: &Tokens,
locator: &Locator,
indexer: &Indexer,
) {
let mut fstrings = 0u32;
let tokens = tokens.filter_map(|(token, tok_range)| {
match token {
let simple_tokens = tokens.up_to_first_unknown().iter().filter_map(|token| {
match token.kind() {
// Completely ignore comments -- they just interfere with the logic.
TokenKind::Comment => None,
// F-strings are handled as `String` token type with the complete range
@ -247,15 +247,15 @@ pub(crate) fn trailing_commas(
if fstrings == 0 {
indexer
.fstring_ranges()
.outermost(tok_range.start())
.map(|range| Token::new(TokenType::String, range))
.outermost(token.start())
.map(|range| SimpleToken::new(TokenType::String, range))
} else {
None
}
}
_ => {
if fstrings == 0 {
Some(Token::from((token, tok_range)))
Some(SimpleToken::from(token.as_tuple()))
} else {
None
}
@ -263,12 +263,12 @@ pub(crate) fn trailing_commas(
}
});
let mut prev = Token::irrelevant();
let mut prev_prev = Token::irrelevant();
let mut prev = SimpleToken::irrelevant();
let mut prev_prev = SimpleToken::irrelevant();
let mut stack = vec![Context::new(ContextType::No)];
for token in tokens {
for token in simple_tokens {
if prev.ty == TokenType::NonLogicalNewline && token.ty == TokenType::NonLogicalNewline {
// Collapse consecutive newlines to the first one -- trailing commas are
// added before the first newline.
@ -301,9 +301,9 @@ pub(crate) fn trailing_commas(
}
fn check_token(
token: Token,
prev: Token,
prev_prev: Token,
token: SimpleToken,
prev: SimpleToken,
prev_prev: SimpleToken,
context: Context,
locator: &Locator,
) -> Option<Diagnostic> {
@ -387,9 +387,9 @@ fn check_token(
}
fn update_context(
token: Token,
prev: Token,
prev_prev: Token,
token: SimpleToken,
prev: SimpleToken,
prev_prev: SimpleToken,
stack: &mut Vec<Context>,
) -> Context {
let new_context = match token.ty {
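To make the new shape concrete, a stripped-down sketch of the iteration pattern used above (illustrative only, the function name is invented): `Tokens::up_to_first_unknown()` yields borrowed `Token`s, and each token exposes its `kind()` and `range()` instead of carrying owned data.

```rust
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};

// Collect the ranges of all comma tokens, in source order.
// `up_to_first_unknown` is meant to stop at the first unknown/error token,
// so malformed trailing input is ignored.
fn comma_ranges(tokens: &Tokens) -> Vec<TextRange> {
    tokens
        .up_to_first_unknown()
        .iter()
        .filter(|token| token.kind() == TokenKind::Comma)
        .map(|token| token.range())
        .collect()
}
```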

View file

@ -139,7 +139,7 @@ pub(crate) fn unnecessary_generator_list(checker: &mut Checker, call: &ast::Expr
let range = parenthesized_range(
argument.into(),
(&call.arguments).into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(argument.range());

View file

@ -1,7 +1,7 @@
use std::path::Path;
use ruff_diagnostics::Diagnostic;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
pub(crate) use shebang_leading_whitespace::*;
pub(crate) use shebang_missing_executable_file::*;
@ -21,10 +21,10 @@ pub(crate) fn from_tokens(
diagnostics: &mut Vec<Diagnostic>,
path: &Path,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
) {
let mut has_any_shebang = false;
for range in indexer.comment_ranges() {
for range in comment_ranges {
let comment = locator.slice(*range);
if let Some(shebang) = ShebangDirective::try_extract(comment) {
has_any_shebang = true;
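A small sketch of the pattern these rules now share (assumptions: only the APIs visible above; the helper name is invented): `&CommentRanges` iterates over `TextRange`s, and the `Locator` turns each range back into source text.

```rust
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;

// Count how many comments contain the given marker text.
fn count_marked_comments(comment_ranges: &CommentRanges, locator: &Locator, marker: &str) -> usize {
    let mut count = 0;
    for range in comment_ranges {
        if locator.slice(*range).contains(marker) {
            count += 1;
        }
    }
    count
}
```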

View file

@ -4,9 +4,9 @@ use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::str::{leading_quote, trailing_quote};
use ruff_python_index::Indexer;
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::Locator;
use ruff_text_size::TextRange;
use ruff_text_size::{Ranged, TextRange};
use crate::settings::LinterSettings;
@ -92,37 +92,39 @@ impl Violation for MultiLineImplicitStringConcatenation {
/// ISC001, ISC002
pub(crate) fn implicit(
diagnostics: &mut Vec<Diagnostic>,
tokens: TokenKindIter,
tokens: &Tokens,
settings: &LinterSettings,
locator: &Locator,
indexer: &Indexer,
) {
for ((a_tok, a_range), (b_tok, b_range)) in tokens
.filter(|(token, _)| {
*token != TokenKind::Comment
for (a_token, b_token) in tokens
.up_to_first_unknown()
.iter()
.filter(|token| {
token.kind() != TokenKind::Comment
&& (settings.flake8_implicit_str_concat.allow_multiline
|| *token != TokenKind::NonLogicalNewline)
|| token.kind() != TokenKind::NonLogicalNewline)
})
.tuple_windows()
{
let (a_range, b_range) = match (a_tok, b_tok) {
(TokenKind::String, TokenKind::String) => (a_range, b_range),
let (a_range, b_range) = match (a_token.kind(), b_token.kind()) {
(TokenKind::String, TokenKind::String) => (a_token.range(), b_token.range()),
(TokenKind::String, TokenKind::FStringStart) => {
match indexer.fstring_ranges().innermost(b_range.start()) {
Some(b_range) => (a_range, b_range),
match indexer.fstring_ranges().innermost(b_token.start()) {
Some(b_range) => (a_token.range(), b_range),
None => continue,
}
}
(TokenKind::FStringEnd, TokenKind::String) => {
match indexer.fstring_ranges().innermost(a_range.start()) {
Some(a_range) => (a_range, b_range),
match indexer.fstring_ranges().innermost(a_token.start()) {
Some(a_range) => (a_range, b_token.range()),
None => continue,
}
}
(TokenKind::FStringEnd, TokenKind::FStringStart) => {
match (
indexer.fstring_ranges().innermost(a_range.start()),
indexer.fstring_ranges().innermost(b_range.start()),
indexer.fstring_ranges().innermost(a_token.start()),
indexer.fstring_ranges().innermost(b_token.start()),
) {
(Some(a_range), Some(b_range)) => (a_range, b_range),
_ => continue,

View file

@ -2,7 +2,7 @@ use std::path::{Path, PathBuf};
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::{TextRange, TextSize};
@ -45,7 +45,7 @@ pub(crate) fn implicit_namespace_package(
path: &Path,
package: Option<&Path>,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
project_root: &Path,
src: &[PathBuf],
) -> Option<Diagnostic> {
@ -61,8 +61,7 @@ pub(crate) fn implicit_namespace_package(
.parent()
.is_some_and( |parent| src.iter().any(|src| src == parent))
// Ignore files that contain a shebang.
&& !indexer
.comment_ranges()
&& !comment_ranges
.first().filter(|range| range.start() == TextSize::from(0))
.is_some_and(|range| ShebangDirective::try_extract(locator.slice(*range)).is_some())
{

View file

@ -129,7 +129,7 @@ pub(crate) fn unnecessary_dict_kwargs(checker: &mut Checker, call: &ast::ExprCal
parenthesized_range(
value.into(),
dict.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(value.range())

View file

@ -114,7 +114,7 @@ fn generate_fix(
let insertion = add_argument(
locator.slice(generic_base),
arguments,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
source,
);

View file

@ -1,6 +1,6 @@
use once_cell::sync::Lazy;
use regex::Regex;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_diagnostics::{Diagnostic, Violation};
@ -38,9 +38,9 @@ impl Violation for TypeCommentInStub {
pub(crate) fn type_comment_in_stub(
diagnostics: &mut Vec<Diagnostic>,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
) {
for range in indexer.comment_ranges() {
for range in comment_ranges {
let comment = locator.slice(*range);
if TYPE_COMMENT_REGEX.is_match(comment) && !TYPE_IGNORE_REGEX.is_match(comment) {

View file

@ -284,7 +284,7 @@ pub(crate) fn unittest_assertion(
// the assertion is part of a larger expression.
if checker.semantic().current_statement().is_expr_stmt()
&& checker.semantic().current_expression_parent().is_none()
&& !checker.indexer().comment_ranges().intersects(expr.range())
&& !checker.parsed().comment_ranges().intersects(expr.range())
{
if let Ok(stmt) = unittest_assert.generate_assert(args, keywords) {
diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement(
@ -292,7 +292,7 @@ pub(crate) fn unittest_assertion(
parenthesized_range(
expr.into(),
checker.semantic().current_statement().into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(expr.range()),
@ -385,7 +385,7 @@ pub(crate) fn unittest_raises_assertion(
call.func.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(call, checker.locator())
{
@ -745,7 +745,7 @@ pub(crate) fn composite_condition(
let mut diagnostic = Diagnostic::new(PytestCompositeAssertion, stmt.range());
if matches!(composite, CompositionKind::Simple)
&& msg.is_none()
&& !checker.indexer().comment_ranges().intersects(stmt.range())
&& !checker.parsed().comment_ranges().intersects(stmt.range())
&& !checker
.indexer()
.in_multi_statement_line(stmt, checker.locator())

View file

@ -353,7 +353,7 @@ fn check_names(checker: &mut Checker, decorator: &Decorator, expr: &Expr) {
let name_range = get_parametrize_name_range(
decorator,
expr,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(expr.range());
@ -388,7 +388,7 @@ fn check_names(checker: &mut Checker, decorator: &Decorator, expr: &Expr) {
let name_range = get_parametrize_name_range(
decorator,
expr,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(expr.range());
@ -681,11 +681,7 @@ fn check_duplicates(checker: &mut Checker, values: &Expr) {
let element_end =
trailing_comma(element, checker.locator().contents(), values_end);
let deletion_range = TextRange::new(previous_end, element_end);
if !checker
.indexer()
.comment_ranges()
.intersects(deletion_range)
{
if !checker.parsed().comment_ranges().intersects(deletion_range) {
diagnostic.set_fix(Fix::unsafe_edit(Edit::range_deletion(deletion_range)));
}
}
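Schematically, the per-rule pattern is now to read comment ranges off the parse result rather than the `Indexer`. A hedged sketch (the `Checker` accessors are the ones shown in the diff; the rule body and function name are invented):

```rust
use ruff_python_ast::Stmt;
use ruff_text_size::Ranged;

use crate::checkers::ast::Checker;

// Inside a rule: only offer the autofix when no comment overlaps the statement,
// using the comment ranges attached to the parse result.
fn fix_is_comment_safe(checker: &Checker, stmt: &Stmt) -> bool {
    !checker.parsed().comment_ranges().intersects(stmt.range())
}
```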

View file

@ -527,7 +527,7 @@ pub(crate) fn compare_with_tuple(checker: &mut Checker, expr: &Expr) {
// Avoid removing comments.
if checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(expr, checker.locator())
{
@ -779,7 +779,7 @@ fn is_short_circuit(
parenthesized_range(
furthest.into(),
expr.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(furthest.range())
@ -807,7 +807,7 @@ fn is_short_circuit(
parenthesized_range(
furthest.into(),
expr.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(furthest.range())

View file

@ -164,7 +164,7 @@ pub(crate) fn if_expr_with_true_false(
parenthesized_range(
test.into(),
expr.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(test.range()),

View file

@ -168,7 +168,7 @@ pub(crate) fn multiple_with_statements(
TextRange::new(with_stmt.start(), colon.end()),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.intersects(TextRange::new(with_stmt.start(), with_stmt.body[0].start()))
{

View file

@ -113,14 +113,10 @@ pub(crate) fn nested_if_statements(
);
// The fixer preserves comments in the nested body, but removes comments between
// the outer and inner if statements.
if !checker
.indexer()
.comment_ranges()
.intersects(TextRange::new(
if !checker.parsed().comment_ranges().intersects(TextRange::new(
nested_if.start(),
nested_if.body()[0].start(),
))
{
)) {
match collapse_nested_if(checker.locator(), checker.stylist(), nested_if) {
Ok(edit) => {
if edit.content().map_or(true, |content| {

View file

@ -210,7 +210,7 @@ pub(crate) fn if_else_block_instead_of_dict_get(checker: &mut Checker, stmt_if:
stmt_if.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(stmt_if, checker.locator())
{
@ -300,7 +300,7 @@ pub(crate) fn if_exp_instead_of_dict_get(
expr.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(expr, checker.locator())
{

View file

@ -143,7 +143,7 @@ pub(crate) fn if_else_block_instead_of_if_exp(checker: &mut Checker, stmt_if: &a
stmt_if.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(stmt_if, checker.locator())
{

View file

@ -8,8 +8,7 @@ use ruff_python_ast::comparable::ComparableStmt;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::stmt_if::{if_elif_branches, IfElifBranch};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_index::Indexer;
use ruff_python_trivia::{SimpleTokenKind, SimpleTokenizer};
use ruff_python_trivia::{CommentRanges, SimpleTokenKind, SimpleTokenizer};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -74,13 +73,13 @@ pub(crate) fn if_with_same_arms(checker: &mut Checker, stmt_if: &ast::StmtIf) {
// ...and the same comments
let first_comments = checker
.indexer()
.parsed()
.comment_ranges()
.comments_in_range(body_range(&current_branch, checker.locator()))
.iter()
.map(|range| checker.locator().slice(*range));
let second_comments = checker
.indexer()
.parsed()
.comment_ranges()
.comments_in_range(body_range(following_branch, checker.locator()))
.iter()
@ -100,7 +99,7 @@ pub(crate) fn if_with_same_arms(checker: &mut Checker, stmt_if: &ast::StmtIf) {
&current_branch,
following_branch,
checker.locator(),
checker.indexer(),
checker.parsed().comment_ranges(),
)
});
@ -114,7 +113,7 @@ fn merge_branches(
current_branch: &IfElifBranch,
following_branch: &IfElifBranch,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
) -> Result<Fix> {
// Identify the colon (`:`) at the end of the current branch's test.
let Some(current_branch_colon) =
@ -133,7 +132,7 @@ fn merge_branches(
let following_branch_test = if let Some(range) = parenthesized_range(
following_branch.test.into(),
stmt_if.into(),
indexer.comment_ranges(),
comment_ranges,
locator.contents(),
) {
Cow::Borrowed(locator.slice(range))

View file

@ -100,14 +100,14 @@ fn key_in_dict(
let left_range = parenthesized_range(
left.into(),
parent,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(left.range());
let right_range = parenthesized_range(
right.into(),
parent,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(right.range());
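For reference, a sketch of the `parenthesized_range` call shape used in these rules (illustrative only; the wrapper name is invented, and the argument order follows the calls in the diff):

```rust
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::{AnyNodeRef, Expr};
use ruff_text_size::{Ranged, TextRange};

use crate::checkers::ast::Checker;

// Widen an expression's range to include its parentheses (when it has any),
// consulting the comment ranges stored on the parse result.
fn range_with_parentheses(checker: &Checker, expr: &Expr, parent: AnyNodeRef) -> TextRange {
    parenthesized_range(
        expr.into(),
        parent,
        checker.parsed().comment_ranges(),
        checker.locator().contents(),
    )
    .unwrap_or(expr.range())
}
```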

View file

@ -194,7 +194,7 @@ pub(crate) fn needless_bool(checker: &mut Checker, stmt: &Stmt) {
// Generate the replacement condition.
let condition = if checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(&range, checker.locator())
{

View file

@ -126,7 +126,7 @@ pub(crate) fn suppressible_exception(
stmt.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(stmt, checker.locator())
{

View file

@ -1,6 +1,6 @@
use once_cell::sync::Lazy;
use regex::RegexSet;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::{TextLen, TextRange, TextSize};
@ -235,7 +235,7 @@ pub(crate) fn todos(
diagnostics: &mut Vec<Diagnostic>,
todo_comments: &[TodoComment],
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
) {
for todo_comment in todo_comments {
let TodoComment {
@ -256,12 +256,7 @@ pub(crate) fn todos(
let mut has_issue_link = false;
let mut curr_range = range;
for next_range in indexer
.comment_ranges()
.iter()
.skip(range_index + 1)
.copied()
{
for next_range in comment_ranges.iter().skip(range_index + 1).copied() {
// Ensure that next_comment_range is in the same multiline comment "block" as
// comment_range.
if !locator

View file

@ -491,7 +491,6 @@ fn fix_imports(checker: &Checker, node_id: NodeId, imports: &[ImportBinding]) ->
},
at,
checker.semantic(),
checker.source_type,
)?
.into_edits();

View file

@ -1,4 +1,5 @@
use ruff_python_ast::{self as ast, PySourceType, Stmt};
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::Tokens;
use ruff_text_size::{Ranged, TextRange};
use ruff_source_file::Locator;
@ -13,7 +14,7 @@ pub(crate) fn annotate_imports<'a>(
comments: Vec<Comment<'a>>,
locator: &Locator<'a>,
split_on_trailing_comma: bool,
source_type: PySourceType,
tokens: &Tokens,
) -> Vec<AnnotatedImport<'a>> {
let mut comments_iter = comments.into_iter().peekable();
@ -120,7 +121,7 @@ pub(crate) fn annotate_imports<'a>(
names: aliases,
level: *level,
trailing_comma: if split_on_trailing_comma {
trailing_comma(import, locator, source_type)
trailing_comma(import, tokens)
} else {
TrailingComma::default()
},

View file

@ -1,6 +1,6 @@
use std::borrow::Cow;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -20,10 +20,9 @@ impl Ranged for Comment<'_> {
pub(crate) fn collect_comments<'a>(
range: TextRange,
locator: &'a Locator,
indexer: &'a Indexer,
comment_ranges: &'a CommentRanges,
) -> Vec<Comment<'a>> {
indexer
.comment_ranges()
comment_ranges
.comments_in_range(range)
.iter()
.map(|range| Comment {

View file

@ -1,5 +1,5 @@
use ruff_python_ast::{PySourceType, Stmt};
use ruff_python_parser::{lexer, AsMode, Tok};
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::PythonWhitespace;
use ruff_source_file::{Locator, UniversalNewlines};
use ruff_text_size::Ranged;
@ -8,31 +8,23 @@ use crate::rules::isort::types::TrailingComma;
/// Return `true` if a `Stmt::ImportFrom` statement ends with a magic
/// trailing comma.
pub(super) fn trailing_comma(
stmt: &Stmt,
locator: &Locator,
source_type: PySourceType,
) -> TrailingComma {
let contents = locator.slice(stmt);
pub(super) fn trailing_comma(stmt: &Stmt, tokens: &Tokens) -> TrailingComma {
let mut count = 0u32;
let mut trailing_comma = TrailingComma::Absent;
for (tok, _) in lexer::lex_starts_at(contents, source_type.as_mode(), stmt.start()).flatten() {
if matches!(tok, Tok::Lpar) {
count = count.saturating_add(1);
}
if matches!(tok, Tok::Rpar) {
count = count.saturating_sub(1);
for token in tokens.in_range(stmt.range()) {
match token.kind() {
TokenKind::Lpar => count = count.saturating_add(1),
TokenKind::Rpar => count = count.saturating_sub(1),
_ => {}
}
if count == 1 {
if matches!(
tok,
Tok::NonLogicalNewline | Tok::Indent | Tok::Dedent | Tok::Comment(_)
) {
continue;
} else if matches!(tok, Tok::Comma) {
trailing_comma = TrailingComma::Present;
} else {
trailing_comma = TrailingComma::Absent;
match token.kind() {
TokenKind::NonLogicalNewline
| TokenKind::Indent
| TokenKind::Dedent
| TokenKind::Comment => continue,
TokenKind::Comma => trailing_comma = TrailingComma::Present,
_ => trailing_comma = TrailingComma::Absent,
}
}
}
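As a companion sketch (illustrative only, the helper name is invented): `Tokens::in_range` returns the tokens that fall within a node's range, which is what lets this helper drop the re-lexing step entirely.

```rust
use ruff_python_ast::Stmt;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::Ranged;

// Return `true` if there is a comma directly inside the outermost parentheses
// of the statement (ignoring commas nested in further parentheses).
fn has_top_level_comma(stmt: &Stmt, tokens: &Tokens) -> bool {
    let mut depth = 0u32;
    for token in tokens.in_range(stmt.range()) {
        match token.kind() {
            TokenKind::Lpar => depth = depth.saturating_add(1),
            TokenKind::Rpar => depth = depth.saturating_sub(1),
            TokenKind::Comma if depth == 1 => return true,
            _ => {}
        }
    }
    false
}
```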

View file

@ -12,6 +12,7 @@ use normalize::normalize_imports;
use order::order_imports;
use ruff_python_ast::PySourceType;
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use ruff_source_file::Locator;
use settings::Settings;
use types::EitherImport::{Import, ImportFrom};
@ -72,6 +73,7 @@ pub(crate) fn format_imports(
source_type: PySourceType,
target_version: PythonVersion,
settings: &Settings,
tokens: &Tokens,
) -> String {
let trailer = &block.trailer;
let block = annotate_imports(
@ -79,7 +81,7 @@ pub(crate) fn format_imports(
comments,
locator,
settings.split_on_trailing_comma,
source_type,
tokens,
);
// Normalize imports (i.e., deduplicate, aggregate `from` imports).

View file

@ -4,9 +4,9 @@ use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers::is_docstring_stmt;
use ruff_python_ast::imports::{Alias, AnyImport, FutureImport, Import, ImportFrom};
use ruff_python_ast::{self as ast, PySourceType, Stmt, Suite};
use ruff_python_ast::{self as ast, ModModule, PySourceType, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_parser::parse_suite;
use ruff_python_parser::{parse_module, Parsed};
use ruff_source_file::Locator;
use ruff_text_size::{TextRange, TextSize};
@ -87,13 +87,13 @@ fn includes_import(stmt: &Stmt, target: &AnyImport) -> bool {
#[allow(clippy::too_many_arguments)]
fn add_required_import(
required_import: &AnyImport,
python_ast: &Suite,
parsed: &Parsed<ModModule>,
locator: &Locator,
stylist: &Stylist,
source_type: PySourceType,
) -> Option<Diagnostic> {
// Don't add imports to semantically-empty files.
if python_ast.iter().all(is_docstring_stmt) {
if parsed.suite().iter().all(is_docstring_stmt) {
return None;
}
@ -103,7 +103,8 @@ fn add_required_import(
}
// If the import is already present in a top-level block, don't add it.
if python_ast
if parsed
.suite()
.iter()
.any(|stmt| includes_import(stmt, required_import))
{
@ -116,15 +117,14 @@ fn add_required_import(
TextRange::default(),
);
diagnostic.set_fix(Fix::safe_edit(
Importer::new(python_ast, locator, stylist)
.add_import(required_import, TextSize::default()),
Importer::new(parsed, locator, stylist).add_import(required_import, TextSize::default()),
));
Some(diagnostic)
}
/// I002
pub(crate) fn add_required_imports(
python_ast: &Suite,
parsed: &Parsed<ModModule>,
locator: &Locator,
stylist: &Stylist,
settings: &LinterSettings,
@ -135,7 +135,7 @@ pub(crate) fn add_required_imports(
.required_imports
.iter()
.flat_map(|required_import| {
let Ok(body) = parse_suite(required_import) else {
let Ok(body) = parse_module(required_import).map(Parsed::into_suite) else {
error!("Failed to parse required import: `{}`", required_import);
return vec![];
};
@ -165,7 +165,7 @@ pub(crate) fn add_required_imports(
},
level: *level,
}),
python_ast,
parsed,
locator,
stylist,
source_type,
@ -182,7 +182,7 @@ pub(crate) fn add_required_imports(
as_name: name.asname.as_deref(),
},
}),
python_ast,
parsed,
locator,
stylist,
source_type,

View file

@ -5,9 +5,10 @@ use itertools::{EitherOrBoth, Itertools};
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::whitespace::trailing_lines_end;
use ruff_python_ast::{PySourceType, Stmt};
use ruff_python_ast::{ModModule, PySourceType, Stmt};
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::Parsed;
use ruff_python_trivia::{leading_indentation, textwrap::indent, PythonWhitespace};
use ruff_source_file::{Locator, UniversalNewlines};
use ruff_text_size::{Ranged, TextRange};
@ -78,7 +79,7 @@ fn matches_ignoring_indentation(val1: &str, val2: &str) -> bool {
})
}
#[allow(clippy::cast_sign_loss)]
#[allow(clippy::cast_sign_loss, clippy::too_many_arguments)]
/// I001
pub(crate) fn organize_imports(
block: &Block,
@ -88,6 +89,7 @@ pub(crate) fn organize_imports(
settings: &LinterSettings,
package: Option<&Path>,
source_type: PySourceType,
parsed: &Parsed<ModModule>,
) -> Option<Diagnostic> {
let indentation = locator.slice(extract_indentation_range(&block.imports, locator));
let indentation = leading_indentation(indentation);
@ -106,7 +108,7 @@ pub(crate) fn organize_imports(
let comments = comments::collect_comments(
TextRange::new(range.start(), locator.full_line_end(range.end())),
locator,
indexer,
parsed.comment_ranges(),
);
let trailing_line_end = if block.trailer.is_none() {
@ -128,6 +130,7 @@ pub(crate) fn organize_imports(
source_type,
settings.target_version,
&settings.isort,
parsed.tokens(),
);
// Expand the span the entire range, including leading and trailing space.

View file

@ -177,10 +177,15 @@ pub(crate) fn function_is_too_complex(
mod tests {
use anyhow::Result;
use ruff_python_parser::parse_suite;
use ruff_python_ast::Suite;
use ruff_python_parser::parse_module;
use super::get_complexity_number;
fn parse_suite(source: &str) -> Result<Suite> {
Ok(parse_module(source)?.into_suite())
}
#[test]
fn trivial() -> Result<()> {
let source = r"

View file

@ -93,7 +93,7 @@ pub(crate) fn inplace_argument(checker: &mut Checker, call: &ast::ExprCall) {
call,
keyword,
statement,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator(),
) {
diagnostic.set_fix(fix);

View file

@ -2,8 +2,7 @@ use std::ops::Deref;
use unicode_width::UnicodeWidthStr;
use ruff_python_index::Indexer;
use ruff_python_trivia::is_pragma_comment;
use ruff_python_trivia::{is_pragma_comment, CommentRanges};
use ruff_source_file::Line;
use ruff_text_size::{TextLen, TextRange};
@ -20,7 +19,7 @@ impl Overlong {
/// otherwise.
pub(super) fn try_from_line(
line: &Line,
indexer: &Indexer,
comment_ranges: &CommentRanges,
limit: LineLength,
task_tags: &[String],
tab_size: IndentWidth,
@ -40,7 +39,7 @@ impl Overlong {
}
// Strip trailing comments and re-measure the line, if needed.
let line = StrippedLine::from_line(line, indexer, task_tags);
let line = StrippedLine::from_line(line, comment_ranges, task_tags);
let width = match &line {
StrippedLine::WithoutPragma(line) => {
let width = measure(line.as_str(), tab_size);
@ -119,8 +118,8 @@ enum StrippedLine<'a> {
impl<'a> StrippedLine<'a> {
/// Strip trailing comments from a [`Line`], if the line ends with a pragma comment (like
/// `# type: ignore`) or, if necessary, a task comment (like `# TODO`).
fn from_line(line: &'a Line<'a>, indexer: &Indexer, task_tags: &[String]) -> Self {
let [comment_range] = indexer.comment_ranges().comments_in_range(line.range()) else {
fn from_line(line: &'a Line<'a>, comment_ranges: &CommentRanges, task_tags: &[String]) -> Self {
let [comment_range] = comment_ranges.comments_in_range(line.range()) else {
return Self::Unchanged(line);
};
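For reference, a small sketch of the `comments_in_range` call used here (not from the diff; the helper name is invented): it returns a slice of comment ranges, so a slice pattern can insist on exactly one trailing comment.

```rust
use ruff_python_trivia::CommentRanges;
use ruff_text_size::TextRange;

// Return the single comment on the given line, if there is exactly one.
fn sole_comment_range(comment_ranges: &CommentRanges, line: TextRange) -> Option<TextRange> {
    match comment_ranges.comments_in_range(line) {
        // Exactly one comment on the line: return its range.
        [comment] => Some(*comment),
        // No comments, or several: nothing to do.
        _ => None,
    }
}
```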

View file

@ -1,5 +1,7 @@
use itertools::Itertools;
use ruff_notebook::CellOffsets;
use ruff_python_parser::Token;
use ruff_python_parser::Tokens;
use std::cmp::Ordering;
use std::iter::Peekable;
use std::num::NonZeroU32;
@ -12,7 +14,7 @@ use ruff_diagnostics::Fix;
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::PySourceType;
use ruff_python_codegen::Stylist;
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_python_parser::TokenKind;
use ruff_source_file::{Locator, UniversalNewlines};
use ruff_text_size::TextRange;
use ruff_text_size::TextSize;
@ -381,7 +383,7 @@ struct LogicalLineInfo {
/// Iterator that processes tokens until a full logical line (or comment line) is "built".
/// It then returns characteristics of that logical line (see `LogicalLineInfo`).
struct LinePreprocessor<'a> {
tokens: TokenKindIter<'a>,
tokens: Peekable<Iter<'a, Token>>,
locator: &'a Locator<'a>,
indent_width: IndentWidth,
/// The start position of the next logical line.
@ -397,13 +399,13 @@ struct LinePreprocessor<'a> {
impl<'a> LinePreprocessor<'a> {
fn new(
tokens: TokenKindIter<'a>,
tokens: &'a Tokens,
locator: &'a Locator,
indent_width: IndentWidth,
cell_offsets: Option<&'a CellOffsets>,
) -> LinePreprocessor<'a> {
LinePreprocessor {
tokens,
tokens: tokens.up_to_first_unknown().iter().peekable(),
locator,
line_start: TextSize::new(0),
max_preceding_blank_lines: BlankLines::Zero,
@ -424,17 +426,17 @@ impl<'a> Iterator for LinePreprocessor<'a> {
// Number of consecutive blank lines directly preceding this logical line.
let mut blank_lines = BlankLines::Zero;
let mut first_logical_line_token: Option<(LogicalLineKind, TextRange)> = None;
let mut last_token: TokenKind = TokenKind::EndOfFile;
let mut last_token = TokenKind::EndOfFile;
let mut parens = 0u32;
while let Some((token, range)) = self.tokens.next() {
if matches!(token, TokenKind::Indent | TokenKind::Dedent) {
while let Some(token) = self.tokens.next() {
let (kind, range) = token.as_tuple();
if matches!(kind, TokenKind::Indent | TokenKind::Dedent) {
continue;
}
let (logical_line_kind, first_token_range) = if let Some(first_token_range) =
first_logical_line_token
{
let (logical_line_kind, first_token_range) =
if let Some(first_token_range) = first_logical_line_token {
first_token_range
}
// At the start of the line...
@ -453,7 +455,7 @@ impl<'a> Iterator for LinePreprocessor<'a> {
}
// An empty line
if token == TokenKind::NonLogicalNewline {
if kind == TokenKind::NonLogicalNewline {
blank_lines.add(range);
self.line_start = range.end();
@ -461,15 +463,20 @@ impl<'a> Iterator for LinePreprocessor<'a> {
continue;
}
is_docstring = token == TokenKind::String;
is_docstring = kind == TokenKind::String;
let logical_line_kind = match token {
let logical_line_kind = match kind {
TokenKind::Class => LogicalLineKind::Class,
TokenKind::Comment => LogicalLineKind::Comment,
TokenKind::At => LogicalLineKind::Decorator,
TokenKind::Def => LogicalLineKind::Function,
// Lookahead to distinguish `async def` from `async with`.
TokenKind::Async if matches!(self.tokens.peek(), Some((TokenKind::Def, _))) => {
TokenKind::Async
if self
.tokens
.peek()
.is_some_and(|token| token.kind() == TokenKind::Def) =>
{
LogicalLineKind::Function
}
TokenKind::Import => LogicalLineKind::Import,
@ -482,17 +489,17 @@ impl<'a> Iterator for LinePreprocessor<'a> {
(logical_line_kind, range)
};
if !token.is_trivia() {
if !kind.is_trivia() {
line_is_comment_only = false;
}
// A docstring line is composed only of the docstring (TokenKind::String) and trivia tokens.
// (If a comment follows a docstring, we still count the line as a docstring)
if token != TokenKind::String && !token.is_trivia() {
if kind != TokenKind::String && !kind.is_trivia() {
is_docstring = false;
}
match token {
match kind {
TokenKind::Lbrace | TokenKind::Lpar | TokenKind::Lsqb => {
parens = parens.saturating_add(1);
}
@ -538,8 +545,8 @@ impl<'a> Iterator for LinePreprocessor<'a> {
_ => {}
}
if !token.is_trivia() {
last_token = token;
if !kind.is_trivia() {
last_token = kind;
}
}
@ -722,7 +729,7 @@ impl<'a> BlankLinesChecker<'a> {
}
/// E301, E302, E303, E304, E305, E306
pub(crate) fn check_lines(&self, tokens: TokenKindIter<'a>, diagnostics: &mut Vec<Diagnostic>) {
pub(crate) fn check_lines(&self, tokens: &Tokens, diagnostics: &mut Vec<Diagnostic>) {
let mut prev_indent_length: Option<usize> = None;
let mut state = BlankLinesState::default();
let line_preprocessor =

View file

@ -1,7 +1,9 @@
use std::slice::Iter;
use ruff_notebook::CellOffsets;
use ruff_python_ast::PySourceType;
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_text_size::{TextRange, TextSize};
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use ruff_diagnostics::{AlwaysFixableViolation, Violation};
use ruff_diagnostics::{Diagnostic, Edit, Fix};
@ -99,7 +101,7 @@ impl AlwaysFixableViolation for UselessSemicolon {
/// E701, E702, E703
pub(crate) fn compound_statements(
diagnostics: &mut Vec<Diagnostic>,
mut tokens: TokenKindIter,
tokens: &Tokens,
locator: &Locator,
indexer: &Indexer,
source_type: PySourceType,
@ -125,33 +127,26 @@ pub(crate) fn compound_statements(
// This is used to allow `class C: ...`-style definitions in stubs.
let mut allow_ellipsis = false;
// Track the bracket depth.
let mut par_count = 0u32;
let mut sqb_count = 0u32;
let mut brace_count = 0u32;
// Track the nesting level.
let mut nesting = 0u32;
// Track indentation.
let mut indent = 0u32;
while let Some((token, range)) = tokens.next() {
match token {
TokenKind::Lpar => {
par_count = par_count.saturating_add(1);
// Use an iterator to allow passing it around.
let mut token_iter = tokens.up_to_first_unknown().iter();
loop {
let Some(token) = token_iter.next() else {
break;
};
match token.kind() {
TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
nesting = nesting.saturating_add(1);
}
TokenKind::Rpar => {
par_count = par_count.saturating_sub(1);
}
TokenKind::Lsqb => {
sqb_count = sqb_count.saturating_add(1);
}
TokenKind::Rsqb => {
sqb_count = sqb_count.saturating_sub(1);
}
TokenKind::Lbrace => {
brace_count = brace_count.saturating_add(1);
}
TokenKind::Rbrace => {
brace_count = brace_count.saturating_sub(1);
TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
nesting = nesting.saturating_sub(1);
}
TokenKind::Ellipsis => {
if allow_ellipsis {
@ -168,28 +163,27 @@ pub(crate) fn compound_statements(
_ => {}
}
if par_count > 0 || sqb_count > 0 || brace_count > 0 {
if nesting > 0 {
continue;
}
match token {
match token.kind() {
TokenKind::Newline => {
if let Some((start, end)) = semi {
if let Some(range) = semi {
if !(source_type.is_ipynb()
&& indent == 0
&& cell_offsets
.and_then(|cell_offsets| cell_offsets.containing_range(range.start()))
.and_then(|cell_offsets| cell_offsets.containing_range(token.start()))
.is_some_and(|cell_range| {
!has_non_trivia_tokens_till(tokens.clone(), cell_range.end())
!has_non_trivia_tokens_till(token_iter.clone(), cell_range.end())
}))
{
let mut diagnostic =
Diagnostic::new(UselessSemicolon, TextRange::new(start, end));
let mut diagnostic = Diagnostic::new(UselessSemicolon, range);
diagnostic.set_fix(Fix::safe_edit(Edit::deletion(
indexer
.preceded_by_continuations(start, locator)
.unwrap_or(start),
end,
.preceded_by_continuations(range.start(), locator)
.unwrap_or(range.start()),
range.end(),
)));
diagnostics.push(diagnostic);
}
@ -225,14 +219,14 @@ pub(crate) fn compound_statements(
|| while_.is_some()
|| with.is_some()
{
colon = Some((range.start(), range.end()));
colon = Some(token.range());
// Allow `class C: ...`-style definitions.
allow_ellipsis = true;
}
}
TokenKind::Semi => {
semi = Some((range.start(), range.end()));
semi = Some(token.range());
allow_ellipsis = false;
}
TokenKind::Comment
@ -240,22 +234,16 @@ pub(crate) fn compound_statements(
| TokenKind::Dedent
| TokenKind::NonLogicalNewline => {}
_ => {
if let Some((start, end)) = semi {
diagnostics.push(Diagnostic::new(
MultipleStatementsOnOneLineSemicolon,
TextRange::new(start, end),
));
if let Some(range) = semi {
diagnostics.push(Diagnostic::new(MultipleStatementsOnOneLineSemicolon, range));
// Reset.
semi = None;
allow_ellipsis = false;
}
if let Some((start, end)) = colon {
diagnostics.push(Diagnostic::new(
MultipleStatementsOnOneLineColon,
TextRange::new(start, end),
));
if let Some(range) = colon {
diagnostics.push(Diagnostic::new(MultipleStatementsOnOneLineColon, range));
// Reset.
colon = None;
@ -276,7 +264,7 @@ pub(crate) fn compound_statements(
}
}
match token {
match token.kind() {
TokenKind::Lambda => {
// Reset.
colon = None;
@ -294,40 +282,40 @@ pub(crate) fn compound_statements(
with = None;
}
TokenKind::Case => {
case = Some((range.start(), range.end()));
case = Some(token.range());
}
TokenKind::If => {
if_ = Some((range.start(), range.end()));
if_ = Some(token.range());
}
TokenKind::While => {
while_ = Some((range.start(), range.end()));
while_ = Some(token.range());
}
TokenKind::For => {
for_ = Some((range.start(), range.end()));
for_ = Some(token.range());
}
TokenKind::Try => {
try_ = Some((range.start(), range.end()));
try_ = Some(token.range());
}
TokenKind::Except => {
except = Some((range.start(), range.end()));
except = Some(token.range());
}
TokenKind::Finally => {
finally = Some((range.start(), range.end()));
finally = Some(token.range());
}
TokenKind::Elif => {
elif = Some((range.start(), range.end()));
elif = Some(token.range());
}
TokenKind::Else => {
else_ = Some((range.start(), range.end()));
else_ = Some(token.range());
}
TokenKind::Class => {
class = Some((range.start(), range.end()));
class = Some(token.range());
}
TokenKind::With => {
with = Some((range.start(), range.end()));
with = Some(token.range());
}
TokenKind::Match => {
match_ = Some((range.start(), range.end()));
match_ = Some(token.range());
}
_ => {}
};
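The three separate bracket counters collapse into a single nesting depth. A stripped-down sketch of that counter pattern (assumed names, same saturating arithmetic; not part of the change):

```rust
use ruff_python_parser::{TokenKind, Tokens};

// Return `true` if the token stream ever closes a bracket that was never
// opened (i.e. the depth would go negative without the guard).
fn has_unbalanced_closers(tokens: &Tokens) -> bool {
    let mut nesting = 0u32;
    for token in tokens.up_to_first_unknown() {
        match token.kind() {
            TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
                nesting = nesting.saturating_add(1);
            }
            TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
                if nesting == 0 {
                    return true;
                }
                nesting -= 1;
            }
            _ => {}
        }
    }
    false
}
```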
@ -336,13 +324,13 @@ pub(crate) fn compound_statements(
/// Returns `true` if there are any non-trivia tokens from the given token
/// iterator till the given end offset.
fn has_non_trivia_tokens_till(tokens: TokenKindIter, cell_end: TextSize) -> bool {
for (token, tok_range) in tokens {
if tok_range.start() >= cell_end {
fn has_non_trivia_tokens_till(tokens: Iter<'_, Token>, cell_end: TextSize) -> bool {
for token in tokens {
if token.start() >= cell_end {
return false;
}
if !matches!(
token,
token.kind(),
TokenKind::Newline
| TokenKind::Comment
| TokenKind::EndOfFile

View file

@ -1,6 +1,6 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Line;
use crate::rules::pycodestyle::overlong::Overlong;
@ -84,13 +84,13 @@ impl Violation for DocLineTooLong {
/// W505
pub(crate) fn doc_line_too_long(
line: &Line,
indexer: &Indexer,
comment_ranges: &CommentRanges,
settings: &LinterSettings,
) -> Option<Diagnostic> {
let limit = settings.pycodestyle.max_doc_length?;
Overlong::try_from_line(
line,
indexer,
comment_ranges,
limit,
if settings.pycodestyle.ignore_overlong_task_comments {
&settings.task_tags

View file

@ -1,6 +1,6 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Line;
use crate::rules::pycodestyle::overlong::Overlong;
@ -82,14 +82,14 @@ impl Violation for LineTooLong {
/// E501
pub(crate) fn line_too_long(
line: &Line,
indexer: &Indexer,
comment_ranges: &CommentRanges,
settings: &LinterSettings,
) -> Option<Diagnostic> {
let limit = settings.pycodestyle.max_line_length;
Overlong::try_from_line(
line,
indexer,
comment_ranges,
limit,
if settings.pycodestyle.ignore_overlong_task_comments {
&settings.task_tags

View file

@ -324,7 +324,7 @@ pub(crate) fn literal_comparisons(checker: &mut Checker, compare: &ast::ExprComp
&ops,
&compare.comparators,
compare.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator(),
);
for diagnostic in &mut diagnostics {

View file

@ -14,10 +14,9 @@ use std::fmt::{Debug, Formatter};
use std::iter::FusedIterator;
use bitflags::bitflags;
use ruff_python_parser::lexer::LexResult;
use ruff_text_size::{Ranged, TextLen, TextRange, TextSize};
use ruff_python_parser::TokenKind;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_trivia::is_python_whitespace;
use ruff_source_file::Locator;
@ -60,17 +59,16 @@ pub(crate) struct LogicalLines<'a> {
}
impl<'a> LogicalLines<'a> {
pub(crate) fn from_tokens(tokens: &'a [LexResult], locator: &'a Locator<'a>) -> Self {
pub(crate) fn from_tokens(tokens: &Tokens, locator: &'a Locator<'a>) -> Self {
assert!(u32::try_from(tokens.len()).is_ok());
let mut builder = LogicalLinesBuilder::with_capacity(tokens.len());
let mut parens = 0u32;
for (token, range) in tokens.iter().flatten() {
let token_kind = TokenKind::from_token(token);
builder.push_token(token_kind, *range);
for token in tokens.up_to_first_unknown() {
builder.push_token(token.kind(), token.range());
match token_kind {
match token.kind() {
TokenKind::Lbrace | TokenKind::Lpar | TokenKind::Lsqb => {
parens = parens.saturating_add(1);
}
@ -506,9 +504,7 @@ struct Line {
#[cfg(test)]
mod tests {
use ruff_python_parser::lexer::LexResult;
use ruff_python_parser::{lexer, Mode};
use ruff_python_parser::parse_module;
use ruff_source_file::Locator;
use super::LogicalLines;
@ -592,9 +588,9 @@ if False:
}
fn assert_logical_lines(contents: &str, expected: &[&str]) {
let lxr: Vec<LexResult> = lexer::lex(contents, Mode::Module).collect();
let parsed = parse_module(contents).unwrap();
let locator = Locator::new(contents);
let actual: Vec<String> = LogicalLines::from_tokens(&lxr, &locator)
let actual: Vec<String> = LogicalLines::from_tokens(parsed.tokens(), &locator)
.into_iter()
.map(|line| line.text_trimmed())
.map(ToString::to_string)

View file

@ -104,7 +104,7 @@ pub(crate) fn not_tests(checker: &mut Checker, unary_op: &ast::ExprUnaryOp) {
&[CmpOp::NotIn],
comparators,
unary_op.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator(),
),
unary_op.range(),
@ -125,7 +125,7 @@ pub(crate) fn not_tests(checker: &mut Checker, unary_op: &ast::ExprUnaryOp) {
&[CmpOp::IsNot],
comparators,
unary_op.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator(),
),
unary_op.range(),

View file

@ -1,7 +1,7 @@
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_text_size::{TextRange, TextSize};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange, TextSize};
/// ## What it does
/// Checks for files with multiple trailing blank lines.
@ -54,22 +54,19 @@ impl AlwaysFixableViolation for TooManyNewlinesAtEndOfFile {
}
/// W391
pub(crate) fn too_many_newlines_at_end_of_file(
diagnostics: &mut Vec<Diagnostic>,
tokens: TokenKindIter,
) {
pub(crate) fn too_many_newlines_at_end_of_file(diagnostics: &mut Vec<Diagnostic>, tokens: &Tokens) {
let mut num_trailing_newlines = 0u32;
let mut start: Option<TextSize> = None;
let mut end: Option<TextSize> = None;
// Count the number of trailing newlines.
for (token, range) in tokens.rev() {
match token {
for token in tokens.up_to_first_unknown().iter().rev() {
match token.kind() {
TokenKind::NonLogicalNewline | TokenKind::Newline => {
if num_trailing_newlines == 0 {
end = Some(range.end());
end = Some(token.end());
}
start = Some(range.end());
start = Some(token.end());
num_trailing_newlines += 1;
}
TokenKind::Dedent => continue,

View file

@ -17,12 +17,12 @@ mod tests {
use ruff_python_ast::PySourceType;
use ruff_python_codegen::Stylist;
use ruff_python_index::Indexer;
use ruff_python_parser::AsMode;
use ruff_python_trivia::textwrap::dedent;
use ruff_source_file::Locator;
use ruff_text_size::Ranged;
use crate::linter::{check_path, LinterResult, TokenSource};
use crate::linter::{check_path, LinterResult};
use crate::registry::{AsRule, Linter, Rule};
use crate::rules::pyflakes;
use crate::settings::types::PreviewMode;
@ -638,12 +638,13 @@ mod tests {
let source_type = PySourceType::default();
let source_kind = SourceKind::Python(contents.to_string());
let settings = LinterSettings::for_rules(Linter::Pyflakes.rules());
let tokens = ruff_python_parser::tokenize(&contents, source_type.as_mode());
let parsed =
ruff_python_parser::parse_unchecked_source(source_kind.source_code(), source_type);
let locator = Locator::new(&contents);
let stylist = Stylist::from_tokens(&tokens, &locator);
let indexer = Indexer::from_tokens(&tokens, &locator);
let stylist = Stylist::from_tokens(parsed.tokens(), &locator);
let indexer = Indexer::from_tokens(parsed.tokens(), &locator);
let directives = directives::extract_directives(
&tokens,
&parsed,
directives::Flags::from_settings(&settings),
&locator,
&indexer,
@ -662,7 +663,7 @@ mod tests {
flags::Noqa::Enabled,
&source_kind,
source_type,
TokenSource::Tokens(tokens),
&parsed,
);
diagnostics.sort_by_key(Ranged::start);
let actual = diagnostics

View file

@ -4,8 +4,8 @@ use ruff_python_ast::{CmpOp, Expr};
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers;
use ruff_python_parser::{lexer, Mode, Tok};
use ruff_text_size::{Ranged, TextRange, TextSize};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use crate::checkers::ast::Checker;
@ -96,7 +96,7 @@ pub(crate) fn invalid_literal_comparison(
{
let mut diagnostic = Diagnostic::new(IsLiteral { cmp_op: op.into() }, expr.range());
if lazy_located.is_none() {
lazy_located = Some(locate_cmp_ops(expr, checker.locator().contents()));
lazy_located = Some(locate_cmp_ops(expr, checker.parsed().tokens()));
}
if let Some(located_op) = lazy_located.as_ref().and_then(|located| located.get(index)) {
assert_eq!(located_op.op, *op);
@ -110,7 +110,7 @@ pub(crate) fn invalid_literal_comparison(
} {
diagnostic.set_fix(Fix::safe_edit(Edit::range_replacement(
content,
located_op.range + expr.start(),
located_op.range,
)));
}
} else {
@ -138,102 +138,83 @@ impl From<&CmpOp> for IsCmpOp {
}
}
/// Extract all [`CmpOp`] operators from an expression snippet, with appropriate
/// ranges.
/// Extract all [`CmpOp`] operators from an expression snippet, with appropriate ranges.
///
/// `RustPython` doesn't include line and column information on [`CmpOp`] nodes.
/// `CPython` doesn't either. This method iterates over the token stream and
/// re-identifies [`CmpOp`] nodes, annotating them with valid ranges.
fn locate_cmp_ops(expr: &Expr, source: &str) -> Vec<LocatedCmpOp> {
// If `Expr` is a multi-line expression, we need to parenthesize it to
// ensure that it's lexed correctly.
let contents = &source[expr.range()];
let parenthesized_contents = format!("({contents})");
let mut tok_iter = lexer::lex(&parenthesized_contents, Mode::Expression)
.flatten()
.skip(1)
.map(|(tok, range)| (tok, range - TextSize::from(1)))
.filter(|(tok, _)| !matches!(tok, Tok::NonLogicalNewline | Tok::Comment(_)))
/// This method iterates over the token stream and re-identifies [`CmpOp`] nodes, annotating them
/// with valid ranges.
fn locate_cmp_ops(expr: &Expr, tokens: &Tokens) -> Vec<LocatedCmpOp> {
let mut tok_iter = tokens
.in_range(expr.range())
.iter()
.filter(|token| !token.is_trivia())
.peekable();
let mut ops: Vec<LocatedCmpOp> = vec![];
// Track the bracket depth.
let mut par_count = 0u32;
let mut sqb_count = 0u32;
let mut brace_count = 0u32;
// Track the nesting level.
let mut nesting = 0u32;
loop {
let Some((tok, range)) = tok_iter.next() else {
let Some(token) = tok_iter.next() else {
break;
};
match tok {
Tok::Lpar => {
par_count = par_count.saturating_add(1);
match token.kind() {
TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
nesting = nesting.saturating_add(1);
}
Tok::Rpar => {
par_count = par_count.saturating_sub(1);
}
Tok::Lsqb => {
sqb_count = sqb_count.saturating_add(1);
}
Tok::Rsqb => {
sqb_count = sqb_count.saturating_sub(1);
}
Tok::Lbrace => {
brace_count = brace_count.saturating_add(1);
}
Tok::Rbrace => {
brace_count = brace_count.saturating_sub(1);
TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
nesting = nesting.saturating_sub(1);
}
_ => {}
}
if par_count > 0 || sqb_count > 0 || brace_count > 0 {
if nesting > 0 {
continue;
}
match tok {
Tok::Not => {
if let Some((_, next_range)) = tok_iter.next_if(|(tok, _)| tok.is_in()) {
match token.kind() {
TokenKind::Not => {
if let Some(next_token) = tok_iter.next_if(|token| token.kind() == TokenKind::In) {
ops.push(LocatedCmpOp::new(
TextRange::new(range.start(), next_range.end()),
TextRange::new(token.start(), next_token.end()),
CmpOp::NotIn,
));
}
}
Tok::In => {
ops.push(LocatedCmpOp::new(range, CmpOp::In));
TokenKind::In => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::In));
}
Tok::Is => {
let op = if let Some((_, next_range)) = tok_iter.next_if(|(tok, _)| tok.is_not()) {
TokenKind::Is => {
let op = if let Some(next_token) =
tok_iter.next_if(|token| token.kind() == TokenKind::Not)
{
LocatedCmpOp::new(
TextRange::new(range.start(), next_range.end()),
TextRange::new(token.start(), next_token.end()),
CmpOp::IsNot,
)
} else {
LocatedCmpOp::new(range, CmpOp::Is)
LocatedCmpOp::new(token.range(), CmpOp::Is)
};
ops.push(op);
}
Tok::NotEqual => {
ops.push(LocatedCmpOp::new(range, CmpOp::NotEq));
TokenKind::NotEqual => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::NotEq));
}
Tok::EqEqual => {
ops.push(LocatedCmpOp::new(range, CmpOp::Eq));
TokenKind::EqEqual => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::Eq));
}
Tok::GreaterEqual => {
ops.push(LocatedCmpOp::new(range, CmpOp::GtE));
TokenKind::GreaterEqual => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::GtE));
}
Tok::Greater => {
ops.push(LocatedCmpOp::new(range, CmpOp::Gt));
TokenKind::Greater => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::Gt));
}
Tok::LessEqual => {
ops.push(LocatedCmpOp::new(range, CmpOp::LtE));
TokenKind::LessEqual => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::LtE));
}
Tok::Less => {
ops.push(LocatedCmpOp::new(range, CmpOp::Lt));
TokenKind::Less => {
ops.push(LocatedCmpOp::new(token.range(), CmpOp::Lt));
}
_ => {}
}
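A reduced sketch of the lookahead used for the two-token operators `is not` and `not in` (illustrative, using only the token APIs above; the function name is invented): a peekable slice iterator plus `next_if` consumes the second token only when it matches.

```rust
use std::iter::Peekable;
use std::slice::Iter;

use ruff_python_parser::{Token, TokenKind};
use ruff_text_size::{Ranged, TextRange};

// Having just consumed an `is` token, combine it with a following `not` (if
// any) into a single `is not` range; otherwise return the `is` range alone.
fn is_or_is_not(is_token: &Token, tokens: &mut Peekable<Iter<'_, Token>>) -> TextRange {
    if let Some(not_token) = tokens.next_if(|token| token.kind() == TokenKind::Not) {
        TextRange::new(is_token.start(), not_token.end())
    } else {
        is_token.range()
    }
}
```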
@ -266,12 +247,16 @@ mod tests {
use super::{locate_cmp_ops, LocatedCmpOp};
fn extract_cmp_op_locations(source: &str) -> Result<Vec<LocatedCmpOp>> {
let parsed = parse_expression(source)?;
Ok(locate_cmp_ops(parsed.expr(), parsed.tokens()))
}
#[test]
fn extract_cmp_op_location() -> Result<()> {
fn test_locate_cmp_ops() -> Result<()> {
let contents = "x == 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(4),
CmpOp::Eq
@ -279,9 +264,8 @@ mod tests {
);
let contents = "x != 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(4),
CmpOp::NotEq
@ -289,9 +273,8 @@ mod tests {
);
let contents = "x is 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(4),
CmpOp::Is
@ -299,9 +282,8 @@ mod tests {
);
let contents = "x is not 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(8),
CmpOp::IsNot
@ -309,9 +291,8 @@ mod tests {
);
let contents = "x in 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(4),
CmpOp::In
@ -319,9 +300,8 @@ mod tests {
);
let contents = "x not in 1";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(8),
CmpOp::NotIn
@ -329,9 +309,8 @@ mod tests {
);
let contents = "x != (1 is not 2)";
let expr = parse_expression(contents)?;
assert_eq!(
locate_cmp_ops(&expr, contents),
extract_cmp_op_locations(contents)?,
vec![LocatedCmpOp::new(
TextSize::from(2)..TextSize::from(4),
CmpOp::NotEq

View file

@ -169,7 +169,7 @@ pub(crate) fn repeated_keys(checker: &mut Checker, dict: &ast::ExprDict) {
parenthesized_range(
dict.value(i - 1).into(),
dict.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or_else(|| dict.value(i - 1).range())
@ -177,7 +177,7 @@ pub(crate) fn repeated_keys(checker: &mut Checker, dict: &ast::ExprDict) {
parenthesized_range(
dict.value(i).into(),
dict.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or_else(|| dict.value(i).range())
@ -201,7 +201,7 @@ pub(crate) fn repeated_keys(checker: &mut Checker, dict: &ast::ExprDict) {
parenthesized_range(
dict.value(i - 1).into(),
dict.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or_else(|| dict.value(i - 1).range())
@ -209,7 +209,7 @@ pub(crate) fn repeated_keys(checker: &mut Checker, dict: &ast::ExprDict) {
parenthesized_range(
dict.value(i).into(),
dict.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or_else(|| dict.value(i).range())
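
This hunk (and most of the ones below) swaps `checker.indexer().comment_ranges()` for `checker.parsed().comment_ranges()`: the comment ranges now travel with the parsed output instead of the `Indexer`. A rough sketch of that ownership shape, with illustrative types rather than the actual ruff ones:

```rust
/// Illustrative stand-in: the parsed program, rather than the `Indexer`,
/// owns the comment ranges, so rules ask `checker.parsed().comment_ranges()`.
struct CommentRanges(Vec<(usize, usize)>);

impl CommentRanges {
    /// True if any comment overlaps the given half-open byte range.
    fn intersects(&self, range: (usize, usize)) -> bool {
        self.0.iter().any(|&(start, end)| start < range.1 && range.0 < end)
    }
}

struct Parsed { comment_ranges: CommentRanges }

impl Parsed {
    fn comment_ranges(&self) -> &CommentRanges { &self.comment_ranges }
}

fn main() {
    let parsed = Parsed { comment_ranges: CommentRanges(vec![(10, 20), (42, 55)]) };
    // A fix that would rewrite bytes 15..30 overlaps a comment, so a rule
    // following this pattern would typically mark the fix unsafe or skip it.
    assert!(parsed.comment_ranges().intersects((15, 30)));
    assert!(!parsed.comment_ranges().intersects((0, 5)));
}
```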

View file

@ -4,10 +4,9 @@ use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::helpers::contains_effect;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::{self as ast, PySourceType, Stmt};
use ruff_python_parser::{lexer, AsMode, Tok};
use ruff_python_ast::{self as ast, Stmt};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_semantic::{Binding, Scope};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange, TextSize};
use crate::checkers::ast::Checker;
@ -65,22 +64,13 @@ impl Violation for UnusedVariable {
}
/// Return the [`TextRange`] of the token before the next match of the predicate
fn match_token_before<F>(
location: TextSize,
locator: &Locator,
source_type: PySourceType,
f: F,
) -> Option<TextRange>
fn match_token_before<F>(tokens: &Tokens, location: TextSize, f: F) -> Option<TextRange>
where
F: Fn(Tok) -> bool,
F: Fn(TokenKind) -> bool,
{
let contents = locator.after(location);
for ((_, range), (tok, _)) in lexer::lex_starts_at(contents, source_type.as_mode(), location)
.flatten()
.tuple_windows()
{
if f(tok) {
return Some(range);
for (prev, current) in tokens.after(location).iter().tuple_windows() {
if f(current.kind()) {
return Some(prev.range());
}
}
None
@ -88,55 +78,31 @@ where
/// Return the [`TextRange`] of the token after the next match of the predicate, skipping over
/// any bracketed expressions.
fn match_token_after<F>(
location: TextSize,
locator: &Locator,
source_type: PySourceType,
f: F,
) -> Option<TextRange>
fn match_token_after<F>(tokens: &Tokens, location: TextSize, f: F) -> Option<TextRange>
where
F: Fn(Tok) -> bool,
F: Fn(TokenKind) -> bool,
{
let contents = locator.after(location);
// Track the bracket depth.
let mut par_count = 0u32;
let mut sqb_count = 0u32;
let mut brace_count = 0u32;
let mut nesting = 0u32;
for ((tok, _), (_, range)) in lexer::lex_starts_at(contents, source_type.as_mode(), location)
.flatten()
.tuple_windows()
{
match tok {
Tok::Lpar => {
par_count = par_count.saturating_add(1);
for (current, next) in tokens.after(location).iter().tuple_windows() {
match current.kind() {
TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
nesting = nesting.saturating_add(1);
}
Tok::Lsqb => {
sqb_count = sqb_count.saturating_add(1);
}
Tok::Lbrace => {
brace_count = brace_count.saturating_add(1);
}
Tok::Rpar => {
par_count = par_count.saturating_sub(1);
}
Tok::Rsqb => {
sqb_count = sqb_count.saturating_sub(1);
}
Tok::Rbrace => {
brace_count = brace_count.saturating_sub(1);
TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
nesting = nesting.saturating_sub(1);
}
_ => {}
}
// If we're in nested brackets, continue.
if par_count > 0 || sqb_count > 0 || brace_count > 0 {
if nesting > 0 {
continue;
}
if f(tok) {
return Some(range);
if f(current.kind()) {
return Some(next.range());
}
}
None
@ -144,61 +110,34 @@ where
/// Return the [`TextRange`] of the token matching the predicate or the first mismatched
/// bracket, skipping over any bracketed expressions.
fn match_token_or_closing_brace<F>(
location: TextSize,
locator: &Locator,
source_type: PySourceType,
f: F,
) -> Option<TextRange>
fn match_token_or_closing_brace<F>(tokens: &Tokens, location: TextSize, f: F) -> Option<TextRange>
where
F: Fn(Tok) -> bool,
F: Fn(TokenKind) -> bool,
{
let contents = locator.after(location);
// Track the nesting level.
let mut nesting = 0u32;
// Track the bracket depth.
let mut par_count = 0u32;
let mut sqb_count = 0u32;
let mut brace_count = 0u32;
for (tok, range) in lexer::lex_starts_at(contents, source_type.as_mode(), location).flatten() {
match tok {
Tok::Lpar => {
par_count = par_count.saturating_add(1);
for token in tokens.after(location) {
match token.kind() {
TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
nesting = nesting.saturating_add(1);
}
Tok::Lsqb => {
sqb_count = sqb_count.saturating_add(1);
TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
if nesting == 0 {
return Some(token.range());
}
Tok::Lbrace => {
brace_count = brace_count.saturating_add(1);
}
Tok::Rpar => {
if par_count == 0 {
return Some(range);
}
par_count = par_count.saturating_sub(1);
}
Tok::Rsqb => {
if sqb_count == 0 {
return Some(range);
}
sqb_count = sqb_count.saturating_sub(1);
}
Tok::Rbrace => {
if brace_count == 0 {
return Some(range);
}
brace_count = brace_count.saturating_sub(1);
nesting = nesting.saturating_sub(1);
}
_ => {}
}
// If we're in nested brackets, continue.
if par_count > 0 || sqb_count > 0 || brace_count > 0 {
if nesting > 0 {
continue;
}
if f(tok) {
return Some(range);
if f(token.kind()) {
return Some(token.range());
}
}
None
@ -226,17 +165,15 @@ fn remove_unused_variable(binding: &Binding, checker: &Checker) -> Option<Fix> {
let start = parenthesized_range(
target.into(),
statement.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(target.range())
.start();
let end = match_token_after(
target.end(),
checker.locator(),
checker.source_type,
|tok| tok == Tok::Equal,
)?
let end =
match_token_after(checker.parsed().tokens(), target.end(), |token| {
token == TokenKind::Equal
})?
.start();
let edit = Edit::deletion(start, end);
Some(Fix::unsafe_edit(edit))
@ -269,9 +206,8 @@ fn remove_unused_variable(binding: &Binding, checker: &Checker) -> Option<Fix> {
// If the expression is complex (`x = foo()`), remove the assignment,
// but preserve the right-hand side.
let start = statement.start();
let end =
match_token_after(start, checker.locator(), checker.source_type, |tok| {
tok == Tok::Equal
let end = match_token_after(checker.parsed().tokens(), start, |token| {
token == TokenKind::Equal
})?
.start();
let edit = Edit::deletion(start, end);
@ -293,20 +229,17 @@ fn remove_unused_variable(binding: &Binding, checker: &Checker) -> Option<Fix> {
if optional_vars.range() == binding.range() {
// Find the first token before the `as` keyword.
let start = match_token_before(
checker.parsed().tokens(),
item.context_expr.start(),
checker.locator(),
checker.source_type,
|tok| tok == Tok::As,
|token| token == TokenKind::As,
)?
.end();
// Find the first colon, comma, or closing bracket after the `as` keyword.
let end = match_token_or_closing_brace(
start,
checker.locator(),
checker.source_type,
|tok| tok == Tok::Colon || tok == Tok::Comma,
)?
let end =
match_token_or_closing_brace(checker.parsed().tokens(), start, |token| {
token == TokenKind::Colon || token == TokenKind::Comma
})?
.start();
let edit = Edit::deletion(start, end);
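
The rewritten helpers above lean on two ideas: iterate the shared `Tokens` starting at a position (`tokens.after(location)`) with `tuple_windows()`, and collapse the three per-bracket counters into a single saturating `nesting` counter. A self-contained sketch of `match_token_after` in that style, with a simplified token type and slice `windows(2)` standing in for `Itertools::tuple_windows`:

```rust
#![allow(dead_code)] // the square-bracket and brace variants are unused in this demo

#[derive(Clone, Copy, PartialEq, Eq)]
enum TokenKind { Lpar, Rpar, Lsqb, Rsqb, Lbrace, Rbrace, Equal, Name, Comma }

#[derive(Clone, Copy)]
struct Token { kind: TokenKind, start: usize, end: usize }

/// Range of the token *after* the next token matching `f`, skipping anything
/// inside brackets. One saturating `nesting` counter replaces the separate
/// parenthesis/bracket/brace counters from the old lexer-based code.
fn match_token_after<F>(tokens: &[Token], f: F) -> Option<(usize, usize)>
where
    F: Fn(TokenKind) -> bool,
{
    let mut nesting = 0u32;
    for pair in tokens.windows(2) {
        let (current, next) = (pair[0], pair[1]);
        match current.kind {
            TokenKind::Lpar | TokenKind::Lsqb | TokenKind::Lbrace => {
                nesting = nesting.saturating_add(1);
            }
            TokenKind::Rpar | TokenKind::Rsqb | TokenKind::Rbrace => {
                nesting = nesting.saturating_sub(1);
            }
            _ => {}
        }
        // If we're in nested brackets, continue.
        if nesting > 0 {
            continue;
        }
        if f(current.kind) {
            return Some((next.start, next.end));
        }
    }
    None
}

fn main() {
    // Tokens for `x = (1, 2)` (ranges are illustrative byte offsets).
    let tokens = [
        Token { kind: TokenKind::Name, start: 0, end: 1 },
        Token { kind: TokenKind::Equal, start: 2, end: 3 },
        Token { kind: TokenKind::Lpar, start: 4, end: 5 },
        Token { kind: TokenKind::Name, start: 5, end: 6 },
        Token { kind: TokenKind::Comma, start: 6, end: 7 },
        Token { kind: TokenKind::Name, start: 8, end: 9 },
        Token { kind: TokenKind::Rpar, start: 9, end: 10 },
    ];
    // The token after `=` is `(`, i.e. the start of the right-hand side.
    assert_eq!(match_token_after(&tokens, |kind| kind == TokenKind::Equal), Some((4, 5)));
}
```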

View file

@ -5,7 +5,7 @@ use regex::Regex;
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::TextSize;
@ -51,10 +51,10 @@ impl Violation for BlanketTypeIgnore {
/// PGH003
pub(crate) fn blanket_type_ignore(
diagnostics: &mut Vec<Diagnostic>,
indexer: &Indexer,
comment_ranges: &CommentRanges,
locator: &Locator,
) {
for range in indexer.comment_ranges() {
for range in comment_ranges {
let line = locator.slice(*range);
// Match, e.g., `# type: ignore` or `# type: ignore[attr-defined]`.

View file

@ -1,7 +1,6 @@
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::is_python_whitespace;
use ruff_python_trivia::{is_python_whitespace, CommentRanges};
use ruff_source_file::Locator;
use ruff_text_size::{TextRange, TextSize};
@ -45,12 +44,12 @@ impl Violation for EmptyComment {
/// PLR2044
pub(crate) fn empty_comments(
diagnostics: &mut Vec<Diagnostic>,
indexer: &Indexer,
comment_ranges: &CommentRanges,
locator: &Locator,
) {
let block_comments = indexer.comment_ranges().block_comments(locator);
let block_comments = comment_ranges.block_comments(locator);
for range in indexer.comment_ranges() {
for range in comment_ranges {
// Ignore comments that are part of multi-line "comment blocks".
if block_comments.binary_search(&range.start()).is_ok() {
continue;
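
Rules like `blanket_type_ignore` and `empty_comments` now accept the narrower `&CommentRanges` instead of the whole `&Indexer`, so callers can pass `parsed.comment_ranges()` straight through. A hypothetical, simplified version of such a rule signature:

```rust
/// Illustrative only: the rule takes `&CommentRanges` directly, not `&Indexer`.
struct CommentRanges(Vec<(usize, usize)>);

struct Diagnostic { range: (usize, usize), message: &'static str }

/// PGH003-style check, sketched: scan every comment range for a blanket ignore.
fn blanket_type_ignore(
    diagnostics: &mut Vec<Diagnostic>,
    comment_ranges: &CommentRanges,
    source: &str,
) {
    for &range in &comment_ranges.0 {
        let comment = &source[range.0..range.1];
        if comment.trim_start_matches('#').trim() == "type: ignore" {
            diagnostics.push(Diagnostic { range, message: "blanket `# type: ignore`" });
        }
    }
}

fn main() {
    let source = "x = 1  # type: ignore\n";
    let comment_ranges = CommentRanges(vec![(7, source.len() - 1)]);
    let mut diagnostics = Vec::new();
    blanket_type_ignore(&mut diagnostics, &comment_ranges, source);
    assert_eq!(diagnostics[0].range, (7, source.len() - 1));
    assert_eq!(diagnostics[0].message, "blanket `# type: ignore`");
}
```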

View file

@ -160,7 +160,7 @@ pub(crate) fn if_stmt_min_max(checker: &mut Checker, stmt_if: &ast::StmtIf) {
parenthesized_range(
body_target.into(),
body.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents()
)
.unwrap_or(body_target.range())

View file

@ -156,7 +156,7 @@ pub(crate) fn nested_min_max(
}) {
let mut diagnostic = Diagnostic::new(NestedMinMax { func: min_max }, expr.range());
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(expr, checker.locator())
{

View file

@ -76,7 +76,7 @@ pub(crate) fn subprocess_run_without_check(checker: &mut Checker, call: &ast::Ex
add_argument(
"check=False",
&call.arguments,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
),
// If the function call contains `**kwargs`, mark the fix as unsafe.

View file

@ -254,13 +254,13 @@ pub(crate) fn too_many_branches(
#[cfg(test)]
mod tests {
use anyhow::Result;
use ruff_python_parser::parse_suite;
use ruff_python_parser::parse_module;
use super::num_branches;
fn test_helper(source: &str, expected_num_branches: usize) -> Result<()> {
let branches = parse_suite(source)?;
assert_eq!(num_branches(&branches), expected_num_branches);
let parsed = parse_module(source)?;
assert_eq!(num_branches(parsed.suite()), expected_num_branches);
Ok(())
}

View file

@ -98,13 +98,13 @@ pub(crate) fn too_many_return_statements(
#[cfg(test)]
mod tests {
use anyhow::Result;
use ruff_python_parser::parse_suite;
use ruff_python_parser::parse_module;
use super::num_returns;
fn test_helper(source: &str, expected: usize) -> Result<()> {
let stmts = parse_suite(source)?;
assert_eq!(num_returns(&stmts), expected);
let parsed = parse_module(source)?;
assert_eq!(num_returns(parsed.suite()), expected);
Ok(())
}
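
The complexity-rule tests switch from the removed `parse_suite` function to `parse_module(source)?` plus `.suite()` (or `.into_suite()` for an owned `Suite`). A small example against that entry point; the helper names below are illustrative, but the `parse_module`/`suite`/`into_suite` calls are the ones used in the hunks above:

```rust
use anyhow::Result;
use ruff_python_ast::Suite;
use ruff_python_parser::parse_module;

/// Borrow the statements of a module (mirrors the updated tests above).
fn statement_count(source: &str) -> Result<usize> {
    let parsed = parse_module(source)?;
    Ok(parsed.suite().len())
}

/// Take ownership of the statements, as the `too_many_statements` tests do
/// via their local `parse_suite` helper.
fn into_statements(source: &str) -> Result<Suite> {
    Ok(parse_module(source)?.into_suite())
}

fn main() -> Result<()> {
    let source = "x = 1\nif x:\n    pass\n";
    assert_eq!(statement_count(source)?, 2);
    assert_eq!(into_statements(source)?.len(), 2);
    Ok(())
}
```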

View file

@ -158,10 +158,16 @@ pub(crate) fn too_many_statements(
#[cfg(test)]
mod tests {
use anyhow::Result;
use ruff_python_parser::parse_suite;
use ruff_python_ast::Suite;
use ruff_python_parser::parse_module;
use super::num_statements;
fn parse_suite(source: &str) -> Result<Suite> {
Ok(parse_module(source)?.into_suite())
}
#[test]
fn pass() -> Result<()> {
let source: &str = r"

View file

@ -175,7 +175,7 @@ fn generate_keyword_fix(checker: &Checker, call: &ast::ExprCall) -> Fix {
}))
),
&call.arguments,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
))
}
@ -190,7 +190,7 @@ fn generate_import_fix(checker: &Checker, call: &ast::ExprCall) -> Result<Fix> {
let argument_edit = add_argument(
&format!("encoding={binding}(False)"),
&call.arguments,
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
);
Ok(Fix::unsafe_edits(import_edit, [argument_edit]))

View file

@ -1,52 +1,49 @@
use ruff_python_parser::{lexer, Mode, Tok};
use ruff_python_ast::StmtImportFrom;
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::Locator;
use ruff_text_size::{TextRange, TextSize};
use ruff_text_size::{Ranged, TextRange};
/// Remove any imports matching `members` from an import-from statement.
pub(crate) fn remove_import_members(contents: &str, members: &[&str]) -> String {
let mut names: Vec<TextRange> = vec![];
let mut commas: Vec<TextRange> = vec![];
let mut removal_indices: Vec<usize> = vec![];
// Find all Tok::Name tokens that are not preceded by Tok::As, and all
// Tok::Comma tokens.
let mut prev_tok = None;
for (tok, range) in lexer::lex(contents, Mode::Module)
.flatten()
.skip_while(|(tok, _)| !matches!(tok, Tok::Import))
{
if let Tok::Name { name } = &tok {
if matches!(prev_tok, Some(Tok::As)) {
// Adjust the location to take the alias into account.
let last_range = names.last_mut().unwrap();
*last_range = TextRange::new(last_range.start(), range.end());
pub(crate) fn remove_import_members(
locator: &Locator<'_>,
import_from_stmt: &StmtImportFrom,
tokens: &Tokens,
members_to_remove: &[&str],
) -> String {
let commas: Vec<TextRange> = tokens
.in_range(import_from_stmt.range())
.iter()
.skip_while(|token| token.kind() != TokenKind::Import)
.filter_map(|token| {
if token.kind() == TokenKind::Comma {
Some(token.range())
} else {
if members.contains(&&**name) {
removal_indices.push(names.len());
}
names.push(range);
}
} else if matches!(tok, Tok::Comma) {
commas.push(range);
}
prev_tok = Some(tok);
None
}
})
.collect();
// Reconstruct the source code by skipping any names that are in `members`.
let locator = Locator::new(contents);
let mut output = String::with_capacity(contents.len());
let mut last_pos = TextSize::default();
let mut output = String::with_capacity(import_from_stmt.range().len().to_usize());
let mut last_pos = import_from_stmt.start();
let mut is_first = true;
for index in 0..names.len() {
if !removal_indices.contains(&index) {
for (index, member) in import_from_stmt.names.iter().enumerate() {
if !members_to_remove.contains(&member.name.as_str()) {
is_first = false;
continue;
}
let range = if is_first {
TextRange::new(names[index].start(), names[index + 1].start())
TextRange::new(
import_from_stmt.names[index].start(),
import_from_stmt.names[index + 1].start(),
)
} else {
TextRange::new(commas[index - 1].start(), names[index].end())
TextRange::new(
commas[index - 1].start(),
import_from_stmt.names[index].end(),
)
};
// Add all contents from `last_pos` to `fix.location`.
@ -61,20 +58,39 @@ pub(crate) fn remove_import_members(contents: &str, members: &[&str]) -> String
}
// Add the remaining content.
let slice = locator.after(last_pos);
let slice = locator.slice(TextRange::new(last_pos, import_from_stmt.end()));
output.push_str(slice);
output
}
#[cfg(test)]
mod tests {
use crate::rules::pyupgrade::fixes::remove_import_members;
use ruff_python_parser::parse_module;
use ruff_source_file::Locator;
use super::remove_import_members;
fn test_helper(source: &str, members_to_remove: &[&str]) -> String {
let parsed = parse_module(source).unwrap();
let import_from_stmt = parsed
.suite()
.first()
.expect("source should have one statement")
.as_import_from_stmt()
.expect("first statement should be an import from statement");
remove_import_members(
&Locator::new(source),
import_from_stmt,
parsed.tokens(),
members_to_remove,
)
}
#[test]
fn once() {
let source = r"from foo import bar, baz, bop, qux as q";
let expected = r"from foo import bar, baz, qux as q";
let actual = remove_import_members(source, &["bop"]);
let actual = test_helper(source, &["bop"]);
assert_eq!(expected, actual);
}
@ -82,7 +98,7 @@ mod tests {
fn twice() {
let source = r"from foo import bar, baz, bop, qux as q";
let expected = r"from foo import bar, qux as q";
let actual = remove_import_members(source, &["baz", "bop"]);
let actual = test_helper(source, &["baz", "bop"]);
assert_eq!(expected, actual);
}
@ -90,7 +106,7 @@ mod tests {
fn aliased() {
let source = r"from foo import bar, baz, bop as boop, qux as q";
let expected = r"from foo import bar, baz, qux as q";
let actual = remove_import_members(source, &["bop"]);
let actual = test_helper(source, &["bop"]);
assert_eq!(expected, actual);
}
@ -98,7 +114,7 @@ mod tests {
fn parenthesized() {
let source = r"from foo import (bar, baz, bop, qux as q)";
let expected = r"from foo import (bar, baz, qux as q)";
let actual = remove_import_members(source, &["bop"]);
let actual = test_helper(source, &["bop"]);
assert_eq!(expected, actual);
}
@ -106,7 +122,7 @@ mod tests {
fn last_import() {
let source = r"from foo import bar, baz, bop, qux as q";
let expected = r"from foo import bar, baz, bop";
let actual = remove_import_members(source, &["qux"]);
let actual = test_helper(source, &["qux"]);
assert_eq!(expected, actual);
}
@ -114,7 +130,7 @@ mod tests {
fn first_import() {
let source = r"from foo import bar, baz, bop, qux as q";
let expected = r"from foo import baz, bop, qux as q";
let actual = remove_import_members(source, &["bar"]);
let actual = test_helper(source, &["bar"]);
assert_eq!(expected, actual);
}
@ -122,7 +138,7 @@ mod tests {
fn first_two_imports() {
let source = r"from foo import bar, baz, bop, qux as q";
let expected = r"from foo import bop, qux as q";
let actual = remove_import_members(source, &["bar", "baz"]);
let actual = test_helper(source, &["bar", "baz"]);
assert_eq!(expected, actual);
}
@ -138,7 +154,7 @@ mod tests {
bop,
qux as q
)";
let actual = remove_import_members(source, &["bar", "baz"]);
let actual = test_helper(source, &["bar", "baz"]);
assert_eq!(expected, actual);
}
@ -155,7 +171,7 @@ mod tests {
baz,
qux as q,
)";
let actual = remove_import_members(source, &["bop"]);
let actual = test_helper(source, &["bop"]);
assert_eq!(expected, actual);
}
@ -171,7 +187,7 @@ mod tests {
bar,
qux as q,
)";
let actual = remove_import_members(source, &["baz", "bop"]);
let actual = test_helper(source, &["baz", "bop"]);
assert_eq!(expected, actual);
}
@ -191,7 +207,7 @@ mod tests {
# This comment should be retained.
qux as q,
)";
let actual = remove_import_members(source, &["bop"]);
let actual = test_helper(source, &["bop"]);
assert_eq!(expected, actual);
}
@ -211,7 +227,7 @@ mod tests {
bop,
qux as q,
)";
let actual = remove_import_members(source, &["bar"]);
let actual = test_helper(source, &["bar"]);
assert_eq!(expected, actual);
}
}
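
`remove_import_members` now receives the `StmtImportFrom`, the shared `&Tokens`, and the `Locator`, and finds the commas with `tokens.in_range(...)` instead of re-lexing the statement text; member names and aliases come from the AST. A simplified standalone sketch of the comma-collection part (stand-in token types, not the real ones):

```rust
// Simplified stand-ins for the pattern in the new `remove_import_members`:
// commas are collected from the already-lexed token stream restricted to the
// statement's range, skipping everything before the `import` keyword.
#[derive(Clone, Copy, PartialEq, Eq)]
enum TokenKind { From, Name, Import, Comma, Newline }

#[derive(Clone, Copy)]
struct Token { kind: TokenKind, start: usize, end: usize }

fn comma_ranges_after_import(tokens: &[Token]) -> Vec<(usize, usize)> {
    tokens
        .iter()
        .skip_while(|token| token.kind != TokenKind::Import)
        .filter_map(|token| {
            (token.kind == TokenKind::Comma).then_some((token.start, token.end))
        })
        .collect()
}

fn main() {
    // Tokens for `from foo import bar, baz` (offsets are illustrative).
    let tokens = [
        Token { kind: TokenKind::From, start: 0, end: 4 },
        Token { kind: TokenKind::Name, start: 5, end: 8 },
        Token { kind: TokenKind::Import, start: 9, end: 15 },
        Token { kind: TokenKind::Name, start: 16, end: 19 },
        Token { kind: TokenKind::Comma, start: 19, end: 20 },
        Token { kind: TokenKind::Name, start: 21, end: 24 },
        Token { kind: TokenKind::Newline, start: 24, end: 25 },
    ];
    assert_eq!(comma_ranges_after_import(&tokens), vec![(19, 20)]);
}
```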

View file

@ -1,10 +1,11 @@
use itertools::Itertools;
use ruff_python_ast::{Alias, Stmt};
use ruff_python_ast::{Alias, StmtImportFrom};
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::whitespace::indentation;
use ruff_python_codegen::Stylist;
use ruff_python_parser::Tokens;
use ruff_source_file::Locator;
use ruff_text_size::Ranged;
@ -398,29 +399,29 @@ const TYPING_EXTENSIONS_TO_TYPES_313: &[&str] = &["CapsuleType"];
const TYPING_EXTENSIONS_TO_WARNINGS_313: &[&str] = &["deprecated"];
struct ImportReplacer<'a> {
stmt: &'a Stmt,
import_from_stmt: &'a StmtImportFrom,
module: &'a str,
members: &'a [Alias],
locator: &'a Locator<'a>,
stylist: &'a Stylist<'a>,
tokens: &'a Tokens,
version: PythonVersion,
}
impl<'a> ImportReplacer<'a> {
const fn new(
stmt: &'a Stmt,
import_from_stmt: &'a StmtImportFrom,
module: &'a str,
members: &'a [Alias],
locator: &'a Locator<'a>,
stylist: &'a Stylist<'a>,
tokens: &'a Tokens,
version: PythonVersion,
) -> Self {
Self {
stmt,
import_from_stmt,
module,
members,
locator,
stylist,
tokens,
version,
}
}
@ -430,7 +431,7 @@ impl<'a> ImportReplacer<'a> {
let mut operations = vec![];
if self.module == "typing" {
if self.version >= PythonVersion::Py39 {
for member in self.members {
for member in &self.import_from_stmt.names {
if let Some(target) = TYPING_TO_RENAME_PY39.iter().find_map(|(name, target)| {
if &member.name == *name {
Some(*target)
@ -616,7 +617,7 @@ impl<'a> ImportReplacer<'a> {
let fix = Some(matched);
Some((operation, fix))
} else {
let indentation = indentation(self.locator, self.stmt);
let indentation = indentation(self.locator, self.import_from_stmt);
// If we have matched _and_ unmatched names, but the import is not on its own
// line, we can't add a statement after it. For example, if we have
@ -636,7 +637,9 @@ impl<'a> ImportReplacer<'a> {
let matched = ImportReplacer::format_import_from(&matched_names, target);
let unmatched = fixes::remove_import_members(
self.locator.slice(self.stmt.range()),
self.locator,
self.import_from_stmt,
self.tokens,
&matched_names
.iter()
.map(|name| name.name.as_str())
@ -664,7 +667,7 @@ impl<'a> ImportReplacer<'a> {
fn partition_imports(&self, candidates: &[&str]) -> (Vec<&Alias>, Vec<&Alias>) {
let mut matched_names = vec![];
let mut unmatched_names = vec![];
for name in self.members {
for name in &self.import_from_stmt.names {
if candidates.contains(&name.name.as_str()) {
matched_names.push(name);
} else {
@ -691,21 +694,19 @@ impl<'a> ImportReplacer<'a> {
}
/// UP035
pub(crate) fn deprecated_import(
checker: &mut Checker,
stmt: &Stmt,
names: &[Alias],
module: Option<&str>,
level: u32,
) {
pub(crate) fn deprecated_import(checker: &mut Checker, import_from_stmt: &StmtImportFrom) {
// Avoid relative and star imports.
if level > 0 {
if import_from_stmt.level > 0 {
return;
}
if names.first().is_some_and(|name| &name.name == "*") {
if import_from_stmt
.names
.first()
.is_some_and(|name| &name.name == "*")
{
return;
}
let Some(module) = module else {
let Some(module) = import_from_stmt.module.as_deref() else {
return;
};
@ -713,13 +714,12 @@ pub(crate) fn deprecated_import(
return;
}
let members: Vec<Alias> = names.iter().map(Clone::clone).collect();
let fixer = ImportReplacer::new(
stmt,
import_from_stmt,
module,
&members,
checker.locator(),
checker.stylist(),
checker.parsed().tokens(),
checker.settings.target_version,
);
@ -728,12 +728,12 @@ pub(crate) fn deprecated_import(
DeprecatedImport {
deprecation: Deprecation::WithoutRename(operation),
},
stmt.range(),
import_from_stmt.range(),
);
if let Some(content) = fix {
diagnostic.set_fix(Fix::safe_edit(Edit::range_replacement(
content,
stmt.range(),
import_from_stmt.range(),
)));
}
checker.diagnostics.push(diagnostic);
@ -744,7 +744,7 @@ pub(crate) fn deprecated_import(
DeprecatedImport {
deprecation: Deprecation::WithRename(operation),
},
stmt.range(),
import_from_stmt.range(),
);
checker.diagnostics.push(diagnostic);
}

View file

@ -1,5 +1,7 @@
use ruff_python_parser::{TokenKind, TokenKindIter};
use ruff_text_size::TextRange;
use std::slice::Iter;
use ruff_python_parser::{Token, TokenKind, Tokens};
use ruff_text_size::{Ranged, TextRange};
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
@ -36,17 +38,17 @@ impl AlwaysFixableViolation for ExtraneousParentheses {
}
// See: https://github.com/asottile/pyupgrade/blob/97ed6fb3cf2e650d4f762ba231c3f04c41797710/pyupgrade/_main.py#L148
fn match_extraneous_parentheses(tokens: &mut TokenKindIter) -> Option<(TextRange, TextRange)> {
fn match_extraneous_parentheses(tokens: &mut Iter<'_, Token>) -> Option<(TextRange, TextRange)> {
// Store the location of the extraneous opening parenthesis.
let start_range = loop {
let (token, range) = tokens.next()?;
let token = tokens.next()?;
match token {
match token.kind() {
TokenKind::Comment | TokenKind::NonLogicalNewline => {
continue;
}
TokenKind::Lpar => {
break range;
break token.range();
}
_ => {
return None;
@ -62,22 +64,28 @@ fn match_extraneous_parentheses(tokens: &mut TokenKindIter) -> Option<(TextRange
// Store the location of the extraneous closing parenthesis.
let end_range = loop {
let (token, range) = tokens.next()?;
let token = tokens.next()?;
match token.kind() {
// If we find a comma or a yield at depth 1 or 2, it's a tuple or coroutine.
if depth == 1 && matches!(token, TokenKind::Comma | TokenKind::Yield) {
return None;
} else if matches!(token, TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb) {
TokenKind::Comma | TokenKind::Yield if depth == 1 => return None,
TokenKind::Lpar | TokenKind::Lbrace | TokenKind::Lsqb => {
depth = depth.saturating_add(1);
} else if matches!(token, TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb) {
}
TokenKind::Rpar | TokenKind::Rbrace | TokenKind::Rsqb => {
depth = depth.saturating_sub(1);
}
_ => {}
}
if depth == 0 {
break range;
break token.range();
}
if !matches!(token, TokenKind::Comment | TokenKind::NonLogicalNewline) {
if !matches!(
token.kind(),
TokenKind::Comment | TokenKind::NonLogicalNewline
) {
empty_tuple = false;
}
};
@ -88,9 +96,9 @@ fn match_extraneous_parentheses(tokens: &mut TokenKindIter) -> Option<(TextRange
// Find the next non-coding token.
let token = loop {
let (token, _) = tokens.next()?;
let token = tokens.next()?;
match token {
match token.kind() {
TokenKind::Comment | TokenKind::NonLogicalNewline => continue,
_ => {
break token;
@ -98,7 +106,7 @@ fn match_extraneous_parentheses(tokens: &mut TokenKindIter) -> Option<(TextRange
}
};
if matches!(token, TokenKind::Rpar) {
if matches!(token.kind(), TokenKind::Rpar) {
Some((start_range, end_range))
} else {
None
@ -108,15 +116,16 @@ fn match_extraneous_parentheses(tokens: &mut TokenKindIter) -> Option<(TextRange
/// UP034
pub(crate) fn extraneous_parentheses(
diagnostics: &mut Vec<Diagnostic>,
mut tokens: TokenKindIter,
tokens: &Tokens,
locator: &Locator,
) {
while let Some((token, _)) = tokens.next() {
if !matches!(token, TokenKind::Lpar) {
let mut token_iter = tokens.up_to_first_unknown().iter();
while let Some(token) = token_iter.next() {
if !matches!(token.kind(), TokenKind::Lpar) {
continue;
}
let Some((start_range, end_range)) = match_extraneous_parentheses(&mut tokens) else {
let Some((start_range, end_range)) = match_extraneous_parentheses(&mut token_iter) else {
continue;
};
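
With `TokenKindIter` gone, UP034 drives a plain `std::slice::Iter<'_, Token>` (from `tokens.up_to_first_unknown().iter()`), and the helper continues from wherever the outer loop left the shared iterator. A minimal sketch of that hand-off, using a simplified token type:

```rust
use std::slice::Iter;

#[derive(Clone, Copy, PartialEq, Eq)]
enum TokenKind { Lpar, Rpar, Name }

#[derive(Clone, Copy)]
struct Token { kind: TokenKind, start: usize, end: usize }

/// Consume tokens until the matching `)` and return its range. The real rule
/// also tracks commas, yields, and emptiness; this sketch keeps only the depth.
fn match_closing_paren(tokens: &mut Iter<'_, Token>) -> Option<(usize, usize)> {
    let mut depth = 1u32;
    for token in tokens {
        match token.kind {
            TokenKind::Lpar => depth = depth.saturating_add(1),
            TokenKind::Rpar => {
                depth = depth.saturating_sub(1);
                if depth == 0 {
                    return Some((token.start, token.end));
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // Tokens for `(a)` followed by a trailing name.
    let tokens = [
        Token { kind: TokenKind::Lpar, start: 0, end: 1 },
        Token { kind: TokenKind::Name, start: 1, end: 2 },
        Token { kind: TokenKind::Rpar, start: 2, end: 3 },
        Token { kind: TokenKind::Name, start: 4, end: 5 },
    ];
    let mut iter = tokens.iter();
    // Outer loop: find an opening parenthesis, then let the helper continue
    // from the shared iterator, like the rule's `while let` loop above.
    while let Some(token) = iter.next() {
        if token.kind == TokenKind::Lpar {
            assert_eq!(match_closing_paren(&mut iter), Some((2, 3)));
        }
    }
}
```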

View file

@ -11,7 +11,7 @@ use ruff_python_ast::{self as ast, Expr, Keyword};
use ruff_python_literal::format::{
FieldName, FieldNamePart, FieldType, FormatPart, FormatString, FromTemplate,
};
use ruff_python_parser::{lexer, Mode, Tok};
use ruff_python_parser::TokenKind;
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -409,15 +409,13 @@ pub(crate) fn f_strings(checker: &mut Checker, call: &ast::ExprCall, summary: &F
};
let mut patches: Vec<(TextRange, FStringConversion)> = vec![];
let mut lex = lexer::lex_starts_at(
checker.locator().slice(call.func.range()),
Mode::Expression,
call.start(),
)
.flatten();
let mut tokens = checker.parsed().tokens().in_range(call.func.range()).iter();
let end = loop {
match lex.next() {
Some((Tok::Dot, range)) => {
let Some(token) = tokens.next() else {
unreachable!("Should break from the `Tok::Dot` arm");
};
match token.kind() {
TokenKind::Dot => {
// ```
// (
// "a"
@ -429,10 +427,11 @@ pub(crate) fn f_strings(checker: &mut Checker, call: &ast::ExprCall, summary: &F
//
// We know that the expression is a string literal, so we can safely assume that the
// dot is the start of an attribute access.
break range.start();
break token.start();
}
Some((Tok::String { .. }, range)) => {
match FStringConversion::try_convert(range, &mut summary, checker.locator()) {
TokenKind::String => {
match FStringConversion::try_convert(token.range(), &mut summary, checker.locator())
{
// If the format string contains side effects that would need to be repeated,
// we can't convert it to an f-string.
Ok(FStringConversion::SideEffects) => return,
@ -440,11 +439,10 @@ pub(crate) fn f_strings(checker: &mut Checker, call: &ast::ExprCall, summary: &F
// expression.
Err(_) => return,
// Otherwise, push the conversion to be processed later.
Ok(conversion) => patches.push((range, conversion)),
Ok(conversion) => patches.push((token.range(), conversion)),
}
}
Some(_) => continue,
None => unreachable!("Should break from the `Tok::Dot` arm"),
_ => {}
}
};
if patches.is_empty() {
@ -515,7 +513,7 @@ pub(crate) fn f_strings(checker: &mut Checker, call: &ast::ExprCall, summary: &F
// )
// ```
let has_comments = checker
.indexer()
.parsed()
.comment_ranges()
.intersects(call.arguments.range());
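
UP032 now walks `checker.parsed().tokens().in_range(call.func.range())` rather than re-lexing the call slice: it records every `String` token and stops at the first `Dot`, which marks the `.format` attribute access. A standalone sketch of that loop shape (simplified types):

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum TokenKind { String, Dot, Name }

#[derive(Clone, Copy)]
struct Token { kind: TokenKind, start: usize, end: usize }

/// Collect the string-literal ranges of `"..." "...".format(...)` up to the
/// `.` that starts the attribute access, and return where the literal ends.
fn string_parts_and_end(tokens: &[Token]) -> (Vec<(usize, usize)>, usize) {
    let mut parts = Vec::new();
    let mut iter = tokens.iter();
    let end = loop {
        // Mirrors the `unreachable!` in the rule: the stream must contain a `.`.
        let token = iter.next().expect("token stream should contain a `.`");
        match token.kind {
            TokenKind::Dot => break token.start,
            TokenKind::String => parts.push((token.start, token.end)),
            _ => {}
        }
    };
    (parts, end)
}

fn main() {
    // Tokens for `"{}" "{}".format`, with implicit concatenation of two strings.
    let tokens = [
        Token { kind: TokenKind::String, start: 0, end: 4 },
        Token { kind: TokenKind::String, start: 5, end: 9 },
        Token { kind: TokenKind::Dot, start: 9, end: 10 },
        Token { kind: TokenKind::Name, start: 10, end: 16 },
    ];
    let (parts, end) = string_parts_and_end(&tokens);
    assert_eq!(parts, vec![(0, 4), (5, 9)]);
    assert_eq!(end, 9);
}
```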

View file

@ -8,7 +8,7 @@ use ruff_python_codegen::Stylist;
use ruff_python_literal::cformat::{
CConversionFlags, CFormatPart, CFormatPrecision, CFormatQuantity, CFormatString,
};
use ruff_python_parser::{lexer, AsMode, Tok};
use ruff_python_parser::TokenKind;
use ruff_python_stdlib::identifiers::is_identifier;
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -344,38 +344,22 @@ fn convertible(format_string: &CFormatString, params: &Expr) -> bool {
}
/// UP031
pub(crate) fn printf_string_formatting(checker: &mut Checker, expr: &Expr, right: &Expr) {
// Grab each string segment (in case there's an implicit concatenation).
let mut strings: Vec<(TextRange, AnyStringFlags)> = vec![];
let mut extension = None;
for (tok, range) in lexer::lex_starts_at(
checker.locator().slice(expr),
checker.source_type.as_mode(),
expr.start(),
)
.flatten()
{
match tok {
Tok::String { flags, .. } => strings.push((range, flags)),
// If we hit a right paren, we have to preserve it.
Tok::Rpar => extension = Some(range),
// Break as soon as we find the modulo symbol.
Tok::Percent => break,
_ => continue,
}
}
pub(crate) fn printf_string_formatting(
checker: &mut Checker,
bin_op: &ast::ExprBinOp,
string_expr: &ast::ExprStringLiteral,
) {
let right = &*bin_op.right;
// If there are no string segments, abort.
if strings.is_empty() {
return;
}
// Parse each string segment.
let mut num_positional_arguments = 0;
let mut num_keyword_arguments = 0;
let mut format_strings = Vec::with_capacity(strings.len());
for (range, flags) in &strings {
let string = checker.locator().slice(*range);
let mut format_strings: Vec<(TextRange, String)> =
Vec::with_capacity(string_expr.value.as_slice().len());
// Parse each string segment.
for string_literal in &string_expr.value {
let string = checker.locator().slice(string_literal);
let flags = AnyStringFlags::from(string_literal.flags);
let string = &string
[usize::from(flags.opener_len())..(string.len() - usize::from(flags.closer_len()))];
@ -400,7 +384,10 @@ pub(crate) fn printf_string_formatting(checker: &mut Checker, expr: &Expr, right
}
// Convert the `%`-format string to a `.format` string.
format_strings.push(flags.format_string_contents(&percent_to_format(&format_string)));
format_strings.push((
string_literal.range(),
flags.format_string_contents(&percent_to_format(&format_string)),
));
}
// Parse the parameters.
@ -448,41 +435,55 @@ pub(crate) fn printf_string_formatting(checker: &mut Checker, expr: &Expr, right
// Reconstruct the string.
let mut contents = String::new();
let mut prev = None;
for ((range, _), format_string) in strings.iter().zip(format_strings) {
let mut prev_end = None;
for (range, format_string) in format_strings {
// Add the content before the string segment.
match prev {
match prev_end {
None => {
contents.push_str(
checker
.locator()
.slice(TextRange::new(expr.start(), range.start())),
.slice(TextRange::new(bin_op.start(), range.start())),
);
}
Some(prev) => {
contents.push_str(checker.locator().slice(TextRange::new(prev, range.start())));
Some(prev_end) => {
contents.push_str(
checker
.locator()
.slice(TextRange::new(prev_end, range.start())),
);
}
}
// Add the string itself.
contents.push_str(&format_string);
prev = Some(range.end());
prev_end = Some(range.end());
}
if let Some(range) = extension {
if let Some(prev_end) = prev_end {
for token in checker.parsed().tokens().after(prev_end) {
match token.kind() {
// If we hit a right paren, we have to preserve it.
TokenKind::Rpar => {
contents.push_str(
checker
.locator()
.slice(TextRange::new(prev.unwrap(), range.end())),
.slice(TextRange::new(prev_end, token.end())),
);
}
// Break as soon as we find the modulo symbol.
TokenKind::Percent => break,
_ => {}
}
}
}
// Add the `.format` call.
contents.push_str(&format!(".format{params_string}"));
let mut diagnostic = Diagnostic::new(PrintfStringFormatting, expr.range());
let mut diagnostic = Diagnostic::new(PrintfStringFormatting, bin_op.range());
diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement(
contents,
expr.range(),
bin_op.range(),
)));
checker.diagnostics.push(diagnostic);
}
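
UP031 now gets the string segments from the AST (`&string_expr.value`) instead of collecting `Tok::String` tokens, and only consults `tokens().after(prev_end)` for the optional trailing `)` and the terminating `%`. The rebuild loop itself just stitches converted segments together with the source text between them; a simplified sketch:

```rust
// Simplified sketch of the rebuild loop: each (range, replacement) pair is one
// segment of an implicitly concatenated string literal, already converted from
// `%`-style to `.format` style; the text between segments is copied verbatim.
fn rebuild(source: &str, start: usize, segments: &[((usize, usize), &str)]) -> String {
    let mut contents = String::new();
    let mut prev_end = start;
    for &((seg_start, seg_end), replacement) in segments {
        contents.push_str(&source[prev_end..seg_start]); // text before the segment
        contents.push_str(replacement);                   // the converted segment
        prev_end = seg_end;
    }
    contents
}

fn main() {
    let source = r#""%s " "%d!" % (name, count)"#;
    let segments = [((0, 5), r#""{} ""#), ((6, 11), r#""{}!""#)];
    assert_eq!(rebuild(source, 0, &segments), r#""{} " "{}!""#);
}
```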

View file

@ -4,9 +4,8 @@ use anyhow::{anyhow, Result};
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::{self as ast, Expr, PySourceType};
use ruff_python_parser::{lexer, AsMode};
use ruff_source_file::Locator;
use ruff_python_ast::{self as ast, Expr};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_text_size::{Ranged, TextSize};
use crate::checkers::ast::Checker;
@ -76,12 +75,11 @@ pub(crate) fn redundant_open_modes(checker: &mut Checker, call: &ast::ExprCall)
}) = &keyword.value
{
if let Ok(mode) = OpenMode::from_str(mode_param_value.to_str()) {
checker.diagnostics.push(create_check(
checker.diagnostics.push(create_diagnostic(
call,
&keyword.value,
mode.replacement_value(),
checker.locator(),
checker.source_type,
checker.parsed().tokens(),
));
}
}
@ -91,12 +89,11 @@ pub(crate) fn redundant_open_modes(checker: &mut Checker, call: &ast::ExprCall)
Some(mode_param) => {
if let Expr::StringLiteral(ast::ExprStringLiteral { value, .. }) = &mode_param {
if let Ok(mode) = OpenMode::from_str(value.to_str()) {
checker.diagnostics.push(create_check(
checker.diagnostics.push(create_diagnostic(
call,
mode_param,
mode.replacement_value(),
checker.locator(),
checker.source_type,
checker.parsed().tokens(),
));
}
}
@ -146,18 +143,17 @@ impl OpenMode {
}
}
fn create_check<T: Ranged>(
expr: &T,
fn create_diagnostic(
call: &ast::ExprCall,
mode_param: &Expr,
replacement_value: Option<&str>,
locator: &Locator,
source_type: PySourceType,
tokens: &Tokens,
) -> Diagnostic {
let mut diagnostic = Diagnostic::new(
RedundantOpenModes {
replacement: replacement_value.map(ToString::to_string),
},
expr.range(),
call.range(),
);
if let Some(content) = replacement_value {
@ -166,52 +162,53 @@ fn create_check<T: Ranged>(
mode_param.range(),
)));
} else {
diagnostic.try_set_fix(|| {
create_remove_param_fix(locator, expr, mode_param, source_type).map(Fix::safe_edit)
});
diagnostic
.try_set_fix(|| create_remove_param_fix(call, mode_param, tokens).map(Fix::safe_edit));
}
diagnostic
}
fn create_remove_param_fix<T: Ranged>(
locator: &Locator,
expr: &T,
fn create_remove_param_fix(
call: &ast::ExprCall,
mode_param: &Expr,
source_type: PySourceType,
tokens: &Tokens,
) -> Result<Edit> {
let content = locator.slice(expr);
// Find the last comma before mode_param and create a deletion fix
// starting from the comma and ending after mode_param.
let mut fix_start: Option<TextSize> = None;
let mut fix_end: Option<TextSize> = None;
let mut is_first_arg: bool = false;
let mut delete_first_arg: bool = false;
for (tok, range) in lexer::lex_starts_at(content, source_type.as_mode(), expr.start()).flatten()
{
if range.start() == mode_param.start() {
for token in tokens.in_range(call.range()) {
if token.start() == mode_param.start() {
if is_first_arg {
delete_first_arg = true;
continue;
}
fix_end = Some(range.end());
fix_end = Some(token.end());
break;
}
if delete_first_arg && tok.is_name() {
fix_end = Some(range.start());
match token.kind() {
TokenKind::Name if delete_first_arg => {
fix_end = Some(token.start());
break;
}
if tok.is_lpar() {
TokenKind::Lpar => {
is_first_arg = true;
fix_start = Some(range.end());
fix_start = Some(token.end());
}
if tok.is_comma() {
TokenKind::Comma => {
is_first_arg = false;
if !delete_first_arg {
fix_start = Some(range.start());
fix_start = Some(token.start());
}
}
_ => {}
}
}
match (fix_start, fix_end) {
(Some(start), Some(end)) => Ok(Edit::deletion(start, end)),
_ => Err(anyhow::anyhow!(

View file

@ -4,6 +4,7 @@ use regex::Regex;
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -49,10 +50,11 @@ pub(crate) fn unnecessary_coding_comment(
diagnostics: &mut Vec<Diagnostic>,
locator: &Locator,
indexer: &Indexer,
comment_ranges: &CommentRanges,
) {
// The coding comment must be on one of the first two lines. Since each comment spans at least
// one line, we only need to check the first two comments at most.
for comment_range in indexer.comment_ranges().iter().take(2) {
for comment_range in comment_ranges.iter().take(2) {
// If leading content is not whitespace then it's not a valid coding comment e.g.
// ```
// print(x) # coding=utf8

View file

@ -1,7 +1,7 @@
use ruff_diagnostics::{AlwaysFixableViolation, Diagnostic, Edit, Fix};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::{self as ast, Arguments, Expr, Keyword, PySourceType};
use ruff_python_parser::{lexer, AsMode, Tok};
use ruff_python_ast::{self as ast, Arguments, Expr, Keyword};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_source_file::Locator;
use ruff_text_size::{Ranged, TextRange};
@ -117,33 +117,26 @@ fn match_encoding_arg(arguments: &Arguments) -> Option<EncodingArg> {
}
/// Return a [`Fix`] replacing the call to encode with a byte string.
fn replace_with_bytes_literal(
locator: &Locator,
call: &ast::ExprCall,
source_type: PySourceType,
) -> Fix {
fn replace_with_bytes_literal(locator: &Locator, call: &ast::ExprCall, tokens: &Tokens) -> Fix {
// Build up a replacement string by prefixing all string tokens with `b`.
let contents = locator.slice(call);
let mut replacement = String::with_capacity(contents.len() + 1);
let mut replacement = String::with_capacity(call.range().len().to_usize() + 1);
let mut prev = call.start();
for (tok, range) in
lexer::lex_starts_at(contents, source_type.as_mode(), call.start()).flatten()
{
match tok {
Tok::Dot => break,
Tok::String { .. } => {
replacement.push_str(locator.slice(TextRange::new(prev, range.start())));
let string = locator.slice(range);
for token in tokens.in_range(call.range()) {
match token.kind() {
TokenKind::Dot => break,
TokenKind::String => {
replacement.push_str(locator.slice(TextRange::new(prev, token.start())));
let string = locator.slice(token);
replacement.push_str(&format!(
"b{}",
&string.trim_start_matches('u').trim_start_matches('U')
));
}
_ => {
replacement.push_str(locator.slice(TextRange::new(prev, range.end())));
replacement.push_str(locator.slice(TextRange::new(prev, token.end())));
}
}
prev = range.end();
prev = token.end();
}
Fix::safe_edit(Edit::range_replacement(
@ -172,7 +165,7 @@ pub(crate) fn unnecessary_encode_utf8(checker: &mut Checker, call: &ast::ExprCal
diagnostic.set_fix(replace_with_bytes_literal(
checker.locator(),
call,
checker.source_type,
checker.parsed().tokens(),
));
checker.diagnostics.push(diagnostic);
} else if let EncodingArg::Keyword(kwarg) = encoding_arg {

View file

@ -116,7 +116,7 @@ pub(crate) fn yield_in_for_loop(checker: &mut Checker, stmt_for: &ast::StmtFor)
parenthesized_range(
iter.as_ref().into(),
stmt_for.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(iter.range()),

View file

@ -7,7 +7,7 @@ use ruff_python_ast::comparable::ComparableExpr;
use ruff_python_ast::helpers::contains_effect;
use ruff_python_ast::parenthesize::parenthesized_range;
use ruff_python_ast::Expr;
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::Ranged;
@ -74,8 +74,18 @@ pub(crate) fn if_exp_instead_of_or_operator(checker: &mut Checker, if_expr: &ast
Edit::range_replacement(
format!(
"{} or {}",
parenthesize_test(test, if_expr, checker.indexer(), checker.locator()),
parenthesize_test(orelse, if_expr, checker.indexer(), checker.locator()),
parenthesize_test(
test,
if_expr,
checker.parsed().comment_ranges(),
checker.locator()
),
parenthesize_test(
orelse,
if_expr,
checker.parsed().comment_ranges(),
checker.locator()
),
),
if_expr.range(),
),
@ -99,13 +109,13 @@ pub(crate) fn if_exp_instead_of_or_operator(checker: &mut Checker, if_expr: &ast
fn parenthesize_test<'a>(
expr: &Expr,
if_expr: &ast::ExprIf,
indexer: &Indexer,
comment_ranges: &CommentRanges,
locator: &Locator<'a>,
) -> Cow<'a, str> {
if let Some(range) = parenthesized_range(
expr.into(),
if_expr.into(),
indexer.comment_ranges(),
comment_ranges,
locator.contents(),
) {
Cow::Borrowed(locator.slice(range))

View file

@ -114,7 +114,7 @@ pub(crate) fn repeated_append(checker: &mut Checker, stmt: &Stmt) {
// # comment
// a.append(2)
// ```
if group.is_consecutive && !checker.indexer().comment_ranges().intersects(group.range())
if group.is_consecutive && !checker.parsed().comment_ranges().intersects(group.range())
{
diagnostic.set_fix(Fix::unsafe_edit(Edit::replacement(
replacement,

View file

@ -83,7 +83,7 @@ pub(crate) fn single_item_membership_test(
&[membership_test.replacement_op()],
&[item.clone()],
expr.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator(),
),
expr.range(),

View file

@ -199,7 +199,7 @@ pub(crate) fn collection_literal_concatenation(checker: &mut Checker, expr: &Exp
expr.range(),
);
if !checker
.indexer()
.parsed()
.comment_ranges()
.has_comments(expr, checker.locator())
{

View file

@ -69,9 +69,9 @@ impl AlwaysFixableViolation for InvalidFormatterSuppressionComment {
/// RUF028
pub(crate) fn ignored_formatter_suppression_comment(checker: &mut Checker, suite: &ast::Suite) {
let indexer = checker.indexer();
let locator = checker.locator();
let comment_ranges: SmallVec<[SuppressionComment; 8]> = indexer
let comment_ranges: SmallVec<[SuppressionComment; 8]> = checker
.parsed()
.comment_ranges()
.into_iter()
.filter_map(|range| {

View file

@ -114,10 +114,12 @@ fn should_be_fstring(
}
let fstring_expr = format!("f{}", locator.slice(literal));
let Ok(parsed) = parse_expression(&fstring_expr) else {
return false;
};
// Note: Range offsets for `value` are based on `fstring_expr`
let Ok(ast::Expr::FString(ast::ExprFString { value, .. })) = parse_expression(&fstring_expr)
else {
let Some(ast::ExprFString { value, .. }) = parsed.expr().as_f_string_expr() else {
return false;
};
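
RUF027's `should_be_fstring` now goes through the new parser API: `parse_expression` returns a `Parsed` value, and the expression is reached via `.expr()` plus the typed `as_f_string_expr()` accessor (both shown in this diff). A short example of that flow; the helper name is illustrative:

```rust
use ruff_python_parser::parse_expression;

fn is_fstring_expression(source: &str) -> bool {
    // `parse_expression` returns a `Parsed` value; the AST is behind `.expr()`.
    let Ok(parsed) = parse_expression(source) else {
        return false;
    };
    parsed.expr().as_f_string_expr().is_some()
}

fn main() {
    assert!(is_fstring_expression("f\"{x}\""));
    assert!(!is_fstring_expression("\"{x}\""));
}
```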

View file

@ -84,7 +84,7 @@ pub(crate) fn parenthesize_chained_logical_operators(
if parenthesized_range(
bool_op.into(),
expr.into(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
locator.contents(),
)
.is_none()

View file

@ -111,7 +111,7 @@ fn convert_to_reduce(iterable: &Expr, call: &ast::ExprCall, checker: &Checker) -
parenthesized_range(
iterable.into(),
call.arguments.as_any_node_ref(),
checker.indexer().comment_ranges(),
checker.parsed().comment_ranges(),
checker.locator().contents(),
)
.unwrap_or(iterable.range()),

View file

@ -8,7 +8,7 @@ use std::cmp::Ordering;
use ruff_python_ast as ast;
use ruff_python_codegen::Stylist;
use ruff_python_parser::{lexer, Mode, Tok, TokenKind};
use ruff_python_parser::{TokenKind, Tokens};
use ruff_python_stdlib::str::is_cased_uppercase;
use ruff_python_trivia::{first_non_trivia_token, leading_indentation, SimpleTokenKind};
use ruff_source_file::Locator;
@ -336,6 +336,7 @@ impl<'a> MultilineStringSequenceValue<'a> {
range: TextRange,
kind: SequenceKind,
locator: &Locator,
tokens: &Tokens,
string_items: &[&'a str],
) -> Option<MultilineStringSequenceValue<'a>> {
// Parse the multiline string sequence using the raw tokens.
@ -344,7 +345,7 @@ impl<'a> MultilineStringSequenceValue<'a> {
//
// Step (1). Start by collecting information on each line individually:
let (lines, ends_with_trailing_comma) =
collect_string_sequence_lines(range, kind, locator, string_items)?;
collect_string_sequence_lines(range, kind, tokens, string_items)?;
// Step (2). Group lines together into sortable "items":
// - Any "item" contains a single element of the list/tuple
@ -488,7 +489,7 @@ impl Ranged for MultilineStringSequenceValue<'_> {
fn collect_string_sequence_lines<'a>(
range: TextRange,
kind: SequenceKind,
locator: &Locator,
tokens: &Tokens,
string_items: &[&'a str],
) -> Option<(Vec<StringSequenceLine<'a>>, bool)> {
// These first two variables are used for keeping track of state
@ -501,39 +502,34 @@ fn collect_string_sequence_lines<'a>(
// An iterator over the string values in the sequence.
let mut string_items_iter = string_items.iter();
// `lex_starts_at()` gives us absolute ranges rather than relative ranges,
// but (surprisingly) we still need to pass in the slice of code we want it to lex,
// rather than the whole source file:
let mut token_iter =
lexer::lex_starts_at(locator.slice(range), Mode::Expression, range.start());
let (first_tok, _) = token_iter.next()?.ok()?;
if TokenKind::from(&first_tok) != kind.opening_token_for_multiline_definition() {
let mut token_iter = tokens.in_range(range).iter();
let first_token = token_iter.next()?;
if first_token.kind() != kind.opening_token_for_multiline_definition() {
return None;
}
let expected_final_token = kind.closing_token_for_multiline_definition();
for pair in token_iter {
let (tok, subrange) = pair.ok()?;
match tok {
Tok::NonLogicalNewline => {
for token in token_iter {
match token.kind() {
TokenKind::NonLogicalNewline => {
lines.push(line_state.into_string_sequence_line());
line_state = LineState::default();
}
Tok::Comment(_) => {
line_state.visit_comment_token(subrange);
TokenKind::Comment => {
line_state.visit_comment_token(token.range());
}
Tok::String { .. } => {
TokenKind::String => {
let Some(string_value) = string_items_iter.next() else {
unreachable!("Expected the number of string tokens to be equal to the number of string items in the sequence");
};
line_state.visit_string_token(string_value, subrange);
line_state.visit_string_token(string_value, token.range());
ends_with_trailing_comma = false;
}
Tok::Comma => {
line_state.visit_comma_token(subrange);
TokenKind::Comma => {
line_state.visit_comma_token(token.range());
ends_with_trailing_comma = true;
}
tok if TokenKind::from(&tok) == expected_final_token => {
kind if kind == expected_final_token => {
lines.push(line_state.into_string_sequence_line());
break;
}

View file

@ -216,6 +216,7 @@ fn create_fix(
range,
kind,
locator,
checker.parsed().tokens(),
string_items,
)?;
assert_eq!(value.len(), elts.len());

View file

@ -210,6 +210,7 @@ impl<'a> StringLiteralDisplay<'a> {
self.range(),
*sequence_kind,
locator,
checker.parsed().tokens(),
elements,
)?;
assert_eq!(analyzed_sequence.len(), self.elts.len());

View file

@ -15,15 +15,15 @@
/// will not converge.
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_index::Indexer;
use ruff_python_trivia::CommentRanges;
use ruff_source_file::Locator;
use ruff_text_size::TextSize;
use crate::registry::Rule;
/// Check if a comment exists anywhere in the given file
fn comment_exists(text: &str, locator: &Locator, indexer: &Indexer) -> bool {
for range in indexer.comment_ranges() {
fn comment_exists(text: &str, locator: &Locator, comment_ranges: &CommentRanges) -> bool {
for range in comment_ranges {
let comment_text = locator.slice(range);
if text.trim_end() == comment_text {
return true;
@ -49,7 +49,7 @@ pub(crate) const TEST_RULES: &[Rule] = &[
];
pub(crate) trait TestRule {
fn diagnostic(locator: &Locator, indexer: &Indexer) -> Option<Diagnostic>;
fn diagnostic(locator: &Locator, comment_ranges: &CommentRanges) -> Option<Diagnostic>;
}
/// ## What it does
@ -80,7 +80,7 @@ impl Violation for StableTestRule {
}
impl TestRule for StableTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
StableTestRule,
ruff_text_size::TextRange::default(),
@ -116,9 +116,9 @@ impl Violation for StableTestRuleSafeFix {
}
impl TestRule for StableTestRuleSafeFix {
fn diagnostic(locator: &Locator, indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(locator: &Locator, comment_ranges: &CommentRanges) -> Option<Diagnostic> {
let comment = format!("# fix from stable-test-rule-safe-fix\n");
if comment_exists(&comment, locator, indexer) {
if comment_exists(&comment, locator, comment_ranges) {
None
} else {
Some(
@ -160,9 +160,9 @@ impl Violation for StableTestRuleUnsafeFix {
}
impl TestRule for StableTestRuleUnsafeFix {
fn diagnostic(locator: &Locator, indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(locator: &Locator, comment_ranges: &CommentRanges) -> Option<Diagnostic> {
let comment = format!("# fix from stable-test-rule-unsafe-fix\n");
if comment_exists(&comment, locator, indexer) {
if comment_exists(&comment, locator, comment_ranges) {
None
} else {
Some(
@ -207,9 +207,9 @@ impl Violation for StableTestRuleDisplayOnlyFix {
}
impl TestRule for StableTestRuleDisplayOnlyFix {
fn diagnostic(locator: &Locator, indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(locator: &Locator, comment_ranges: &CommentRanges) -> Option<Diagnostic> {
let comment = format!("# fix from stable-test-rule-display-only-fix\n");
if comment_exists(&comment, locator, indexer) {
if comment_exists(&comment, locator, comment_ranges) {
None
} else {
Some(
@ -254,7 +254,7 @@ impl Violation for PreviewTestRule {
}
impl TestRule for PreviewTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
PreviewTestRule,
ruff_text_size::TextRange::default(),
@ -290,7 +290,7 @@ impl Violation for NurseryTestRule {
}
impl TestRule for NurseryTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
NurseryTestRule,
ruff_text_size::TextRange::default(),
@ -326,7 +326,7 @@ impl Violation for DeprecatedTestRule {
}
impl TestRule for DeprecatedTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
DeprecatedTestRule,
ruff_text_size::TextRange::default(),
@ -362,7 +362,7 @@ impl Violation for AnotherDeprecatedTestRule {
}
impl TestRule for AnotherDeprecatedTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
AnotherDeprecatedTestRule,
ruff_text_size::TextRange::default(),
@ -398,7 +398,7 @@ impl Violation for RemovedTestRule {
}
impl TestRule for RemovedTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
RemovedTestRule,
ruff_text_size::TextRange::default(),
@ -434,7 +434,7 @@ impl Violation for AnotherRemovedTestRule {
}
impl TestRule for AnotherRemovedTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
AnotherRemovedTestRule,
ruff_text_size::TextRange::default(),
@ -470,7 +470,7 @@ impl Violation for RedirectedFromTestRule {
}
impl TestRule for RedirectedFromTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
RedirectedFromTestRule,
ruff_text_size::TextRange::default(),
@ -506,7 +506,7 @@ impl Violation for RedirectedToTestRule {
}
impl TestRule for RedirectedToTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
RedirectedToTestRule,
ruff_text_size::TextRange::default(),
@ -542,7 +542,7 @@ impl Violation for RedirectedFromPrefixTestRule {
}
impl TestRule for RedirectedFromPrefixTestRule {
fn diagnostic(_locator: &Locator, _indexer: &Indexer) -> Option<Diagnostic> {
fn diagnostic(_locator: &Locator, _comment_ranges: &CommentRanges) -> Option<Diagnostic> {
Some(Diagnostic::new(
RedirectedFromPrefixTestRule,
ruff_text_size::TextRange::default(),

Some files were not shown because too many files have changed in this diff.