Mirror of https://github.com/astral-sh/ruff.git (synced 2025-09-28 21:05:08 +00:00)

(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP with a hand-written recursive descent parser.

It also updates the grammar for [PEP 646](https://peps.python.org/pep-0646/) so that the parser outputs the correct AST. For example, in `data[*x]`, the index expression is now a tuple with a single starred expression instead of just a starred expression (a short sketch of the new AST shape follows after this description).

Beyond the performance improvements, the parser is also error resilient and can provide better error messages. The behavior as seen by any downstream tools isn't changed: the linter and formatter can still assume that the parser will _stop_ at the first syntax error. This will be updated in the following months.

For more details about the change here, refer to the PRs corresponding to the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in #9152
- @addisoncrump for implementing the fuzzer which helped [catch](https://github.com/astral-sh/ruff/pull/10903) [a](https://github.com/astral-sh/ruff/pull/10910) [lot](https://github.com/astral-sh/ruff/pull/10966) [of](https://github.com/astral-sh/ruff/pull/10896) [bugs](https://github.com/astral-sh/ruff/pull/10877)

---------

Co-authored-by: Victor Hugo Gomes <labatata101@linuxmail.org>
Co-authored-by: Micha Reiser <micha@reiser.io>
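The following is a minimal sketch of the two points above (the PEP 646 AST shape and the "stop at the first syntax error" behavior). It is not part of this PR; it assumes the `ruff_python_ast` node names (`Stmt::Expr`, `Expr::Subscript`, `Expr::Tuple`, `Expr::Starred`) and uses the same `parse_suite`/`ParseError` API as the fuzz harness below.

```rust
use ruff_python_ast::{Expr, Stmt};
use ruff_python_parser::{parse_suite, ParseError};

fn main() {
    // PEP 646: in `data[*x]`, the slice is now a one-element tuple wrapping
    // the starred expression, rather than a bare starred expression.
    let stmts = parse_suite("data[*x]").expect("valid syntax");
    let Stmt::Expr(stmt) = &stmts[0] else { unreachable!() };
    let Expr::Subscript(subscript) = stmt.value.as_ref() else { unreachable!() };
    let Expr::Tuple(tuple) = subscript.slice.as_ref() else { unreachable!() };
    assert!(matches!(tuple.elts[0], Expr::Starred(_)));

    // Error behavior seen by downstream tools is unchanged: the parser stops
    // at the first syntax error and reports its location.
    let Err(ParseError { location, .. }) = parse_suite("def f(:\n    pass\n") else {
        unreachable!()
    };
    println!("first syntax error at byte offset {}", location.start().to_usize());
}
```

Before this change, `subscript.slice` would have been the bare starred expression itself rather than a one-element tuple.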
67 lines · 2.2 KiB · Rust
//! Fuzzer harness which merely explores the parse/unparse coverage space and tries to make it
//! crash. On its own, this fuzzer is (hopefully) not going to find a crash.

#![no_main]

use libfuzzer_sys::{fuzz_target, Corpus};
use ruff_python_codegen::{Generator, Stylist};
use ruff_python_parser::{lexer, parse_suite, Mode, ParseError};
use ruff_source_file::Locator;

fn do_fuzz(case: &[u8]) -> Corpus {
    let Ok(code) = std::str::from_utf8(case) else {
        return Corpus::Reject;
    };

    // just round-trip it once to trigger both parse and unparse
    let locator = Locator::new(code);
    let python_ast = match parse_suite(code) {
        Ok(stmts) => stmts,
        Err(ParseError { location, .. }) => {
            let offset = location.start().to_usize();
            assert!(
                code.is_char_boundary(offset),
                "Invalid error location {} (not at char boundary)",
                offset
            );
            return Corpus::Keep;
        }
    };

    let tokens: Vec<_> = lexer::lex(code, Mode::Module).collect();

    for maybe_token in tokens.iter() {
        match maybe_token.as_ref() {
            Ok((_, range)) => {
                let start = range.start().to_usize();
                let end = range.end().to_usize();
                assert!(
                    code.is_char_boundary(start),
                    "Invalid start position {} (not at char boundary)",
                    start
                );
                assert!(
                    code.is_char_boundary(end),
                    "Invalid end position {} (not at char boundary)",
                    end
                );
            }
            Err(err) => {
                let offset = err.location().start().to_usize();
                assert!(
                    code.is_char_boundary(offset),
                    "Invalid error location {} (not at char boundary)",
                    offset
                );
            }
        }
    }

    let stylist = Stylist::from_tokens(&tokens, &locator);
    let mut generator: Generator = (&stylist).into();
    generator.unparse_suite(&python_ast);

    Corpus::Keep
}

fuzz_target!(|case: &[u8]| -> Corpus { do_fuzz(case) });