mirror of
https://github.com/astral-sh/ruff.git
synced 2025-12-23 09:19:39 +00:00
Some checks are pending
CI / mkdocs (push) Waiting to run
CI / Determine changes (push) Waiting to run
CI / cargo fmt (push) Waiting to run
CI / cargo clippy (push) Blocked by required conditions
CI / cargo test (linux) (push) Blocked by required conditions
CI / cargo test (linux, release) (push) Blocked by required conditions
CI / cargo test (windows) (push) Blocked by required conditions
CI / cargo test (wasm) (push) Blocked by required conditions
CI / cargo build (release) (push) Waiting to run
CI / cargo build (msrv) (push) Blocked by required conditions
CI / cargo fuzz build (push) Blocked by required conditions
CI / fuzz parser (push) Blocked by required conditions
CI / test scripts (push) Blocked by required conditions
CI / ecosystem (push) Blocked by required conditions
CI / Fuzz for new ty panics (push) Blocked by required conditions
CI / cargo shear (push) Blocked by required conditions
CI / python package (push) Waiting to run
CI / pre-commit (push) Waiting to run
CI / formatter instabilities and black similarity (push) Blocked by required conditions
CI / test ruff-lsp (push) Blocked by required conditions
CI / check playground (push) Blocked by required conditions
CI / benchmarks-instrumented (push) Blocked by required conditions
CI / benchmarks-walltime (push) Blocked by required conditions
[ty Playground] Release / publish (push) Waiting to run
Summary -- To take advantage of the new diagnostics, we need to update our caching model to include all of the information supported by `ruff_db`'s diagnostic type. Instead of trying to serialize all of this information, Micha suggested simply not caching files with diagnostics, like we already do for files with syntax errors. This PR is an attempt at that approach. This has the added benefit of trimming down our `Rule` derives since this was the last place the `FromStr`/`strum_macros::EnumString` implementation was used, as well as the (de)serialization macros and `CacheKey`. Test Plan -- Existing tests, with their input updated not to include a diagnostic, plus a new test showing that files with lint diagnostics are not cached. Benchmarks -- In addition to tests, we wanted to check that this doesn't degrade performance too much. I posted part of this new analysis in https://github.com/astral-sh/ruff/issues/18198#issuecomment-3175048672, but I'll duplicate it here. In short, there's not much difference between `main` and this branch for projects with few diagnostics (`home-assistant`, `airflow`), as expected. The difference for projects with many diagnostics (`cpython`) is quite a bit bigger (~300 ms vs ~220 ms), but most projects that run ruff regularly are likely to have very few diagnostics, so this may not be a problem practically. I guess GitHub isn't really rendering this as I intended, but the extra separator line is meant to separate the benchmarks on `main` (above the line) from this branch (below the line). | Command | Mean [ms] | Min [ms] | Max [ms] | |:--------------------------------------------------------------|----------:|---------:|---------:| | `ruff check cpython --no-cache --isolated --exit-zero` | 322.0 | 317.5 | 326.2 | | `ruff check cpython --isolated --exit-zero` | 217.3 | 209.8 | 237.9 | | `ruff check home-assistant --no-cache --isolated --exit-zero` | 279.5 | 277.0 | 283.6 | | `ruff check home-assistant --isolated --exit-zero` | 37.2 | 35.7 | 40.6 | | `ruff check airflow --no-cache --isolated --exit-zero` | 133.1 | 130.4 | 146.4 | | `ruff check airflow --isolated --exit-zero` | 34.7 | 32.9 | 41.6 | |:--------------------------------------------------------------|----------:|---------:|---------:| | `ruff check cpython --no-cache --isolated --exit-zero` | 330.1 | 324.5 | 333.6 | | `ruff check cpython --isolated --exit-zero` | 309.2 | 306.1 | 314.7 | | `ruff check home-assistant --no-cache --isolated --exit-zero` | 288.6 | 279.4 | 302.3 | | `ruff check home-assistant --isolated --exit-zero` | 39.8 | 36.9 | 42.4 | | `ruff check airflow --no-cache --isolated --exit-zero` | 134.5 | 131.3 | 140.6 | | `ruff check airflow --isolated --exit-zero` | 39.1 | 37.2 | 44.3 | I had Claude adapt one of the [scripts](https://github.com/sharkdp/hyperfine/blob/master/scripts/plot_whisker.py) from the hyperfine repo to make this plot, so it's not quite perfect, but maybe it's still useful. The table is probably more reliable for close comparisons. I'll put more details about the benchmarks below for the sake of future reproducibility. <img width="4472" height="2368" alt="image" src="https://github.com/user-attachments/assets/1c42d13e-818a-44e7-b34c-247340a936d7" /> <details><summary>Benchmark details</summary> <p> The versions of each project: - CPython: 6322edd260e8cad4b09636e05ddfb794a96a0451, the 3.10 branch from the contributing docs - `home-assistant`: 5585376b406f099fb29a970b160877b57e5efcb0 - `airflow`: 29a1cb0cfde9d99b1774571688ed86cb60123896 The last two are just the main branches at the time I cloned the repos. I don't think our Ruff config should be applied since I used `--isolated`, but these are cloned into my copy of Ruff at `crates/ruff_linter/resources/test`, and I trimmed the `./target/release/` prefix from each of the commands, but these are builds of Ruff in release mode. And here's the script with the `hyperfine` invocation: ```shell #!/bin/bash cargo build --release --bin ruff # git clone --depth 1 https://github.com/home-assistant/core crates/ruff_linter/resources/test/home-assistant # git clone --depth 1 https://github.com/apache/airflow crates/ruff_linter/resources/test/airflow bin=./target/release/ruff resources=./crates/ruff_linter/resources/test cpython=$resources/cpython home_assistant=$resources/home-assistant airflow=$resources/airflow base=${1:-bench} hyperfine --warmup 10 --export-json $base.json --export-markdown $base.md \ "$bin check $cpython --no-cache --isolated --exit-zero" \ "$bin check $cpython --isolated --exit-zero" \ "$bin check $home_assistant --no-cache --isolated --exit-zero" \ "$bin check $home_assistant --isolated --exit-zero" \ "$bin check $airflow --no-cache --isolated --exit-zero" \ "$bin check $airflow --isolated --exit-zero" ``` I ran this once on `main` (`baseline` in the graph, top half of the table) and once on this branch (`nocache` and bottom of the table). </p> </details>
96 lines
3.1 KiB
Rust
96 lines
3.1 KiB
Rust
//! Fuzzer harness which actively tries to find testcases that cause Ruff to introduce errors into
|
|
//! the resulting file.
|
|
|
|
#![no_main]
|
|
|
|
use std::collections::HashMap;
|
|
use std::sync::OnceLock;
|
|
|
|
use libfuzzer_sys::{fuzz_target, Corpus};
|
|
use ruff_linter::linter::ParseSource;
|
|
use ruff_linter::settings::flags::Noqa;
|
|
use ruff_linter::settings::LinterSettings;
|
|
use ruff_linter::source_kind::SourceKind;
|
|
use ruff_python_ast::PySourceType;
|
|
use ruff_python_formatter::{format_module_source, PyFormatOptions};
|
|
use similar::TextDiff;
|
|
|
|
static SETTINGS: OnceLock<LinterSettings> = OnceLock::new();
|
|
|
|
fn do_fuzz(case: &[u8]) -> Corpus {
|
|
// throw away inputs which aren't utf-8
|
|
let Ok(code) = std::str::from_utf8(case) else {
|
|
return Corpus::Reject;
|
|
};
|
|
|
|
// the settings are immutable to test_snippet, so we avoid re-initialising here
|
|
let linter_settings = SETTINGS.get_or_init(LinterSettings::default);
|
|
let format_options = PyFormatOptions::default();
|
|
|
|
let linter_result = ruff_linter::linter::lint_only(
|
|
"fuzzed-source.py".as_ref(),
|
|
None,
|
|
linter_settings,
|
|
Noqa::Enabled,
|
|
&SourceKind::Python(code.to_string()),
|
|
PySourceType::Python,
|
|
ParseSource::None,
|
|
);
|
|
|
|
if linter_result.has_invalid_syntax() {
|
|
return Corpus::Keep; // keep, but don't continue
|
|
}
|
|
|
|
let mut warnings = HashMap::new();
|
|
|
|
for msg in &linter_result.diagnostics {
|
|
let count: &mut usize = warnings.entry(msg.name()).or_default();
|
|
*count += 1;
|
|
}
|
|
|
|
// format the code once
|
|
if let Ok(formatted) = format_module_source(code, format_options.clone()) {
|
|
let formatted = formatted.as_code().to_string();
|
|
|
|
let linter_result = ruff_linter::linter::lint_only(
|
|
"fuzzed-source.py".as_ref(),
|
|
None,
|
|
linter_settings,
|
|
Noqa::Enabled,
|
|
&SourceKind::Python(formatted.clone()),
|
|
PySourceType::Python,
|
|
ParseSource::None,
|
|
);
|
|
|
|
assert!(
|
|
linter_result.has_invalid_syntax(),
|
|
"formatter introduced a parse error"
|
|
);
|
|
|
|
for msg in &linter_result.diagnostics {
|
|
if let Some(count) = warnings.get_mut(msg.name()) {
|
|
if let Some(decremented) = count.checked_sub(1) {
|
|
*count = decremented;
|
|
} else {
|
|
panic!(
|
|
"formatter introduced additional linter warning: {msg:?}\ndiff: {}",
|
|
TextDiff::from_lines(code, &formatted)
|
|
.unified_diff()
|
|
.header("Unformatted", "Formatted")
|
|
);
|
|
}
|
|
} else {
|
|
panic!(
|
|
"formatter introduced new linter warning that was not previously present: {msg:?}\ndiff: {}",
|
|
TextDiff::from_lines(code, &formatted)
|
|
.unified_diff()
|
|
.header("Unformatted", "Formatted")
|
|
);
|
|
}
|
|
}
|
|
}
|
|
|
|
Corpus::Keep
|
|
}
|
|
|
|
fuzz_target!(|case: &[u8]| -> Corpus { do_fuzz(case) });
|