ruff/fuzz
Brent Westbrook 79c949f0f7
Don't cache files with diagnostics (#19869)
Summary
--

To take advantage of the new diagnostics, we need to update our caching
model to include all of the information supported by `ruff_db`'s
diagnostic type. Instead of trying to serialize all of this information,
Micha suggested simply not caching files with diagnostics, like we
already do for files with syntax errors. This PR is an attempt at that
approach.

This has the added benefit of trimming down our `Rule` derives since
this was the last place the `FromStr`/`strum_macros::EnumString`
implementation was used, as well as the (de)serialization macros and
`CacheKey`.

Test Plan
--

Existing tests, with their input updated not to include a diagnostic,
plus a new test showing that files with lint diagnostics are not cached.

Benchmarks
--

In addition to tests, we wanted to check that this doesn't degrade
performance too much. I posted part of this new analysis in
https://github.com/astral-sh/ruff/issues/18198#issuecomment-3175048672,
but I'll duplicate it here. In short, there's not much difference
between `main` and this branch for projects with few diagnostics
(`home-assistant`, `airflow`), as expected. The difference for projects
with many diagnostics (`cpython`) is quite a bit bigger (~300 ms vs ~220
ms), but most projects that run ruff regularly are likely to have very
few diagnostics, so this may not be a problem practically.

I guess GitHub isn't really rendering this as I intended, but the extra
separator line is meant to separate the benchmarks on `main` (above the
line) from this branch (below the line).

| Command | Mean [ms] | Min [ms] | Max [ms] |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero` | 322.0 | 317.5 | 326.2 |
| `ruff check cpython --isolated --exit-zero` | 217.3 | 209.8 | 237.9 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` | 279.5 | 277.0 | 283.6 |
| `ruff check home-assistant --isolated --exit-zero` | 37.2 | 35.7 | 40.6 |
| `ruff check airflow --no-cache --isolated --exit-zero` | 133.1 | 130.4 | 146.4 |
| `ruff check airflow --isolated --exit-zero` | 34.7 | 32.9 | 41.6 |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero` | 330.1 | 324.5 | 333.6 |
| `ruff check cpython --isolated --exit-zero` | 309.2 | 306.1 | 314.7 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` | 288.6 | 279.4 | 302.3 |
| `ruff check home-assistant --isolated --exit-zero` | 39.8 | 36.9 | 42.4 |
| `ruff check airflow --no-cache --isolated --exit-zero` | 134.5 | 131.3 | 140.6 |
| `ruff check airflow --isolated --exit-zero` | 39.1 | 37.2 | 44.3 |

I had Claude adapt one of the
[scripts](https://github.com/sharkdp/hyperfine/blob/master/scripts/plot_whisker.py)
from the hyperfine repo to make this plot, so it's not quite perfect,
but maybe it's still useful. The table is probably more reliable for
close comparisons. I'll put more details about the benchmarks below for
the sake of future reproducibility.

<img width="4472" height="2368" alt="image"
src="https://github.com/user-attachments/assets/1c42d13e-818a-44e7-b34c-247340a936d7"
/>

<details><summary>Benchmark details</summary>
<p>

The versions of each project:
- CPython: 6322edd260e8cad4b09636e05ddfb794a96a0451, the 3.10 branch
from the contributing docs
- `home-assistant`: 5585376b406f099fb29a970b160877b57e5efcb0
- `airflow`: 29a1cb0cfde9d99b1774571688ed86cb60123896

The last two are just the main branches at the time I cloned the repos.

I don't think our Ruff config should be applied since I used
`--isolated`, but note that these projects are cloned into my copy of
Ruff at `crates/ruff_linter/resources/test`. I also trimmed the
`./target/release/` prefix from each of the commands; these are builds
of Ruff in release mode.

And here's the script with the `hyperfine` invocation:

```shell
#!/bin/bash

cargo build --release --bin ruff

# git clone --depth 1 https://github.com/home-assistant/core crates/ruff_linter/resources/test/home-assistant
# git clone --depth 1 https://github.com/apache/airflow crates/ruff_linter/resources/test/airflow

bin=./target/release/ruff
resources=./crates/ruff_linter/resources/test
cpython=$resources/cpython
home_assistant=$resources/home-assistant
airflow=$resources/airflow

base=${1:-bench}

hyperfine --warmup 10 --export-json $base.json --export-markdown $base.md \
		  "$bin check $cpython --no-cache --isolated --exit-zero" \
		  "$bin check $cpython --isolated --exit-zero" \
		  "$bin check $home_assistant --no-cache --isolated --exit-zero" \
		  "$bin check $home_assistant --isolated --exit-zero" \
		  "$bin check $airflow --no-cache --isolated --exit-zero" \
		  "$bin check $airflow --isolated --exit-zero"
```

I ran this once on `main` (`baseline` in the graph, top half of the
table) and once on this branch (`nocache` and bottom of the table).

</p>
</details>
2025-08-12 15:28:44 -04:00
fuzz_targets Don't cache files with diagnostics (#19869) 2025-08-12 15:28:44 -04:00
.gitignore Remove symlinks from the fuzz directory (#18095) 2025-05-14 21:05:52 +05:30
Cargo.toml Update salsa to pull in tracked struct changes (#19843) 2025-08-12 13:17:46 +02:00
init-fuzzer.sh Add shellcheck to pre-commit (#19361) 2025-07-15 16:49:13 +00:00
README.md Remove symlinks from the fuzz directory (#18095) 2025-05-14 21:05:52 +05:30

ruff-fuzz

Fuzzers and associated utilities for automatic testing of Ruff.

Usage

To use the fuzzers provided in this directory, start by invoking:

./fuzz/init-fuzzer.sh

This will install cargo-fuzz and optionally download a dataset which improves the efficacy of the testing.

Note

This step is necessary for initialising the corpus directory, as all fuzzers share a common corpus.

The dataset may take several hours to download and clean, so if you're just looking to try out the fuzzers, skip the dataset download. Be warned, though, that some features simply cannot be tested without it, since the fuzzer is very unlikely to generate valid Python code from "thin air".

Once you have initialised the fuzzers, you can then execute any fuzzer with:

cargo fuzz run -s none name_of_fuzzer -- -timeout=1

Note

Users of Apple M1 devices must use a nightly compiler and omit the -s none portion of this command, as this architecture does not support fuzzing without a sanitizer.

cargo +nightly fuzz run name_of_fuzzer -- -timeout=1

You can view the names of the available fuzzers with cargo fuzz list. For specific details about how each fuzzer works, please read this document in its entirety.

Note

Re-run ./init-fuzzer.sh (say no to the dataset download) after adding more file-based test cases to the repository. This will make sure that the corpus is up to date with any new Python code added to the repository.

Debugging a crash

Once you've found a crash, you'll need to debug it. The easiest first step in this process is to minimise the input, that is, to find a smaller input that still triggers the crash. cargo-fuzz supports this out of the box with:

cargo fuzz tmin -s none name_of_fuzzer artifacts/name_of_fuzzer/crash-...

From here, you will need to analyse the input and potentially the behaviour of the program. The debugging process from here is unfortunately less well-defined, so you will need to apply some expertise here. Happy hunting!

A brief introduction to fuzzers

Fuzzing, or fuzz testing, is the process of providing generated data to a program under test. The most common variety of fuzzer is the mutational fuzzer: given a set of existing inputs (a "corpus"), it attempts to slightly change (or "mutate") these inputs into new inputs that cover parts of the code that haven't yet been observed. Using this strategy, we can quite efficiently generate test cases which cover significant portions of the program, both with expected and unexpected data. This is really quite effective for finding bugs.
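The mutate-and-keep-if-new-coverage loop described above can be sketched in a few lines. This is a toy illustration only, not how libFuzzer is implemented: `target`, `mutate`, `fuzz` and the `XorShift` generator are all hypothetical names invented for this example, and "coverage" is modelled as a set of branch ids returned by the stand-in target.

```rust
use std::collections::HashSet;

/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical program under test: returns the ids of the branches it reached.
fn target(input: &[u8]) -> HashSet<u8> {
    let mut covered = HashSet::new();
    covered.insert(0);
    if input.first() == Some(&b'd') {
        covered.insert(1);
        if input.get(1) == Some(&b'e') {
            covered.insert(2);
            if input.get(2) == Some(&b'f') {
                covered.insert(3);
            }
        }
    }
    covered
}

/// Mutate an input by overwriting one randomly chosen byte.
fn mutate(input: &[u8], rng: &mut XorShift) -> Vec<u8> {
    let mut out = input.to_vec();
    if out.is_empty() {
        out.push(0);
    }
    let idx = (rng.next() as usize) % out.len();
    out[idx] = rng.next() as u8;
    out
}

/// Run the loop: pick a corpus entry, mutate it, and keep the result
/// only if it reaches a branch that no earlier input reached.
fn fuzz(seed_corpus: Vec<Vec<u8>>, iterations: usize) -> (Vec<Vec<u8>>, HashSet<u8>) {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let mut corpus = seed_corpus;
    if corpus.is_empty() {
        corpus.push(Vec::new());
    }
    let mut seen: HashSet<u8> = HashSet::new();
    for input in &corpus {
        seen.extend(target(input));
    }
    for _ in 0..iterations {
        let parent = &corpus[(rng.next() as usize) % corpus.len()];
        let child = mutate(parent, &mut rng);
        let coverage = target(&child);
        if !coverage.is_subset(&seen) {
            seen.extend(coverage);
            corpus.push(child);
        }
    }
    (corpus, seen)
}
```

Calling `fuzz(vec![b"abc".to_vec()], 2000)` returns the grown corpus together with the set of branch ids observed. Because an input is only kept when it adds coverage, the corpus can grow by at most one entry per branch of the toy target.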

The fuzzers here use cargo-fuzz, a utility which allows Rust to integrate with libFuzzer, the fuzzer library built into LLVM. Each source file present in fuzz_targets is a harness, which is, in effect, a unit test which can handle different inputs. When an input is provided to a harness, the harness processes this data and libFuzzer observes the code coverage and any special values used in comparisons over the course of the run. Special values are preserved for future mutations and inputs which cover new regions of code are added to the corpus.
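To make the idea of a harness concrete, here is a minimal sketch of the shape one usually takes: accept raw bytes, skip inputs that are not interesting, run the code under test, and panic if an invariant is violated. The `process` function is a hypothetical stand-in (a real harness would call into Ruff), and the invariant is chosen only for illustration. Under cargo-fuzz, the body of `harness` would normally sit inside the `libfuzzer_sys::fuzz_target!` macro in a `#![no_main]` file; it is written as a plain function here so that it compiles without that crate.

```rust
/// Hypothetical stand-in for the code under test.
fn process(source: &str) -> usize {
    source.lines().count()
}

/// The body of a harness: a panic here is what the fuzzer reports as a crash.
fn harness(data: &[u8]) {
    // Inputs that are not valid UTF-8 are skipped rather than treated as failures.
    if let Ok(source) = std::str::from_utf8(data) {
        let lines = process(source);
        // Illustrative invariant: there cannot be more lines than bytes.
        assert!(lines <= source.len(), "more lines than bytes");
    }
}
```

Any panic raised inside the harness, whether from the explicit assertion or from the code under test, is treated by the fuzzer as a crash, and the offending input is saved as an artifact.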

Each fuzzer harness in detail

Each fuzzer harness in fuzz_targets targets a different aspect of Ruff and tests it in a different way. While there is implementation-specific documentation in the source code itself, each harness is briefly described below.

ty_check_invalid_syntax

This fuzz harness checks that the type checker (ty) does not panic when checking a source file with invalid syntax. It rejects any corpus entry that is already valid Python code. Currently, this is limited to syntax errors that are produced by Ruff's Python parser, which means that it does not cover all possible syntax errors (https://github.com/astral-sh/ruff/issues/11934). A possible workaround for now would be to bypass the parser and run the type checker on all inputs regardless of syntax errors.

ruff_parse_simple

This fuzz harness does not perform any "smart" testing of Ruff; it merely checks that the parsing and unparsing of a particular input (what would normally be a source code file) does not crash. It also attempts to verify that the locations of tokens and errors identified do not fall in the middle of a UTF-8 code point, which may cause downstream panics. While this is unlikely to find any issues on its own, it executes very quickly and covers a large and diverse code region that may speed up the generation of inputs and therefore make a more valuable corpus quickly. It is particularly useful if you skip the dataset generation.
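The boundary check mentioned above can be illustrated with the standard library alone. This is a sketch of the idea rather than the harness's actual code: `misaligned_offsets` is a hypothetical helper, and the offsets stand in for whatever token and error locations the parser reports.

```rust
/// Returns every offset that does not fall on a UTF-8 character boundary of `source`.
/// Offsets past the end of the string are also reported, since
/// `str::is_char_boundary` returns false for them.
fn misaligned_offsets(source: &str, offsets: &[usize]) -> Vec<usize> {
    offsets
        .iter()
        .copied()
        .filter(|&offset| !source.is_char_boundary(offset))
        .collect()
}
```

For the three-byte string `"a\u{e9}"`, offsets 0, 1 and 3 are boundaries, while offset 2 falls in the middle of the two-byte character. Slicing a string at such an offset panics, which is the kind of downstream failure this check guards against.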

ruff_parse_idempotency

This fuzz harness checks that Ruff's parser is idempotent in order to check that it is not incorrectly parsing or unparsing an input. It can be built in two modes: default (where it is only checked that the parser does not enter an unstable state) or full idempotency (the parser is checked to ensure that it will always produce the same output after the first unparsing). Full idempotency mode can be used by enabling the full-idempotency feature when running the fuzzer, but this may be too strict a restriction for initial testing.

ruff_fix_validity

This fuzz harness checks that fixes applied by Ruff do not introduce new errors, using the existing ruff_linter::test::test_snippet testing utility. It is currently configured to use only the default settings, but may be extended in future versions to test non-default linter settings.

ruff_formatter_idempotency

This fuzz harness ensures that the formatter is idempotent, which detects possible unstable states of Ruff's formatter.
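An idempotency check boils down to formatting twice and comparing the results. The sketch below assumes a formatter can be modelled as a function from source text to source text; `format_source` is a hypothetical stand-in that trims trailing whitespace, not Ruff's formatter, and `check_idempotent` is an illustrative helper name.

```rust
/// Hypothetical stand-in formatter: trims trailing whitespace on every line
/// and terminates every line with a newline.
fn format_source(source: &str) -> String {
    let mut out = String::new();
    for line in source.lines() {
        out.push_str(line.trim_end());
        out.push('\n');
    }
    out
}

/// Formats once, formats the result again, and reports any difference.
fn check_idempotent(format: impl Fn(&str) -> String, source: &str) -> Result<(), String> {
    let once = format(source);
    let twice = format(&once);
    if once == twice {
        Ok(())
    } else {
        Err(format!("unstable formatting: {:?} became {:?}", once, twice))
    }
}
```

A stable formatter such as `format_source` passes the check, whereas a deliberately broken one, for example a closure that appends a space on every call, fails it because the second pass changes the output again.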

ruff_formatter_validity

This fuzz harness checks that Ruff's formatter does not introduce new linter errors/warnings by linting once, counting the number of each error type, then formatting, then linting again and ensuring that the number of each error type does not increase after formatting. This has the beneficial side effect of discovering cases where the linter fails to report a lint error that it should have reported, due to a formatting inconsistency.
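The count-then-compare step can be sketched as follows. This assumes the linter output can be reduced to one rule code per diagnostic; `count_by_rule` and `regressions` are hypothetical helpers, and the rule codes used with them are only illustrative labels.

```rust
use std::collections::HashMap;

/// Counts how many diagnostics were reported for each rule code.
fn count_by_rule(codes: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for code in codes {
        *counts.entry(code.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Returns the rule codes whose count is higher after formatting than before,
/// sorted so that the result is deterministic.
fn regressions(before: &HashMap<String, usize>, after: &HashMap<String, usize>) -> Vec<String> {
    let mut worse = Vec::new();
    for (rule, &after_count) in after {
        let before_count = before.get(rule).copied().unwrap_or(0);
        if after_count > before_count {
            worse.push(rule.clone());
        }
    }
    worse.sort();
    worse
}
```

A harness built on this idea would fail whenever `regressions` returns a non-empty list. A rule whose count decreases is not flagged, which matches the description above: only increases count as failures.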