ruff/crates/ruff_macros/src/map_codes.rs
Don't cache files with diagnostics (#19869)
Summary
--

To take advantage of the new diagnostics, we need to update our caching
model to include all of the information supported by `ruff_db`'s
diagnostic type. Instead of trying to serialize all of this information,
Micha suggested simply not caching files with diagnostics, like we
already do for files with syntax errors. This PR is an attempt at that
approach.
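
To illustrate the idea, here is a minimal sketch (the `LintResult` shape and the cache type are hypothetical stand-ins, not the actual types this PR touches):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real lint-result and cache types.
struct LintResult {
    diagnostics: Vec<String>,
    syntax_errors: Vec<String>,
}

type Cache = HashMap<String, ()>;

// Only store a cache entry for files with no diagnostics and no syntax
// errors; anything else is re-linted from scratch on the next run, so its
// diagnostics never need to be serialized.
fn maybe_cache(cache: &mut Cache, path: String, result: &LintResult) {
    if result.diagnostics.is_empty() && result.syntax_errors.is_empty() {
        cache.insert(path, ());
    }
}
```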

This has the added benefit of trimming down our `Rule` derives: the cache was
the last place the `FromStr`/`strum_macros::EnumString` implementation was
used, along with the (de)serialization macros and `CacheKey`.

Test Plan
--

Existing tests, with their input updated not to include a diagnostic,
plus a new test showing that files with lint diagnostics are not cached.

Benchmarks
--

In addition to tests, we wanted to check that this doesn't degrade
performance too much. I posted part of this new analysis in
https://github.com/astral-sh/ruff/issues/18198#issuecomment-3175048672,
but I'll duplicate it here. In short, there's not much difference
between `main` and this branch for projects with few diagnostics
(`home-assistant`, `airflow`), as expected. The difference for projects
with many diagnostics (`cpython`) is quite a bit bigger (~300 ms on this
branch vs. ~220 ms on `main`), but most projects that run ruff regularly are
likely to have very few diagnostics, so this may not be a problem in practice.

The first table shows the benchmarks on `main`; the second shows the same
benchmarks on this branch.

| Command | Mean [ms] | Min [ms] | Max [ms] |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero` | 322.0 | 317.5 | 326.2 |
| `ruff check cpython --isolated --exit-zero` | 217.3 | 209.8 | 237.9 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` | 279.5 | 277.0 | 283.6 |
| `ruff check home-assistant --isolated --exit-zero` | 37.2 | 35.7 | 40.6 |
| `ruff check airflow --no-cache --isolated --exit-zero` | 133.1 | 130.4 | 146.4 |
| `ruff check airflow --isolated --exit-zero` | 34.7 | 32.9 | 41.6 |

| Command | Mean [ms] | Min [ms] | Max [ms] |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero` | 330.1 | 324.5 | 333.6 |
| `ruff check cpython --isolated --exit-zero` | 309.2 | 306.1 | 314.7 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` | 288.6 | 279.4 | 302.3 |
| `ruff check home-assistant --isolated --exit-zero` | 39.8 | 36.9 | 42.4 |
| `ruff check airflow --no-cache --isolated --exit-zero` | 134.5 | 131.3 | 140.6 |
| `ruff check airflow --isolated --exit-zero` | 39.1 | 37.2 | 44.3 |

I had Claude adapt one of the
[scripts](https://github.com/sharkdp/hyperfine/blob/master/scripts/plot_whisker.py)
from the hyperfine repo to make this plot, so it's not quite perfect,
but maybe it's still useful. The table is probably more reliable for
close comparisons. I'll put more details about the benchmarks below for
the sake of future reproducibility.

<img width="4472" height="2368" alt="Benchmark whisker plot comparing main and this branch" src="https://github.com/user-attachments/assets/1c42d13e-818a-44e7-b34c-247340a936d7" />

<details><summary>Benchmark details</summary>
<p>

The versions of each project:
- CPython: 6322edd260e8cad4b09636e05ddfb794a96a0451, the 3.10 branch
from the contributing docs
- `home-assistant`: 5585376b406f099fb29a970b160877b57e5efcb0
- `airflow`: 29a1cb0cfde9d99b1774571688ed86cb60123896

The last two are just the main branches at the time I cloned the repos.

Our Ruff config shouldn't apply because I passed `--isolated`, but note that
these projects are cloned into my copy of Ruff at
`crates/ruff_linter/resources/test`. I also trimmed the `./target/release/`
prefix from each command shown in the tables; the `ruff` binary is a release
build.

And here's the script with the `hyperfine` invocation:

```shell
#!/bin/bash

cargo build --release --bin ruff

# git clone --depth 1 https://github.com/home-assistant/core crates/ruff_linter/resources/test/home-assistant
# git clone --depth 1 https://github.com/apache/airflow crates/ruff_linter/resources/test/airflow

bin=./target/release/ruff
resources=./crates/ruff_linter/resources/test
cpython=$resources/cpython
home_assistant=$resources/home-assistant
airflow=$resources/airflow

base=${1:-bench}

hyperfine --warmup 10 --export-json $base.json --export-markdown $base.md \
		  "$bin check $cpython --no-cache --isolated --exit-zero" \
		  "$bin check $cpython --isolated --exit-zero" \
		  "$bin check $home_assistant --no-cache --isolated --exit-zero" \
		  "$bin check $home_assistant --isolated --exit-zero" \
		  "$bin check $airflow --no-cache --isolated --exit-zero" \
		  "$bin check $airflow --isolated --exit-zero"
```

I ran this once on `main` (`baseline` in the plot, the first table) and once
on this branch (`nocache` in the plot, the second table).
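
Concretely, assuming the script above is saved as `bench.sh` (with
`<this-branch>` standing in for this PR's branch), the two runs were along
these lines:

```shell
git checkout main
./bench.sh baseline

git checkout <this-branch>
./bench.sh nocache
```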

</p>
</details>

use std::collections::{BTreeMap, HashMap};

use itertools::Itertools;
use proc_macro2::TokenStream;
use quote::{ToTokens, quote};
use syn::{
    Attribute, Error, Expr, ExprCall, ExprMatch, Ident, ItemFn, LitStr, Pat, Path, Stmt, Token,
    parenthesized, parse::Parse, spanned::Spanned,
};

use crate::rule_code_prefix::{get_prefix_ident, intersection_all};

/// A rule entry in the big match statement such as
/// `(Pycodestyle, "E112") => (RuleGroup::Preview, rules::pycodestyle::rules::logical_lines::NoIndentedBlock),`
#[derive(Clone)]
struct Rule {
    /// The actual name of the rule, e.g., `NoIndentedBlock`.
    name: Ident,
    /// The linter associated with the rule, e.g., `Pycodestyle`.
    linter: Ident,
    /// The code associated with the rule, e.g., `"E112"`.
    code: LitStr,
    /// The rule group identifier, e.g., `RuleGroup::Preview`.
    group: Path,
    /// The path to the struct implementing the rule, e.g.,
    /// `rules::pycodestyle::rules::logical_lines::NoIndentedBlock`.
    path: Path,
    /// The rule attributes, e.g., for feature gates.
    attrs: Vec<Attribute>,
}

pub(crate) fn map_codes(func: &ItemFn) -> syn::Result<TokenStream> {
    let Some(last_stmt) = func.block.stmts.last() else {
        return Err(Error::new(
            func.block.span(),
            "expected body to end in an expression",
        ));
    };
    let Stmt::Expr(
        Expr::Call(ExprCall {
            args: some_args, ..
        }),
        _,
    ) = last_stmt
    else {
        return Err(Error::new(
            last_stmt.span(),
            "expected last expression to be `Some(match (..) { .. })`",
        ));
    };
    let mut some_args = some_args.into_iter();
    let (Some(Expr::Match(ExprMatch { arms, .. })), None) = (some_args.next(), some_args.next())
    else {
        return Err(Error::new(
            last_stmt.span(),
            "expected last expression to be `Some(match (..) { .. })`",
        ));
    };

    // Map from: linter (e.g., `Flake8Bugbear`) to rule code (e.g., `"002"`) to rule data (e.g.,
    // `(Rule::UnaryPrefixIncrement, RuleGroup::Stable, vec![])`).
    let mut linter_to_rules: BTreeMap<Ident, BTreeMap<String, Rule>> = BTreeMap::new();

    for arm in arms {
        if matches!(arm.pat, Pat::Wild(..)) {
            break;
        }

        let rule = syn::parse::<Rule>(arm.into_token_stream().into())?;
        linter_to_rules
            .entry(rule.linter.clone())
            .or_default()
            .insert(rule.code.value(), rule);
    }

    let linter_idents: Vec<_> = linter_to_rules.keys().collect();
    let all_rules = linter_to_rules.values().flat_map(BTreeMap::values);

    let mut output = register_rules(all_rules);
    output.extend(quote! {
        #[derive(Debug, Clone, PartialEq, Eq, Hash)]
        pub enum RuleCodePrefix {
            #(#linter_idents(#linter_idents),)*
        }

        impl RuleCodePrefix {
            pub fn linter(&self) -> &'static Linter {
                match self {
                    #(Self::#linter_idents(..) => &Linter::#linter_idents,)*
                }
            }

            pub fn short_code(&self) -> &'static str {
                match self {
                    #(Self::#linter_idents(code) => code.into(),)*
                }
            }
        }
    });

    for (linter, rules) in &linter_to_rules {
        output.extend(super::rule_code_prefix::expand(
            linter,
            rules
                .iter()
                .map(|(code, Rule { group, attrs, .. })| (code.as_str(), group, attrs)),
        ));

        output.extend(quote! {
            impl From<#linter> for RuleCodePrefix {
                fn from(linter: #linter) -> Self {
                    Self::#linter(linter)
                }
            }

            // Rust doesn't yet support `impl const From<RuleCodePrefix> for RuleSelector`.
            // See https://github.com/rust-lang/rust/issues/67792
            impl From<#linter> for crate::rule_selector::RuleSelector {
                fn from(linter: #linter) -> Self {
                    let prefix = RuleCodePrefix::#linter(linter);
                    if is_single_rule_selector(&prefix) {
                        Self::Rule {
                            prefix,
                            redirected_from: None,
                        }
                    } else {
                        Self::Prefix {
                            prefix,
                            redirected_from: None,
                        }
                    }
                }
            }
        });
    }

    let mut all_codes = Vec::new();

    for (linter, rules) in &linter_to_rules {
        let rules_by_prefix = rules_by_prefix(rules);

        for (prefix, rules) in &rules_by_prefix {
            let prefix_ident = get_prefix_ident(prefix);
            let attrs = intersection_all(rules.iter().map(|(.., attrs)| attrs.as_slice()));
            let attrs = if attrs.is_empty() {
                quote!()
            } else {
                quote!(#(#attrs)*)
            };
            all_codes.push(quote! {
                #attrs Self::#linter(#linter::#prefix_ident)
            });
        }

        let mut prefix_into_iter_match_arms = quote!();
        for (prefix, rules) in rules_by_prefix {
            let rule_paths = rules.iter().map(|(path, .., attrs)| {
                let rule_name = path.segments.last().unwrap();
                quote!(#(#attrs)* Rule::#rule_name)
            });
            let prefix_ident = get_prefix_ident(&prefix);
            let attrs = intersection_all(rules.iter().map(|(.., attrs)| attrs.as_slice()));
            let attrs = if attrs.is_empty() {
                quote!()
            } else {
                quote!(#(#attrs)*)
            };
            prefix_into_iter_match_arms.extend(quote! {
                #attrs #linter::#prefix_ident => vec![#(#rule_paths,)*].into_iter(),
            });
        }

        output.extend(quote! {
            impl #linter {
                pub(crate) fn rules(&self) -> ::std::vec::IntoIter<Rule> {
                    match self { #prefix_into_iter_match_arms }
                }
            }
        });
    }

    output.extend(quote! {
        impl RuleCodePrefix {
            pub(crate) fn parse(linter: &Linter, code: &str) -> Result<Self, crate::registry::FromCodeError> {
                use std::str::FromStr;

                Ok(match linter {
                    #(Linter::#linter_idents => RuleCodePrefix::#linter_idents(#linter_idents::from_str(code).map_err(|_| crate::registry::FromCodeError::Unknown)?),)*
                })
            }

            pub(crate) fn rules(&self) -> ::std::vec::IntoIter<Rule> {
                match self {
                    #(RuleCodePrefix::#linter_idents(prefix) => prefix.clone().rules(),)*
                }
            }
        }
    });

    let rule_to_code = generate_rule_to_code(&linter_to_rules);
    output.extend(rule_to_code);
    output.extend(generate_iter_impl(&linter_to_rules, &linter_idents));

    Ok(output)
}
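
// For orientation, the expansion for a single linter looks roughly like this
// (an illustrative, hand-written sketch of the generated shape, not the
// literal macro output):
//
//     pub enum RuleCodePrefix {
//         Pycodestyle(Pycodestyle),
//         // ...one variant per linter
//     }
//
//     impl From<Pycodestyle> for crate::rule_selector::RuleSelector {
//         fn from(linter: Pycodestyle) -> Self {
//             // Single-rule prefixes select `Self::Rule`; everything else
//             // becomes `Self::Prefix`.
//         }
//     }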

/// Group the rules by their common prefixes.
fn rules_by_prefix(
    rules: &BTreeMap<String, Rule>,
) -> BTreeMap<String, Vec<(Path, Vec<Attribute>)>> {
    // TODO(charlie): Why do we do this here _and_ in `rule_code_prefix::expand`?
    let mut rules_by_prefix = BTreeMap::new();

    for code in rules.keys() {
        for i in 1..=code.len() {
            let prefix = code[..i].to_string();
            let rules: Vec<_> = rules
                .iter()
                .filter_map(|(code, rule)| {
                    if code.starts_with(&prefix) {
                        Some((rule.path.clone(), rule.attrs.clone()))
                    } else {
                        None
                    }
                })
                .collect();
            rules_by_prefix.insert(prefix, rules);
        }
    }

    rules_by_prefix
}
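
// For example (illustrative): given the codes "E111" and "E112", the map ends
// up with the prefixes "E", "E1", and "E11" each listing both rules, plus
// "E111" and "E112" each listing only its own rule.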

/// Map from rule to codes that can be used to select it.
/// This abstraction exists to support a one-to-many mapping, whereby a single rule could map
/// to multiple codes (e.g., if it existed in multiple linters, like Pylint and Flake8, under
/// different codes). We haven't actually activated this functionality yet, but some work was
/// done to support it, so the logic exists here.
fn generate_rule_to_code(linter_to_rules: &BTreeMap<Ident, BTreeMap<String, Rule>>) -> TokenStream {
    let mut rule_to_codes: HashMap<&Path, Vec<&Rule>> = HashMap::new();
    let mut linter_code_for_rule_match_arms = quote!();

    for (linter, map) in linter_to_rules {
        for (code, rule) in map {
            let Rule {
                path, attrs, name, ..
            } = rule;
            rule_to_codes.entry(path).or_default().push(rule);
            linter_code_for_rule_match_arms.extend(quote! {
                #(#attrs)* (Self::#linter, Rule::#name) => Some(#code),
            });
        }
    }

    let mut rule_noqa_code_match_arms = quote!();
    let mut rule_group_match_arms = quote!();

    for (rule, codes) in rule_to_codes {
        let rule_name = rule.segments.last().unwrap();
        assert_eq!(
            codes.len(),
            1,
            "
{} is mapped to multiple codes.
The mapping of multiple codes to one rule has been disabled due to UX concerns (it would
be confusing if violations were reported under a different code than the code you selected).
We firstly want to allow rules to be selected by their names (and report them by name),
and before we can do that we have to rename all our rules to match our naming convention
(see CONTRIBUTING.md) because after that change every rule rename will be a breaking change.
See also https://github.com/astral-sh/ruff/issues/2186.
",
            rule_name.ident
        );

        let Rule {
            linter,
            code,
            group,
            attrs,
            ..
        } = codes
            .iter()
            .sorted_by_key(|data| data.linter == "Pylint")
            .next()
            .unwrap();

        rule_noqa_code_match_arms.extend(quote! {
            #(#attrs)* Rule::#rule_name => NoqaCode(crate::registry::Linter::#linter.common_prefix(), #code),
        });
        rule_group_match_arms.extend(quote! {
            #(#attrs)* Rule::#rule_name => #group,
        });
    }

    let rule_to_code = quote! {
        impl Rule {
            pub fn noqa_code(&self) -> NoqaCode {
                use crate::registry::RuleNamespace;
                match self {
                    #rule_noqa_code_match_arms
                }
            }

            pub fn group(&self) -> RuleGroup {
                use crate::registry::RuleNamespace;
                match self {
                    #rule_group_match_arms
                }
            }

            pub fn is_preview(&self) -> bool {
                matches!(self.group(), RuleGroup::Preview)
            }

            pub(crate) fn is_stable(&self) -> bool {
                matches!(self.group(), RuleGroup::Stable)
            }

            pub fn is_deprecated(&self) -> bool {
                matches!(self.group(), RuleGroup::Deprecated)
            }

            pub fn is_removed(&self) -> bool {
                matches!(self.group(), RuleGroup::Removed)
            }
        }

        impl Linter {
            pub fn code_for_rule(&self, rule: Rule) -> Option<&'static str> {
                match (self, rule) {
                    #linter_code_for_rule_match_arms
                    _ => None,
                }
            }
        }
    };

    rule_to_code
}

/// Implement `impl IntoIterator for &Linter` and `RuleCodePrefix::iter()`
fn generate_iter_impl(
    linter_to_rules: &BTreeMap<Ident, BTreeMap<String, Rule>>,
    linter_idents: &[&Ident],
) -> TokenStream {
    let mut linter_rules_match_arms = quote!();
    let mut linter_all_rules_match_arms = quote!();
    for (linter, map) in linter_to_rules {
        let rule_paths = map.values().map(|Rule { attrs, path, .. }| {
            let rule_name = path.segments.last().unwrap();
            quote!(#(#attrs)* Rule::#rule_name)
        });
        linter_rules_match_arms.extend(quote! {
            Linter::#linter => vec![#(#rule_paths,)*].into_iter(),
        });
        let rule_paths = map.values().map(|Rule { attrs, path, .. }| {
            let rule_name = path.segments.last().unwrap();
            quote!(#(#attrs)* Rule::#rule_name)
        });
        linter_all_rules_match_arms.extend(quote! {
            Linter::#linter => vec![#(#rule_paths,)*].into_iter(),
        });
    }

    quote! {
        impl Linter {
            /// Rules not in the preview.
            pub(crate) fn rules(self: &Linter) -> ::std::vec::IntoIter<Rule> {
                match self {
                    #linter_rules_match_arms
                }
            }

            /// All rules, including those in the preview.
            pub fn all_rules(self: &Linter) -> ::std::vec::IntoIter<Rule> {
                match self {
                    #linter_all_rules_match_arms
                }
            }
        }

        impl RuleCodePrefix {
            pub(crate) fn iter() -> impl Iterator<Item = RuleCodePrefix> {
                use strum::IntoEnumIterator;

                let mut prefixes = Vec::new();
                #(prefixes.extend(#linter_idents::iter().map(|x| Self::#linter_idents(x)));)*
                prefixes.into_iter()
            }
        }
    }
}

/// Generate the `Rule` enum.
fn register_rules<'a>(input: impl Iterator<Item = &'a Rule>) -> TokenStream {
    let mut rule_variants = quote!();
    let mut rule_message_formats_match_arms = quote!();
    let mut rule_fixable_match_arms = quote!();
    let mut rule_explanation_match_arms = quote!();

    for Rule {
        name, attrs, path, ..
    } in input
    {
        rule_variants.extend(quote! {
            #(#attrs)*
            #name,
        });
        // Apply the `attrs` to each arm, like `#[cfg(feature = "foo")]`.
        rule_message_formats_match_arms.extend(
            quote! {#(#attrs)* Self::#name => <#path as crate::Violation>::message_formats(),},
        );
        rule_fixable_match_arms.extend(
            quote! {#(#attrs)* Self::#name => <#path as crate::Violation>::FIX_AVAILABILITY,},
        );
        rule_explanation_match_arms.extend(quote! {#(#attrs)* Self::#name => #path::explain(),});
    }

    quote! {
        use crate::Violation;

        #[derive(
            EnumIter,
            Debug,
            PartialEq,
            Eq,
            Copy,
            Clone,
            Hash,
            ::strum_macros::IntoStaticStr,
        )]
        #[repr(u16)]
        #[strum(serialize_all = "kebab-case")]
        pub enum Rule { #rule_variants }

        impl Rule {
            /// Returns the format strings used to report violations of this rule.
            pub fn message_formats(&self) -> &'static [&'static str] {
                match self { #rule_message_formats_match_arms }
            }

            /// Returns the documentation for this rule.
            pub fn explanation(&self) -> Option<&'static str> {
                use crate::ViolationMetadata;
                match self { #rule_explanation_match_arms }
            }

            /// Returns the fix status of this rule.
            pub const fn fixable(&self) -> crate::FixAvailability {
                match self { #rule_fixable_match_arms }
            }
        }
    }
}
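
// Sketch of the generated output (illustrative, not the literal expansion):
//
//     pub enum Rule {
//         NoIndentedBlock,
//         // ...one variant per registered rule
//     }
//
// with `message_formats`, `explanation`, and `fixable` each dispatching on
// `self` to the corresponding `Violation` implementation.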

impl Parse for Rule {
    /// Parses a match arm such as `(Pycodestyle, "E112") => (RuleGroup::Preview, rules::pycodestyle::rules::logical_lines::NoIndentedBlock),`
    fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> {
        let attrs = Attribute::parse_outer(input)?;

        let pat_tuple;
        parenthesized!(pat_tuple in input);
        let linter: Ident = pat_tuple.parse()?;
        let _: Token!(,) = pat_tuple.parse()?;
        let code: LitStr = pat_tuple.parse()?;

        let _: Token!(=>) = input.parse()?;

        let pat_tuple;
        parenthesized!(pat_tuple in input);
        let group: Path = pat_tuple.parse()?;
        let _: Token!(,) = pat_tuple.parse()?;
        let rule_path: Path = pat_tuple.parse()?;
        let _: Token!(,) = input.parse()?;

        let rule_name = rule_path.segments.last().unwrap().ident.clone();

        Ok(Rule {
            name: rule_name,
            linter,
            code,
            group,
            path: rule_path,
            attrs,
        })
    }
}
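
// Applied to the example arm in the doc comment above, parsing yields:
// linter = `Pycodestyle`, code = `"E112"`, group = `RuleGroup::Preview`,
// path = `rules::pycodestyle::rules::logical_lines::NoIndentedBlock`, and
// name = `NoIndentedBlock` (the final segment of the path).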