mirror of
https://github.com/astral-sh/ruff.git
synced 2025-09-29 21:35:58 +00:00
[red-knot] Lint registry and rule selection (#14874)
## Summary This is the third and last PR in this stack that adds support for toggling lints at a per-rule level. This PR introduces a new `LintRegistry`, a central index of known lints. The registry is required because we want to support lint rules from many different crates but need a way to look them up by name, e.g., when resolving a lint from a name in the configuration or analyzing a suppression comment. Adding a lint now requires two steps: 1. Declare the lint with `declare_lint` 2. Register the lint in the registry inside the `register_lints` function. I considered some more involved macros to avoid changes in two places. Still, I ultimately decided against it because a) it's just two places and b) I'd expect that registering a type checker lint will differ from registering a lint that runs as a rule in the linter. I worry that any more opinionated design could limit our options when working on the linter, so I kept it simple. The second part of this PR is the `RuleSelection`. It stores which lints are enabled and what severity they should use for created diagnostics. For now, the `RuleSelection` always gets initialized with all known lints and it uses their default level. ## Linter crates Each crate that defines lints should export a `register_lints` function that accepts a `&mut LintRegistryBuilder` to register all its known lints in the registry. This should make registering all known lints in a top-level crate easy: Just call `register_lints` of every crate that defines lint rules. I considered defining a `LintCollection` trait and even some fancy macros to accomplish the same but decided to go for this very simplistic approach for now. We can add more abstraction once needed. ## Lint rules This is a bit hand-wavy. I don't have a good sense for how our linter infrastructure will look like, but I expect we'll need a way to register the rules that should run as part of the red knot linter. One way is to keep doing what Ruff does by having one massive `checker` and each lint rule adds a call to itself in the relevant AST visitor methods. An alternative is that we have a `LintRule` trait that provides common hooks and implementations will be called at the "right time". Such a design would need a way to register all known lint implementations, possibly with the lint. This is where we'd probably want a dedicated `register_rule` method. A third option is that lint rules are handled separately from the `LintRegistry` and are specific to the linter crate. The current design should be flexible enough to support the three options. ## Documentation generation The documentation for all known lints can be generated by creating a factory, registering all lints by calling the `register_lints` methods, and then querying the registry for the metadata. ## Deserialization and Schema generation I haven't fully decided what the best approach is when it comes to deserializing lint rule names: * Reject invalid names in the deserializer. This gives us error messages with line and column numbers (by serde) * Don't validate lint rule names during deserialization; defer the validation until the configuration is resolved. This gives us more control over handling the error, e.g. emit a warning diagnostic instead of aborting when a rule isn't known. One technical challenge for both deserialization and schema generation is that the `Deserialize` and `JSONSchema` traits do not allow passing the `LintRegistry`, which is required to look up the lints by name. I suggest that we either rely on the salsa db being set for the current thread (`salsa::Attach`) or build our own thread-local storage for the `LintRegistry`. It's the caller's responsibility to make the lint registry available before calling `Deserialize` or `JSONSchema`. ## CLI support I prefer deferring adding support for enabling and disabling lints from the CLI for now because I think it will be easier to add once I've figured out how to handle configurations. ## Bitset optimization Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of a hash map. This helped improve performance by a few percent (see https://github.com/astral-sh/ruff/pull/3606). However, this approach is no longer possible because lints have no "cheap" way to compute their index inside the registry (other than using a hash map). We could consider doing something similar to Salsa where each `LintMetadata` stores a `LazyLintIndex`. ``` pub struct LazyLintIndex { cached: OnceLock<(Nonce, LintIndex)> } impl LazyLintIndex { pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) { let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint)); if registry.nonce() == nonce { index } else { registry.lint_index(lint) } } ``` Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex` is in the range of `0...registry.len()`. The `LazyLintIndex` is based on the assumption that every program has exactly **one** registry. This assumption allows to cache the `LintIndex` directly on the `LintMetadata`. The implementation falls back to the "slow" path if there is more than one registry at runtime. I was very close to implementing this optimization because it's kind of fun to implement. I ultimately decided against it because it adds complexity and I don't think it's worth doing in Red Knot today: * Red Knot only queries the rule selection when deciding whether or not to emit a diagnostic. It is rarely used to detect if a certain code block should run. This is different from Ruff where the rule selection is queried many times for every single AST node to determine which rules *should* run. * I'm not sure if a 2-3% performance improvement is worth the complexity I suggest revisiting this decision when working on the linter where a fast path for deciding if a rule is enabled might be more important (but that depends on how lint rules are implemented) ## Test Plan I removed a lint from the default rule registry, and the MD tests started failing because the diagnostics were no longer emitted.
This commit is contained in:
parent
6f8d8fa36b
commit
881375a8d9
9 changed files with 371 additions and 13 deletions
|
@ -1,5 +1,8 @@
|
|||
use itertools::Itertools;
|
||||
use ruff_db::diagnostic::{LintName, Severity};
|
||||
use rustc_hash::FxHashMap;
|
||||
use std::hash::Hasher;
|
||||
use thiserror::Error;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct LintMetadata {
|
||||
|
@ -224,3 +227,239 @@ macro_rules! declare_lint {
|
|||
};
|
||||
};
|
||||
}
|
||||
|
||||
/// A unique identifier for a lint rule.
|
||||
///
|
||||
/// Implements `PartialEq`, `Eq`, and `Hash` based on the `LintMetadata` pointer
|
||||
/// for fast comparison and lookup.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub struct LintId {
|
||||
definition: &'static LintMetadata,
|
||||
}
|
||||
|
||||
impl LintId {
|
||||
pub const fn of(definition: &'static LintMetadata) -> Self {
|
||||
LintId { definition }
|
||||
}
|
||||
}
|
||||
|
||||
impl PartialEq for LintId {
|
||||
fn eq(&self, other: &Self) -> bool {
|
||||
std::ptr::eq(self.definition, other.definition)
|
||||
}
|
||||
}
|
||||
|
||||
impl Eq for LintId {}
|
||||
|
||||
impl std::hash::Hash for LintId {
|
||||
fn hash<H: Hasher>(&self, state: &mut H) {
|
||||
std::ptr::hash(self.definition, state);
|
||||
}
|
||||
}
|
||||
|
||||
impl std::ops::Deref for LintId {
|
||||
type Target = LintMetadata;
|
||||
|
||||
fn deref(&self) -> &Self::Target {
|
||||
self.definition
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Default, Debug)]
|
||||
pub struct LintRegistryBuilder {
|
||||
/// Registered lints that haven't been removed.
|
||||
lints: Vec<LintId>,
|
||||
|
||||
/// Lints indexed by name, including aliases and removed rules.
|
||||
by_name: FxHashMap<&'static str, LintEntry>,
|
||||
}
|
||||
|
||||
impl LintRegistryBuilder {
|
||||
#[track_caller]
|
||||
pub fn register_lint(&mut self, lint: &'static LintMetadata) {
|
||||
assert_eq!(
|
||||
self.by_name.insert(&*lint.name, lint.into()),
|
||||
None,
|
||||
"duplicate lint registration for '{name}'",
|
||||
name = lint.name
|
||||
);
|
||||
|
||||
if !lint.status.is_removed() {
|
||||
self.lints.push(LintId::of(lint));
|
||||
}
|
||||
}
|
||||
|
||||
#[track_caller]
|
||||
pub fn register_alias(&mut self, from: LintName, to: &'static LintMetadata) {
|
||||
let target = match self.by_name.get(to.name.as_str()) {
|
||||
Some(LintEntry::Lint(target) | LintEntry::Removed(target)) => target,
|
||||
Some(LintEntry::Alias(target)) => {
|
||||
panic!(
|
||||
"lint alias {from} -> {to:?} points to another alias {target:?}",
|
||||
target = target.name()
|
||||
)
|
||||
}
|
||||
None => panic!(
|
||||
"lint alias {from} -> {to} points to non-registered lint",
|
||||
to = to.name
|
||||
),
|
||||
};
|
||||
|
||||
assert_eq!(
|
||||
self.by_name
|
||||
.insert(from.as_str(), LintEntry::Alias(*target)),
|
||||
None,
|
||||
"duplicate lint registration for '{from}'",
|
||||
);
|
||||
}
|
||||
|
||||
pub fn build(self) -> LintRegistry {
|
||||
LintRegistry {
|
||||
lints: self.lints,
|
||||
by_name: self.by_name,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Default, Debug)]
|
||||
pub struct LintRegistry {
|
||||
lints: Vec<LintId>,
|
||||
by_name: FxHashMap<&'static str, LintEntry>,
|
||||
}
|
||||
|
||||
impl LintRegistry {
|
||||
/// Looks up a lint by its name.
|
||||
pub fn get(&self, code: &str) -> Result<LintId, GetLintError> {
|
||||
match self.by_name.get(code) {
|
||||
Some(LintEntry::Lint(metadata)) => Ok(*metadata),
|
||||
Some(LintEntry::Alias(lint)) => {
|
||||
if lint.status.is_removed() {
|
||||
Err(GetLintError::Removed(lint.name()))
|
||||
} else {
|
||||
Ok(*lint)
|
||||
}
|
||||
}
|
||||
Some(LintEntry::Removed(lint)) => Err(GetLintError::Removed(lint.name())),
|
||||
None => Err(GetLintError::Unknown(code.to_string())),
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns all registered, non-removed lints.
|
||||
pub fn lints(&self) -> &[LintId] {
|
||||
&self.lints
|
||||
}
|
||||
|
||||
/// Returns an iterator over all known aliases and to their target lints.
|
||||
///
|
||||
/// This iterator includes aliases that point to removed lints.
|
||||
pub fn aliases(&self) -> impl Iterator<Item = (LintName, LintId)> + '_ {
|
||||
self.by_name.iter().filter_map(|(key, value)| {
|
||||
if let LintEntry::Alias(alias) = value {
|
||||
Some((LintName::of(key), *alias))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Iterates over all removed lints.
|
||||
pub fn removed(&self) -> impl Iterator<Item = LintId> + '_ {
|
||||
self.by_name.iter().filter_map(|(_, value)| {
|
||||
if let LintEntry::Removed(metadata) = value {
|
||||
Some(*metadata)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Error, Debug, Clone)]
|
||||
pub enum GetLintError {
|
||||
/// The name maps to this removed lint.
|
||||
#[error("lint {0} has been removed")]
|
||||
Removed(LintName),
|
||||
|
||||
/// No lint with the given name is known.
|
||||
#[error("unknown lint {0}")]
|
||||
Unknown(String),
|
||||
}
|
||||
|
||||
#[derive(Debug, PartialEq, Eq)]
|
||||
pub enum LintEntry {
|
||||
/// An existing lint rule. Can be in preview, stable or deprecated.
|
||||
Lint(LintId),
|
||||
/// A lint rule that has been removed.
|
||||
Removed(LintId),
|
||||
Alias(LintId),
|
||||
}
|
||||
|
||||
impl From<&'static LintMetadata> for LintEntry {
|
||||
fn from(metadata: &'static LintMetadata) -> Self {
|
||||
if metadata.status.is_removed() {
|
||||
LintEntry::Removed(LintId::of(metadata))
|
||||
} else {
|
||||
LintEntry::Lint(LintId::of(metadata))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct RuleSelection {
|
||||
/// Map with the severity for each enabled lint rule.
|
||||
///
|
||||
/// If a rule isn't present in this map, then it should be considered disabled.
|
||||
lints: FxHashMap<LintId, Severity>,
|
||||
}
|
||||
|
||||
impl RuleSelection {
|
||||
/// Creates a new rule selection from all known lints in the registry that are enabled
|
||||
/// according to their default severity.
|
||||
pub fn from_registry(registry: &LintRegistry) -> Self {
|
||||
let lints = registry
|
||||
.lints()
|
||||
.iter()
|
||||
.filter_map(|lint| {
|
||||
Severity::try_from(lint.default_level())
|
||||
.ok()
|
||||
.map(|severity| (*lint, severity))
|
||||
})
|
||||
.collect();
|
||||
|
||||
RuleSelection { lints }
|
||||
}
|
||||
|
||||
/// Returns an iterator over all enabled lints.
|
||||
pub fn enabled(&self) -> impl Iterator<Item = LintId> + '_ {
|
||||
self.lints.keys().copied()
|
||||
}
|
||||
|
||||
/// Returns an iterator over all enabled lints and their severity.
|
||||
pub fn iter(&self) -> impl ExactSizeIterator<Item = (LintId, Severity)> + '_ {
|
||||
self.lints.iter().map(|(&lint, &severity)| (lint, severity))
|
||||
}
|
||||
|
||||
/// Returns the configured severity for the lint with the given id or `None` if the lint is disabled.
|
||||
pub fn severity(&self, lint: LintId) -> Option<Severity> {
|
||||
self.lints.get(&lint).copied()
|
||||
}
|
||||
|
||||
/// Enables `lint` and configures with the given `severity`.
|
||||
///
|
||||
/// Overrides any previous configuration for the lint.
|
||||
pub fn enable(&mut self, lint: LintId, severity: Severity) {
|
||||
self.lints.insert(lint, severity);
|
||||
}
|
||||
|
||||
/// Disables `lint` if it was previously enabled.
|
||||
pub fn disable(&mut self, lint: LintId) {
|
||||
self.lints.remove(&lint);
|
||||
}
|
||||
|
||||
/// Merges the enabled lints from `other` into this selection.
|
||||
///
|
||||
/// Lints from `other` will override any existing configuration.
|
||||
pub fn merge(&mut self, other: &RuleSelection) {
|
||||
self.lints.extend(other.iter());
|
||||
}
|
||||
}
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue