[red-knot] Lint registry and rule selection (#14874)

## Summary

This is the third and last PR in this stack that adds support for
toggling lints at a per-rule level.

This PR introduces a new `LintRegistry`, a central index of known lints.
The registry is required because we want to support lint rules from many
different crates but need a way to look them up by name, e.g., when
resolving a lint from a name in the configuration or analyzing a
suppression comment.

Adding a lint now requires two steps:

1. Declare the lint with `declare_lint`
2. Register the lint in the registry inside the `register_lints`
function.

I considered some more involved macros to avoid changes in two places.
Still, I ultimately decided against it because a) it's just two places
and b) I'd expect that registering a type checker lint will differ from
registering a lint that runs as a rule in the linter. I worry that any
more opinionated design could limit our options when working on the
linter, so I kept it simple.

The second part of this PR is the `RuleSelection`. It stores which lints
are enabled and what severity they should use for created diagnostics.
For now, the `RuleSelection` always gets initialized with all known
lints and it uses their default level.

## Linter crates

Each crate that defines lints should export a `register_lints` function
that accepts a `&mut LintRegistryBuilder` to register all its known
lints in the registry. This should make registering all known lints in a
top-level crate easy: Just call `register_lints` of every crate that
defines lint rules.

I considered defining a `LintCollection` trait and even some fancy
macros to accomplish the same but decided to go for this very simplistic
approach for now. We can add more abstraction once needed.

## Lint rules

This is a bit hand-wavy. I don't have a good sense for how our linter
infrastructure will look like, but I expect we'll need a way to register
the rules that should run as part of the red knot linter. One way is to
keep doing what Ruff does by having one massive `checker` and each lint
rule adds a call to itself in the relevant AST visitor methods. An
alternative is that we have a `LintRule` trait that provides common
hooks and implementations will be called at the "right time". Such a
design would need a way to register all known lint implementations,
possibly with the lint. This is where we'd probably want a dedicated
`register_rule` method. A third option is that lint rules are handled
separately from the `LintRegistry` and are specific to the linter crate.

The current design should be flexible enough to support the three
options.


## Documentation generation

The documentation for all known lints can be generated by creating a
factory, registering all lints by calling the `register_lints` methods,
and then querying the registry for the metadata.

## Deserialization and Schema generation

I haven't fully decided what the best approach is when it comes to
deserializing lint rule names:

* Reject invalid names in the deserializer. This gives us error messages
with line and column numbers (by serde)
* Don't validate lint rule names during deserialization; defer the
validation until the configuration is resolved. This gives us more
control over handling the error, e.g. emit a warning diagnostic instead
of aborting when a rule isn't known.

One technical challenge for both deserialization and schema generation
is that the `Deserialize` and `JSONSchema` traits do not allow passing
the `LintRegistry`, which is required to look up the lints by name. I
suggest that we either rely on the salsa db being set for the current
thread (`salsa::Attach`) or build our own thread-local storage for the
`LintRegistry`. It's the caller's responsibility to make the lint
registry available before calling `Deserialize` or `JSONSchema`.


## CLI support

I prefer deferring adding support for enabling and disabling lints from
the CLI for now because I think it will be easier
to add once I've figured out how to handle configurations. 

## Bitset optimization

Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of
a hash map. This helped improve performance by a few percent (see
https://github.com/astral-sh/ruff/pull/3606). However, this approach is
no longer possible because lints have no "cheap" way to compute their
index inside the registry (other than using a hash map).

We could consider doing something similar to Salsa where each
`LintMetadata` stores a `LazyLintIndex`.

```
pub struct LazyLintIndex {
	cached: OnceLock<(Nonce, LintIndex)>
}

impl LazyLintIndex {
	pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) {
	
	let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint));

	if registry.nonce() == nonce {
		index
	} else {
		registry.lint_index(lint)
	}
}
```

Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex`
is in the range of `0...registry.len()`. The `LazyLintIndex` is based on
the assumption that every program has exactly **one** registry. This
assumption allows to cache the `LintIndex` directly on the
`LintMetadata`. The implementation falls back to the "slow" path if
there is more than one registry at runtime.

I was very close to implementing this optimization because it's kind of
fun to implement. I ultimately decided against it because it adds
complexity and I don't think it's worth doing in Red Knot today:

* Red Knot only queries the rule selection when deciding whether or not
to emit a diagnostic. It is rarely used to detect if a certain code
block should run. This is different from Ruff where the rule selection
is queried many times for every single AST node to determine which rules
*should* run.
* I'm not sure if a 2-3% performance improvement is worth the complexity

I suggest revisiting this decision when working on the linter where a
fast path for deciding if a rule is enabled might be more important (but
that depends on how lint rules are implemented)


## Test Plan

I removed a lint from the default rule registry, and the MD tests
started failing because the diagnostics were no longer emitted.
This commit is contained in:
Micha Reiser 2024-12-11 13:25:19 +01:00 committed by GitHub
parent 6f8d8fa36b
commit 881375a8d9
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 371 additions and 13 deletions

View file

@ -1,5 +1,8 @@
use itertools::Itertools;
use ruff_db::diagnostic::{LintName, Severity};
use rustc_hash::FxHashMap;
use std::hash::Hasher;
use thiserror::Error;
#[derive(Debug, Clone)]
pub struct LintMetadata {
@ -224,3 +227,239 @@ macro_rules! declare_lint {
};
};
}
/// A unique identifier for a lint rule.
///
/// Implements `PartialEq`, `Eq`, and `Hash` based on the `LintMetadata` pointer
/// for fast comparison and lookup.
#[derive(Debug, Clone, Copy)]
pub struct LintId {
definition: &'static LintMetadata,
}
impl LintId {
pub const fn of(definition: &'static LintMetadata) -> Self {
LintId { definition }
}
}
impl PartialEq for LintId {
fn eq(&self, other: &Self) -> bool {
std::ptr::eq(self.definition, other.definition)
}
}
impl Eq for LintId {}
impl std::hash::Hash for LintId {
fn hash<H: Hasher>(&self, state: &mut H) {
std::ptr::hash(self.definition, state);
}
}
impl std::ops::Deref for LintId {
type Target = LintMetadata;
fn deref(&self) -> &Self::Target {
self.definition
}
}
#[derive(Default, Debug)]
pub struct LintRegistryBuilder {
/// Registered lints that haven't been removed.
lints: Vec<LintId>,
/// Lints indexed by name, including aliases and removed rules.
by_name: FxHashMap<&'static str, LintEntry>,
}
impl LintRegistryBuilder {
#[track_caller]
pub fn register_lint(&mut self, lint: &'static LintMetadata) {
assert_eq!(
self.by_name.insert(&*lint.name, lint.into()),
None,
"duplicate lint registration for '{name}'",
name = lint.name
);
if !lint.status.is_removed() {
self.lints.push(LintId::of(lint));
}
}
#[track_caller]
pub fn register_alias(&mut self, from: LintName, to: &'static LintMetadata) {
let target = match self.by_name.get(to.name.as_str()) {
Some(LintEntry::Lint(target) | LintEntry::Removed(target)) => target,
Some(LintEntry::Alias(target)) => {
panic!(
"lint alias {from} -> {to:?} points to another alias {target:?}",
target = target.name()
)
}
None => panic!(
"lint alias {from} -> {to} points to non-registered lint",
to = to.name
),
};
assert_eq!(
self.by_name
.insert(from.as_str(), LintEntry::Alias(*target)),
None,
"duplicate lint registration for '{from}'",
);
}
pub fn build(self) -> LintRegistry {
LintRegistry {
lints: self.lints,
by_name: self.by_name,
}
}
}
#[derive(Default, Debug)]
pub struct LintRegistry {
lints: Vec<LintId>,
by_name: FxHashMap<&'static str, LintEntry>,
}
impl LintRegistry {
/// Looks up a lint by its name.
pub fn get(&self, code: &str) -> Result<LintId, GetLintError> {
match self.by_name.get(code) {
Some(LintEntry::Lint(metadata)) => Ok(*metadata),
Some(LintEntry::Alias(lint)) => {
if lint.status.is_removed() {
Err(GetLintError::Removed(lint.name()))
} else {
Ok(*lint)
}
}
Some(LintEntry::Removed(lint)) => Err(GetLintError::Removed(lint.name())),
None => Err(GetLintError::Unknown(code.to_string())),
}
}
/// Returns all registered, non-removed lints.
pub fn lints(&self) -> &[LintId] {
&self.lints
}
/// Returns an iterator over all known aliases and to their target lints.
///
/// This iterator includes aliases that point to removed lints.
pub fn aliases(&self) -> impl Iterator<Item = (LintName, LintId)> + '_ {
self.by_name.iter().filter_map(|(key, value)| {
if let LintEntry::Alias(alias) = value {
Some((LintName::of(key), *alias))
} else {
None
}
})
}
/// Iterates over all removed lints.
pub fn removed(&self) -> impl Iterator<Item = LintId> + '_ {
self.by_name.iter().filter_map(|(_, value)| {
if let LintEntry::Removed(metadata) = value {
Some(*metadata)
} else {
None
}
})
}
}
#[derive(Error, Debug, Clone)]
pub enum GetLintError {
/// The name maps to this removed lint.
#[error("lint {0} has been removed")]
Removed(LintName),
/// No lint with the given name is known.
#[error("unknown lint {0}")]
Unknown(String),
}
#[derive(Debug, PartialEq, Eq)]
pub enum LintEntry {
/// An existing lint rule. Can be in preview, stable or deprecated.
Lint(LintId),
/// A lint rule that has been removed.
Removed(LintId),
Alias(LintId),
}
impl From<&'static LintMetadata> for LintEntry {
fn from(metadata: &'static LintMetadata) -> Self {
if metadata.status.is_removed() {
LintEntry::Removed(LintId::of(metadata))
} else {
LintEntry::Lint(LintId::of(metadata))
}
}
}
#[derive(Debug, Clone, Default)]
pub struct RuleSelection {
/// Map with the severity for each enabled lint rule.
///
/// If a rule isn't present in this map, then it should be considered disabled.
lints: FxHashMap<LintId, Severity>,
}
impl RuleSelection {
/// Creates a new rule selection from all known lints in the registry that are enabled
/// according to their default severity.
pub fn from_registry(registry: &LintRegistry) -> Self {
let lints = registry
.lints()
.iter()
.filter_map(|lint| {
Severity::try_from(lint.default_level())
.ok()
.map(|severity| (*lint, severity))
})
.collect();
RuleSelection { lints }
}
/// Returns an iterator over all enabled lints.
pub fn enabled(&self) -> impl Iterator<Item = LintId> + '_ {
self.lints.keys().copied()
}
/// Returns an iterator over all enabled lints and their severity.
pub fn iter(&self) -> impl ExactSizeIterator<Item = (LintId, Severity)> + '_ {
self.lints.iter().map(|(&lint, &severity)| (lint, severity))
}
/// Returns the configured severity for the lint with the given id or `None` if the lint is disabled.
pub fn severity(&self, lint: LintId) -> Option<Severity> {
self.lints.get(&lint).copied()
}
/// Enables `lint` and configures with the given `severity`.
///
/// Overrides any previous configuration for the lint.
pub fn enable(&mut self, lint: LintId, severity: Severity) {
self.lints.insert(lint, severity);
}
/// Disables `lint` if it was previously enabled.
pub fn disable(&mut self, lint: LintId) {
self.lints.remove(&lint);
}
/// Merges the enabled lints from `other` into this selection.
///
/// Lints from `other` will override any existing configuration.
pub fn merge(&mut self, other: &RuleSelection) {
self.lints.extend(other.iter());
}
}