ruff/crates/ruff_graph at 657d26ff203f3313d56a0740e449997df7102cf9 - mirrors/ruff

mirror of https://github.com/astral-sh/ruff.git synced 2025-10-09 01:50:26 +00:00

History

Micha Reiser 881375a8d9 [red-knot] Lint registry and rule selection (#14874 ) ## Summary This is the third and last PR in this stack that adds support for toggling lints at a per-rule level. This PR introduces a new `LintRegistry`, a central index of known lints. The registry is required because we want to support lint rules from many different crates but need a way to look them up by name, e.g., when resolving a lint from a name in the configuration or analyzing a suppression comment. Adding a lint now requires two steps: 1. Declare the lint with `declare_lint` 2. Register the lint in the registry inside the `register_lints` function. I considered some more involved macros to avoid changes in two places. Still, I ultimately decided against it because a) it's just two places and b) I'd expect that registering a type checker lint will differ from registering a lint that runs as a rule in the linter. I worry that any more opinionated design could limit our options when working on the linter, so I kept it simple. The second part of this PR is the `RuleSelection`. It stores which lints are enabled and what severity they should use for created diagnostics. For now, the `RuleSelection` always gets initialized with all known lints and it uses their default level. ## Linter crates Each crate that defines lints should export a `register_lints` function that accepts a `&mut LintRegistryBuilder` to register all its known lints in the registry. This should make registering all known lints in a top-level crate easy: Just call `register_lints` of every crate that defines lint rules. I considered defining a `LintCollection` trait and even some fancy macros to accomplish the same but decided to go for this very simplistic approach for now. We can add more abstraction once needed. ## Lint rules This is a bit hand-wavy. I don't have a good sense for how our linter infrastructure will look like, but I expect we'll need a way to register the rules that should run as part of the red knot linter. One way is to keep doing what Ruff does by having one massive `checker` and each lint rule adds a call to itself in the relevant AST visitor methods. An alternative is that we have a `LintRule` trait that provides common hooks and implementations will be called at the "right time". Such a design would need a way to register all known lint implementations, possibly with the lint. This is where we'd probably want a dedicated `register_rule` method. A third option is that lint rules are handled separately from the `LintRegistry` and are specific to the linter crate. The current design should be flexible enough to support the three options. ## Documentation generation The documentation for all known lints can be generated by creating a factory, registering all lints by calling the `register_lints` methods, and then querying the registry for the metadata. ## Deserialization and Schema generation I haven't fully decided what the best approach is when it comes to deserializing lint rule names: * Reject invalid names in the deserializer. This gives us error messages with line and column numbers (by serde) * Don't validate lint rule names during deserialization; defer the validation until the configuration is resolved. This gives us more control over handling the error, e.g. emit a warning diagnostic instead of aborting when a rule isn't known. One technical challenge for both deserialization and schema generation is that the `Deserialize` and `JSONSchema` traits do not allow passing the `LintRegistry`, which is required to look up the lints by name. I suggest that we either rely on the salsa db being set for the current thread (`salsa::Attach`) or build our own thread-local storage for the `LintRegistry`. It's the caller's responsibility to make the lint registry available before calling `Deserialize` or `JSONSchema`. ## CLI support I prefer deferring adding support for enabling and disabling lints from the CLI for now because I think it will be easier to add once I've figured out how to handle configurations. ## Bitset optimization Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of a hash map. This helped improve performance by a few percent (see https://github.com/astral-sh/ruff/pull/3606). However, this approach is no longer possible because lints have no "cheap" way to compute their index inside the registry (other than using a hash map). We could consider doing something similar to Salsa where each `LintMetadata` stores a `LazyLintIndex`. ``` pub struct LazyLintIndex { cached: OnceLock<(Nonce, LintIndex)> } impl LazyLintIndex { pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) { let (nonce, index) = self.cached.get_or_init(\|\| registry.lint_index(lint)); if registry.nonce() == nonce { index } else { registry.lint_index(lint) } } ``` Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex` is in the range of `0...registry.len()`. The `LazyLintIndex` is based on the assumption that every program has exactly one registry. This assumption allows to cache the `LintIndex` directly on the `LintMetadata`. The implementation falls back to the "slow" path if there is more than one registry at runtime. I was very close to implementing this optimization because it's kind of fun to implement. I ultimately decided against it because it adds complexity and I don't think it's worth doing in Red Knot today: * Red Knot only queries the rule selection when deciding whether or not to emit a diagnostic. It is rarely used to detect if a certain code block should run. This is different from Ruff where the rule selection is queried many times for every single AST node to determine which rules should run. * I'm not sure if a 2-3% performance improvement is worth the complexity I suggest revisiting this decision when working on the linter where a fast path for deciding if a rule is enabled might be more important (but that depends on how lint rules are implemented) ## Test Plan I removed a lint from the default rule registry, and the MD tests started failing because the diagnostics were no longer emitted.	2024-12-11 13:25:19 +01:00
..
src	[red-knot] Lint registry and rule selection (#14874 )	2024-12-11 13:25:19 +01:00
Cargo.toml	Bump MSRV to Rust 1.80 (#13826 )	2024-10-20 10:55:36 +02:00

[red-knot] Lint registry and rule selection (#14874 )

## Summary

This is the third and last PR in this stack that adds support for
toggling lints at a per-rule level.

This PR introduces a new `LintRegistry`, a central index of known lints.
The registry is required because we want to support lint rules from many
different crates but need a way to look them up by name, e.g., when
resolving a lint from a name in the configuration or analyzing a
suppression comment.

Adding a lint now requires two steps:

1. Declare the lint with `declare_lint`
2. Register the lint in the registry inside the `register_lints`
function.

I considered some more involved macros to avoid changes in two places.
Still, I ultimately decided against it because a) it's just two places
and b) I'd expect that registering a type checker lint will differ from
registering a lint that runs as a rule in the linter. I worry that any
more opinionated design could limit our options when working on the
linter, so I kept it simple.

The second part of this PR is the `RuleSelection`. It stores which lints
are enabled and what severity they should use for created diagnostics.
For now, the `RuleSelection` always gets initialized with all known
lints and it uses their default level.

## Linter crates

Each crate that defines lints should export a `register_lints` function
that accepts a `&mut LintRegistryBuilder` to register all its known
lints in the registry. This should make registering all known lints in a
top-level crate easy: Just call `register_lints` of every crate that
defines lint rules.

I considered defining a `LintCollection` trait and even some fancy
macros to accomplish the same but decided to go for this very simplistic
approach for now. We can add more abstraction once needed.

## Lint rules

This is a bit hand-wavy. I don't have a good sense for how our linter
infrastructure will look like, but I expect we'll need a way to register
the rules that should run as part of the red knot linter. One way is to
keep doing what Ruff does by having one massive `checker` and each lint
rule adds a call to itself in the relevant AST visitor methods. An
alternative is that we have a `LintRule` trait that provides common
hooks and implementations will be called at the "right time". Such a
design would need a way to register all known lint implementations,
possibly with the lint. This is where we'd probably want a dedicated
`register_rule` method. A third option is that lint rules are handled
separately from the `LintRegistry` and are specific to the linter crate.

The current design should be flexible enough to support the three
options.


## Documentation generation

The documentation for all known lints can be generated by creating a
factory, registering all lints by calling the `register_lints` methods,
and then querying the registry for the metadata.

## Deserialization and Schema generation

I haven't fully decided what the best approach is when it comes to
deserializing lint rule names:

* Reject invalid names in the deserializer. This gives us error messages
with line and column numbers (by serde)
* Don't validate lint rule names during deserialization; defer the
validation until the configuration is resolved. This gives us more
control over handling the error, e.g. emit a warning diagnostic instead
of aborting when a rule isn't known.

One technical challenge for both deserialization and schema generation
is that the `Deserialize` and `JSONSchema` traits do not allow passing
the `LintRegistry`, which is required to look up the lints by name. I
suggest that we either rely on the salsa db being set for the current
thread (`salsa::Attach`) or build our own thread-local storage for the
`LintRegistry`. It's the caller's responsibility to make the lint
registry available before calling `Deserialize` or `JSONSchema`.


## CLI support

I prefer deferring adding support for enabling and disabling lints from
the CLI for now because I think it will be easier
to add once I've figured out how to handle configurations. 

## Bitset optimization

Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of
a hash map. This helped improve performance by a few percent (see
https://github.com/astral-sh/ruff/pull/3606). However, this approach is
no longer possible because lints have no "cheap" way to compute their
index inside the registry (other than using a hash map).

We could consider doing something similar to Salsa where each
`LintMetadata` stores a `LazyLintIndex`.

```
pub struct LazyLintIndex {
	cached: OnceLock<(Nonce, LintIndex)>
}

impl LazyLintIndex {
	pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) {
	
	let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint));

	if registry.nonce() == nonce {
		index
	} else {
		registry.lint_index(lint)
	}
}
```

Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex`
is in the range of `0...registry.len()`. The `LazyLintIndex` is based on
the assumption that every program has exactly **one** registry. This
assumption allows to cache the `LintIndex` directly on the
`LintMetadata`. The implementation falls back to the "slow" path if
there is more than one registry at runtime.

I was very close to implementing this optimization because it's kind of
fun to implement. I ultimately decided against it because it adds
complexity and I don't think it's worth doing in Red Knot today:

* Red Knot only queries the rule selection when deciding whether or not
to emit a diagnostic. It is rarely used to detect if a certain code
block should run. This is different from Ruff where the rule selection
is queried many times for every single AST node to determine which rules
*should* run.
* I'm not sure if a 2-3% performance improvement is worth the complexity

I suggest revisiting this decision when working on the linter where a
fast path for deciding if a rule is enabled might be more important (but
that depends on how lint rules are implemented)


## Test Plan

I removed a lint from the default rule registry, and the MD tests
started failing because the diagnostics were no longer emitted.

2024-12-11 13:25:19 +01:00

src

[red-knot] Lint registry and rule selection (#14874 )

2024-12-11 13:25:19 +01:00

Cargo.toml

Bump MSRV to Rust 1.80 (#13826 )

2024-10-20 10:55:36 +02:00