Rebase of https://github.com/astral-sh/ruff/pull/5119 authored by
@evanrittenhouse with additional refinements.
## Changes
- Adds `--unsafe-fixes` / `--no-unsafe-fixes` flags to `ruff check`
- Violations with unsafe fixes are not shown as fixable unless opted-in
- Fix applicability is respected now
- `Applicability::Never` fixes are no longer applied
- `Applicability::Sometimes` fixes require opt-in
- `Applicability::Always` fixes are unchanged
- Hints for availability of `--unsafe-fixes` added to `ruff check`
output
## Examples
Check hints at hidden unsafe fixes
```
❯ ruff check example.py --no-cache --select F601,W292
example.py:1:14: F601 Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
```
We could add an indicator for which violations have hidden fixes in the
future.
Check treats unsafe fixes as applicable with opt-in
```
❯ ruff check example.py --no-cache --select F601,W292 --unsafe-fixes
example.py:1:14: F601 [*] Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 2 fixable with the --fix option.
```
Also can be enabled in the config file
```
❯ cat ruff.toml
unsafe-fixes = true
```
And opted-out per invocation
```
❯ ruff check example.py --no-cache --select F601,W292 --no-unsafe-fixes
example.py:1:14: F601 Dictionary key literal `'a'` repeated
example.py:2:15: W292 [*] No newline at end of file
Found 2 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
```
Diff does not include unsafe fixes
```
❯ ruff check example.py --no-cache --select F601,W292 --diff
--- example.py
+++ example.py
@@ -1,2 +1,2 @@
x = {'a': 1, 'a': 1}
-print(('foo'))
+print(('foo'))
\ No newline at end of file
Would fix 1 error.
```
Unless there is opt-in
```
❯ ruff check example.py --no-cache --select F601,W292 --diff --unsafe-fixes
--- example.py
+++ example.py
@@ -1,2 +1,2 @@
-x = {'a': 1}
-print(('foo'))
+x = {'a': 1, 'a': 1}
+print(('foo'))
\ No newline at end of file
Would fix 2 errors.
```
https://github.com/astral-sh/ruff/pull/7790 will improve the diff
messages following this pull request
Similarly, `--fix` and `--fix-only` require the `--unsafe-fixes` flag to
apply unsafe fixes.
## Related
Replaces #5119
Closes https://github.com/astral-sh/ruff/issues/4185
Closes https://github.com/astral-sh/ruff/issues/7214
Closes https://github.com/astral-sh/ruff/issues/4845
Closes https://github.com/astral-sh/ruff/issues/3863
Addresses https://github.com/astral-sh/ruff/issues/6835
Addresses https://github.com/astral-sh/ruff/issues/7019
Needs follow-up https://github.com/astral-sh/ruff/issues/6962
Needs follow-up https://github.com/astral-sh/ruff/issues/4845
Needs follow-up https://github.com/astral-sh/ruff/issues/7436
Needs follow-up https://github.com/astral-sh/ruff/issues/7025
Needs follow-up https://github.com/astral-sh/ruff/issues/6434
Follow-up #7790
Follow-up https://github.com/astral-sh/ruff/pull/7792
---------
Co-authored-by: Evan Rittenhouse <evanrittenhouse@gmail.com>
## Summary
## Stack Summary
This stack splits `Settings` into `FormatterSettings` and `LinterSettings` and moves it into `ruff_workspace`. This change is necessary to add the `FormatterSettings` to `Settings` without adding `ruff_python_formatter` as a dependency to `ruff_linter` (and the linter should not contain the formatter settings).
A quick overview of our settings struct at play:
* `Options`: 1:1 representation of the options in the `pyproject.toml` or `ruff.toml`. Used for deserialization.
* `Configuration`: Resolved `Options`, potentially merged from multiple configurations (when using `extend`). The representation is very close if not identical to the `Options`.
* `Settings`: The resolved configuration that uses a data format optimized for reading. Optional fields are initialized with their default values. Initialized by `Configuration::into_settings` .
The goal of this stack is to split `Settings` into tool-specific resolved `Settings` that are independent of each other. This comes at the advantage that the individual crates don't need to know anything about the other tools. The downside is that information gets duplicated between `Settings`. Right now the duplication is minimal (`line-length`, `tab-width`) but we may need to come up with a solution if more expensive data needs sharing.
This stack focuses on `Settings`. Splitting `Configuration` into some smaller structs is something I'll follow up on later.
## PR Summary
This PR moves the `ResolverSettings` and `Settings` struct to `ruff_workspace`. `LinterSettings` remains in `ruff_linter` because it gets passed to lint rules, the `Checker` etc.
## Test Plan
`cargo test`
## Stack Summary
This stack splits `Settings` into `FormatterSettings` and `LinterSettings` and moves it into `ruff_workspace`. This change is necessary to add the `FormatterSettings` to `Settings` without adding `ruff_python_formatter` as a dependency to `ruff_linter` (and the linter should not contain the formatter settings).
A quick overview of our settings struct at play:
* `Options`: 1:1 representation of the options in the `pyproject.toml` or `ruff.toml`. Used for deserialization.
* `Configuration`: Resolved `Options`, potentially merged from multiple configurations (when using `extend`). The representation is very close if not identical to the `Options`.
* `Settings`: The resolved configuration that uses a data format optimized for reading. Optional fields are initialized with their default values. Initialized by `Configuration::into_settings` .
The goal of this stack is to split `Settings` into tool-specific resolved `Settings` that are independent of each other. This comes at the advantage that the individual crates don't need to know anything about the other tools. The downside is that information gets duplicated between `Settings`. Right now the duplication is minimal (`line-length`, `tab-width`) but we may need to come up with a solution if more expensive data needs sharing.
This stack focuses on `Settings`. Splitting `Configuration` into some smaller structs is something I'll follow up on later.
## PR Summary
This PR extracts the linter-specific settings into a new `LinterSettings` struct and adds it as a `linter` field to the `Settings` struct. This is in preparation for moving `Settings` from `ruff_linter` to `ruff_workspace`
## Test Plan
`cargo test`
## Summary
This PR updates the `FileCache` to include an optional `NotebookIndex`
to support caching for Jupyter Notebooks.
We only require the index to compute the diagnostics and thus we don't
really need to store the entire `Notebook` on the `Diagnostics` struct.
This means we only need the index to be stored in the cache to
reconstruct the `Diagnostics`.
## Test Plan
Update an existing test case to run over the fixtures under
`ruff_notebook` crate where there are multiple Jupyter Notebook.
Locally, the following commands were run in order:
1. Remove the cache: `rm -rf .ruff_cache`
2. Run without cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb
--no-cache`
3. Run with cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`
4. Check whether the `.ruff_cache` directory was created or not
5. Run with cache again and verify: `cargo run --bin ruff -- check
--isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`
## Benchmarks
https://github.com/astral-sh/ruff/pull/6863#issuecomment-1715675186fixes: #6671
Reimplements https://github.com/astral-sh/ruff/pull/3104
Closes https://github.com/astral-sh/ruff/issues/5726
Note that we will generate the hash for a cache key twice in normal
operation. Once to check for the cached item and again to update the
cache. We could optimize this by generating the hash once in
`diagnostics::lint_file` and passing the `u64` into `get` and `update`.
We'd probably want to wrap it in a `CacheKeyHash` enum for type safety.
## Test plan
Unit tests for Windows and Unix.
Manual test with case from issue
```
❯ touch fake.py
❯ chmod +x fake.py
❯ ./target/debug/ruff --select EXE fake.py
fake.py:1:1: EXE002 The file is executable but no shebang is present
Found 1 error.
❯ chmod -x fake.py
❯ ./target/debug/ruff --select EXE fake.py
```
## Summary
And remove cached files that we haven't seen for a certain period of
time, currently 30 days.
For the last seen timestamp we actually use an `u64`, it's smaller on
disk than `SystemTime` (which size is OS dependent) and fits in an
`AtomicU64` which we can use to update it without locks.
## Test Plan
Added a new unit test, run by `cargo test`.
## Summary
Open cache files in parallel (again), brings the performance back to be roughly equal to the old implementation.
## Test Plan
Existing tests should keep working.
## Summary
This changes the caching design from one cache file per source file, to
one cache file per package. This greatly reduces the amount of cache
files that are opened and written, while maintaining roughly the same
(combined) size as bincode is very compact.
Below are some very much not scientific performance tests. It uses
projects/sources to check:
* small.py: single, 31 bytes Python file with 2 errors.
* test.py: single, 43k Python file with 8 errors.
* fastapi: FastAPI repo, 1134 files checked, 0 errors.
Source | Before # files | After # files | Before size | After size
-------|-------|-------|-------|-------
small.py | 1 | 1 | 20 K | 20 K
test.py | 1 | 1 | 60 K | 60 K
fastapi | 1134 | 518 | 4.5 M | 2.3 M
One question that might come up is why fastapi still has 518 cache files
and not 1? That is because this is using the existing package
resolution, which sees examples, docs, etc. as separate from the "main"
source code (in the fastapi directory in the repo). In this future it
might be worth consider switching to a one cache file per repo strategy.
This new design is not perfect and does have a number of known issues.
First, like the old design it doesn't remove the cache for a source file
that has been (re)moved until `ruff clean` is called.
Second, this currently uses a large mutex around the mutation of the
package cache (e.g. inserting result). This could be (or become) a
bottleneck. It's future work to test and improve this (if needed).
Third, currently the packages and opened and stored in a sequential
loop, this could be done parallel. This is also future work.
## Test Plan
Run `ruff check` (with caching enabled) twice on any Python source code
and it should produce the same results.
This PR introduces a new `CacheKey` trait for types that can be used as a cache key.
I'm not entirely sure if this is worth the "overhead", but I was surprised to find `HashableHashSet` and got scared when I looked at the time complexity of the `hash` function. These implementations must be extremely slow in hashed collections.
I then searched for usages and quickly realized that only the cache uses these `Hash` implementations, where performance is less sensitive.
This PR introduces a new `CacheKey` trait to communicate the difference between a hash and computing a key for the cache. The new trait can be implemented for types that don't implement `Hash` for performance reasons, and we can define additional constraints on the implementation: For example, we'll want to enforce portability when we add remote caching support. Using a different trait further allows us not to implement it for types without stable identities (e.g. pointers) or use other implementations than the standard hash function.