## Summary
This PR updates the `FileCache` to include an optional `NotebookIndex`
to support caching for Jupyter Notebooks.
Only the index is required to compute the diagnostics, so we don't need
to store the entire `Notebook` on the `Diagnostics` struct. It follows
that the cache only needs to store the index in order to reconstruct the
`Diagnostics`.
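As a rough illustration (these are not the actual Ruff types; the field names are hypothetical), the cache entry only needs an optional index alongside the cached diagnostics:
```rust
/// Illustrative stand-in for the notebook index: maps rows of the concatenated
/// source back to their cell and to the row within that cell.
struct NotebookIndex {
    row_to_cell: Vec<u32>,
    row_to_row_in_cell: Vec<u32>,
}

/// Hypothetical cache entry: the whole `Notebook` is not stored, only the index.
struct FileCacheEntry {
    /// Cached diagnostics for the file (simplified to strings here).
    messages: Vec<String>,
    /// `Some` only for Jupyter notebooks; sufficient to map positions back to
    /// cells when reconstructing the diagnostics on a cache hit.
    notebook_index: Option<NotebookIndex>,
}

fn main() {
    let entry = FileCacheEntry {
        messages: vec!["F401 `os` imported but unused".to_string()],
        notebook_index: None,
    };
    assert!(entry.notebook_index.is_none());
}
```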
## Test Plan
Update an existing test case to run over the fixtures in the
`ruff_notebook` crate, which contains multiple Jupyter Notebooks.
Locally, the following commands were run in order:
1. Remove the cache: `rm -rf .ruff_cache`
2. Run without cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb
--no-cache`
3. Run with cache: `cargo run --bin ruff -- check --isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`
4. Verify that the `.ruff_cache` directory was created
5. Run with cache again and verify: `cargo run --bin ruff -- check
--isolated
crates/ruff_notebook/resources/test/fixtures/jupyter/unused_variable.ipynb`
## Benchmarks
https://github.com/astral-sh/ruff/pull/6863#issuecomment-1715675186
fixes: #6671
## Summary
This is equivalent for a single flag, but I think it's more likely to be
correct when the bitflags are modified -- the primary reason being that
we sometimes define flags as the union of other flags, e.g.:
```rust
const ANNOTATION = Self::TYPING_ONLY_ANNOTATION.bits() | Self::RUNTIME_ANNOTATION.bits();
```
In this case, `flags.contains(Flag::ANNOTATION)` requires that _both_
flags in the union are set, whereas `flags.intersects(Flag::ANNOTATION)`
requires that _at least one_ flag is set.
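A minimal sketch of the difference, assuming the `bitflags` 2.x crate (the flag names mirror the example above):
```rust
use bitflags::bitflags;

bitflags! {
    struct Flag: u8 {
        const TYPING_ONLY_ANNOTATION = 1 << 0;
        const RUNTIME_ANNOTATION = 1 << 1;
        // A union flag composed of the two flags above.
        const ANNOTATION = Self::TYPING_ONLY_ANNOTATION.bits() | Self::RUNTIME_ANNOTATION.bits();
    }
}

fn main() {
    let flags = Flag::RUNTIME_ANNOTATION;
    // `contains` requires *all* bits of ANNOTATION to be set, so this is false.
    assert!(!flags.contains(Flag::ANNOTATION));
    // `intersects` requires *at least one* common bit, so this is true.
    assert!(flags.intersects(Flag::ANNOTATION));
}
```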
## Summary
This is the result of running `cargo +nightly clippy --workspace
--all-targets --all-features -- -D warnings` and fixing all violations.
Just wanted to see if there were any interesting new checks on nightly
👀
## Summary
A new CLI option (`-o`/`--output-file`) to write output to a file
instead of stdout.
The major change is removing the manually acquired lock on stdout. The
rationale is that the output is buffered, so the lock is only acquired
when a block (8 KB) is written. As per the benchmarks below, this incurs
a slight performance penalty.
Reference:
https://rustmagazine.org/issue-3/javascript-compiler/#printing-is-slow
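A minimal sketch of the idea (not Ruff's actual implementation; `make_writer` is a hypothetical helper): buffer the output and let each flush lock stdout implicitly, or redirect everything to the file given by `--output-file`:
```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Choose the output sink: a file if `--output-file` was passed, otherwise stdout.
fn make_writer(output_file: Option<&Path>) -> io::Result<Box<dyn Write>> {
    match output_file {
        Some(path) => Ok(Box::new(BufWriter::new(File::create(path)?))),
        // No explicit `stdout().lock()`: the BufWriter only hits (and locks)
        // stdout when a buffered block is flushed.
        None => Ok(Box::new(BufWriter::new(io::stdout()))),
    }
}

fn main() -> io::Result<()> {
    let mut out = make_writer(None)?;
    writeln!(out, "path/to/file.py:1:1: F401 `os` imported but unused")?;
    out.flush()
}
```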
## Benchmarks
_Output is truncated to only contain useful information:_
Command: `check --isolated --no-cache --select=ALL --show-source
./test-repos/cpython`
Latest HEAD (361d45f2b2) with and without
the manual lock on stdout:
```console
Benchmark 1: With lock
Time (mean ± σ): 5.687 s ± 0.075 s [User: 17.110 s, System: 0.486 s]
Range (min … max): 5.615 s … 5.860 s 10 runs
Benchmark 2: Without lock
Time (mean ± σ): 5.719 s ± 0.064 s [User: 17.095 s, System: 0.491 s]
Range (min … max): 5.640 s … 5.865 s 10 runs
Summary
(1) ran 1.01 ± 0.02 times faster than (2)
```
This PR:
```console
Benchmark 1: This PR
Time (mean ± σ): 5.855 s ± 0.058 s [User: 17.197 s, System: 0.491 s]
Range (min … max): 5.786 s … 5.987 s 10 runs
Benchmark 2: Latest HEAD with lock
Time (mean ± σ): 5.645 s ± 0.033 s [User: 16.922 s, System: 0.495 s]
Range (min … max): 5.600 s … 5.712 s 10 runs
Summary
(2) ran 1.04 ± 0.01 times faster than (1)
```
## Test Plan
Run all of the commands that produce output, both with and without the
`--output-file=ruff.out` option:
* `--show-settings`
* `--show-files`
* `--show-fixes`
* `--diff`
* `--select=ALL`
* `--select=ALL --show-source`
* `--watch` (only stdout allowed)
resolves: #4754
## Summary
We now _always_ generate fixes, so `FixMode::None` and
`FixMode::Generate` are redundant. We can also remove the TODO around
`--fix-dry-run`, since that's our default behavior.
Closes #5081.
## Summary
This adds `json-lines` (https://jsonlines.org/ or http://ndjson.org/) as
an output format.
I'm sure you already know, but
* JSONL is more greppable (each record is a single line) than the pretty
JSON
* JSONL is faster to ingest piecewise (and/or in parallel) than JSON
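To illustrate the format (the record fields below are made up, not Ruff's actual JSON schema), JSON Lines emits one complete JSON object per line with no surrounding array:
```rust
use serde_json::json;

fn main() {
    // Made-up diagnostic records, purely to show the shape of the output.
    let records = [
        json!({"filename": "a.py", "code": "F401", "message": "`os` imported but unused"}),
        json!({"filename": "b.py", "code": "E501", "message": "Line too long"}),
    ];
    // JSON Lines: one object per line, so e.g. `grep F401` matches whole records.
    for record in &records {
        println!("{record}");
    }
}
```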
## Test Plan
Snapshot test in the new module :)
## Summary
Add support for applying auto-fixes in Jupyter Notebook.
### Solution
Cell offsets are the boundaries for each cell in the concatenated source
code, represented using `TextSize`. They include both the start and end
offsets, forming a range for each cell. These offsets are updated using
the `SourceMap` markers.
### SourceMap
`SourceMap` contains markers constructed from each edit, which track how
positions in the original source code map to positions in the
transformed source. The following drawing might make it clearer:

The center column, where the dotted lines are, represents the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you look closely,
each destination already takes into account all of the markers before it.
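A rough sketch of that update (hypothetical types and names; Ruff uses `TextSize` rather than `usize`): each cell offset is shifted by the last marker at or before it, whose destination already accounts for every earlier edit.
```rust
/// Illustrative marker: an original offset and where it ended up after edits.
struct Marker {
    source: usize,
    dest: usize,
}

/// Shift each cell offset using the last marker at or before it.
/// Assumes `markers` is sorted by `source`.
fn update_cell_offsets(cell_offsets: &mut [usize], markers: &[Marker]) {
    for offset in cell_offsets.iter_mut() {
        let current = *offset;
        let last = markers.iter().take_while(|m| m.source <= current).last();
        if let Some(marker) = last {
            *offset = marker.dest + (current - marker.source);
        }
    }
}

fn main() {
    // Two cells end at offsets 10 and 25; a fix before offset 10 removed 3 bytes,
    // so the marker maps original offset 7 to new offset 4.
    let mut offsets = vec![10, 25];
    let markers = vec![Marker { source: 7, dest: 4 }];
    update_cell_offsets(&mut offsets, &markers);
    assert_eq!(offsets, vec![7, 22]);
}
```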
The index is constructed only when required, as it's only used to render
the diagnostics, so a `OnceCell` is used for this purpose. The cell
offsets, cell content, and the index are updated after each linting
iteration, in that order. The order matters because the content is
updated based on the new offsets, and the index is updated based on the
new content.
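A minimal sketch of the lazy construction (a hypothetical structure; the real index maps rows back to cells rather than storing line starts):
```rust
use std::cell::OnceCell;

/// Hypothetical notebook wrapper: the index is built at most once, and only
/// when diagnostics actually need to be rendered.
struct Notebook {
    source: String,
    index: OnceCell<Vec<usize>>, // placeholder index: start offset of each line
}

impl Notebook {
    fn index(&self) -> &[usize] {
        self.index.get_or_init(|| {
            // Built lazily from the *current* content, so it must be recomputed
            // (or invalidated) after the offsets and content are updated.
            let mut offsets = vec![0];
            offsets.extend(self.source.match_indices('\n').map(|(i, _)| i + 1));
            offsets
        })
    }
}

fn main() {
    let notebook = Notebook {
        source: "a = 1\nb = 2\n".to_string(),
        index: OnceCell::new(),
    };
    assert_eq!(notebook.index(), &[0, 6, 12]);
}
```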
## Limitations
### 1
Styling rules such as the ones in `pycodestyle` will not be applicable
everywhere in a Jupyter notebook, especially at cell boundaries. Take an
example where a rule suggests having 2 blank lines before a function and
the cells contain the following code:
```python
import something
# ---
def first():
    pass
def second():
    pass
```
(Again, the comment is only to visualize cell boundaries.)
In the concatenated source code, the 2 blank lines will be added, but
they shouldn't actually be added when viewed in terms of the Jupyter
notebook: it's as if the function `first` were at the start of a file.
`nbqa` solves this by recording the newlines before running `autopep8`
and restoring them after the tool has run
(refer to https://github.com/nbQA-dev/nbQA/pull/807).
## Test Plan
Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate the stage at which any problem occurs:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)
### https://github.com/facebookresearch/segment-anything
```
-------------------------------------------------------------------------------
 Language                Files       Lines        Code    Comments      Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks           3           0           0           0           0
 |- Markdown                 3          98           0          94           4
 |- Python                   3         513         468           4          41
 (Total)                               611         468          98          45
-------------------------------------------------------------------------------
```
```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```
### https://github.com/openai/openai-cookbook
```
-------------------------------------------------------------------------------
 Language                Files       Lines        Code    Comments      Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks          65           0           0           0           0
 |- Markdown                64        3475          12        2507         956
 |- Python                  65        9700        7362        1101        1237
 (Total)                             13175        7374        3608        2193
===============================================================================
```
```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```
### https://github.com/tensorflow/docs
```
-------------------------------------------------------------------------------
 Language                Files       Lines        Code    Comments      Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks         150           0           0           0           0
 |- Markdown                 1          55           0          46           9
 |- Python                   1         402         289          60          53
 (Total)                               457         289         106          62
-------------------------------------------------------------------------------
```
```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```
### https://github.com/tensorflow/models
```
-------------------------------------------------------------------------------
 Language                Files       Lines        Code    Comments      Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks          46           0           0           0           0
 |- Markdown                 1          11           0           6           5
 |- Python                   1         328         249          19          60
 (Total)                               339         249          25          65
-------------------------------------------------------------------------------
```
```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```
resolves: #1218
fixes: #4556
* Add basic jupyter notebook support behind a feature flag
* Address review comments
* Rename in separate commit to make both git and clippy happy
* cfg(feature = "jupyter_notebook") another test
* Address more review comments
* Address more review comments
* and clippy and windows
* More review comment
In ruff-lsp (https://github.com/charliermarsh/ruff-lsp/pull/76) we want to add a "Disable \<rule\> for this line" quickfix. However, finding the correct line into which the `noqa` comment should be inserted is non-trivial (multi-line strings for example).
Ruff already has this info, so expose it in the JSON output for use by ruff-lsp.
Rule::noqa_code previously returned a single &'static str,
which was possible because we had one enum listing all
rule code prefixes. However, this commit series will split up
the RuleCodePrefix enum into several enums ... so we'll end up
with two &'static str ... this commit therefore wraps the return type
of Rule::noqa_code in a newtype so that we can easily change
it to return two &'static str in the 6th commit of this series.
After this commit series, several codes can be mapped to a single rule,
so this commit renames Rule::code to Rule::noqa_code,
which is the code that --add-noqa will add to ignore a rule.
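A rough sketch of where the series ends up (names and structure are illustrative, not necessarily the final shape in the codebase):
```rust
use std::fmt;

/// Wraps the noqa code so callers don't depend on its internal representation;
/// by the end of the series it can hold two `&'static str` (prefix + number).
struct NoqaCode(&'static str, &'static str);

impl fmt::Display for NoqaCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}{}", self.0, self.1)
    }
}

fn main() {
    // What `--add-noqa` would write, e.g. `# noqa: F401`.
    let code = NoqaCode("F", "401");
    assert_eq!(code.to_string(), "F401");
}
```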