ruff/crates/ruff_annotate_snippets
Brent Westbrook 2b1d3c60fa
Display diffs for ruff format --check and add support for different output formats (#20443)
## Summary

This PR uses the new `Diagnostic` type for rendering formatter
diagnostics. This allows the formatter to inherit all of the output
formats already implemented in the linter and ty. For example, here's
the new `full` output format, with the formatting diff displayed using
the same infrastructure as the linter:

<img width="592" height="364" alt="image"
src="https://github.com/user-attachments/assets/6d09817d-3f27-4960-aa8b-41ba47fb4dc0"
/>


<details><summary>Resolved TODOs</summary>
<p>

~~There are several limitiations/todos here still, especially around the
`OutputFormat` type~~:
- [x] A few literal `todo!`s for the remaining `OutputFormat`s without
matching `DiagnosticFormat`s
- [x] The default output format is `full` instead of something more
concise like the current output
- [x] Some of the output formats (namely JSON) have information that
doesn't make much sense for these diagnostics

The first of these is definitely resolved, and I think the other two are
as well, based on discussion on the design document. In brief, we're
okay inheriting the default `OutputFormat` and can separate the global
option into `lint.output-format` and `format.output-format` in the
future, if needed; and we're okay including redundant information in the
non-human-readable output formats.

My last major concern is with the performance of the new code, as
discussed in the `Benchmarks` section below.

A smaller question is whether we should use `Diagnostic`s for formatting
errors too. I think the answer to this is yes, in line with changes
we're making in the linter too. I still need to implement that here.

</p>
</details> 

<details><summary>Benchmarks</summary>
<p>


The values in the table are from a large benchmark on the CPython 3.10
code
base, which involves checking 2011 files, 1872 of which need to be
reformatted.
`stable` corresponds to the same code used on `main`, while
`preview-full` and
`preview-concise` use the new `Diagnostic` code gated behind `--preview`
for the
`full` and `concise` output formats, respectively. `stable-diff` uses
the
`--diff` to compare the two diff rendering approaches. See the full
hyperfine
command below for more details. For a sense of scale, the `stable`
output format
produces 1873 lines on stdout, compared to 855,278 for `preview-full`
and
857,798 for `stable-diff`.

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |

|:------------------|--------------:|---------:|---------:|-------------:|
| `stable` | 201.2 ± 6.8 | 192.9 | 220.6 | 1.00 |
| `preview-full` | 9113.2 ± 31.2 | 9076.1 | 9152.0 | 45.29 ± 1.54 |
| `preview-concise` | 214.2 ± 1.4 | 212.0 | 217.6 | 1.06 ± 0.04 |
| `stable-diff` | 3308.6 ± 20.2 | 3278.6 | 3341.8 | 16.44 ± 0.56 |

In summary, the `preview-concise` diagnostics are ~6% slower than the
stable
output format, increasing the average runtime from 201.2 ms to 214.2 ms.
The
`full` preview diagnostics are much more expensive, taking over 9113.2
ms to
complete, which is ~3x more expensive even than the stable diffs
produced by the
`--diff` flag.

My main takeaways here are:
1. Rendering `Edit`s is much more expensive than rendering the diffs
from `--diff`
2. Constructing `Edit`s actually isn't too bad

### Constructing `Edit`s

I also took a closer look at `Edit` construction by modifying the code
and
repeating the `preview-concise` benchmark and found that the main issue
is
constructing a `SourceFile` for use in the `Edit` rendering. Commenting
out the
`Edit` construction itself has basically no effect:

| Command   |   Mean [ms] | Min [ms] | Max [ms] |    Relative |
|:----------|------------:|---------:|---------:|------------:|
| `stable`  | 197.5 ± 1.6 |    195.0 |    200.3 |        1.00 |
| `no-edit` | 208.9 ± 2.2 |    204.8 |    212.2 | 1.06 ± 0.01 |

However, also omitting the source text from the `SourceFile`
construction
resolves the slowdown compared to `stable`. So it seems that copying the
full
source text into a `SourceFile` is the main cause of the slowdown for
non-`full`
diagnostics.

| Command          |   Mean [ms] | Min [ms] | Max [ms] |    Relative |
|:-----------------|------------:|---------:|---------:|------------:|
| `stable`         | 202.4 ± 2.9 |    197.6 |    207.9 |        1.00 |
| `no-source-text` | 202.7 ± 3.3 |    196.3 |    209.1 | 1.00 ± 0.02 |

### Rendering diffs

The main difference between `stable-diff` and `preview-full` seems to be
the diffing strategy we use from `similar`. Both versions use the same
algorithm, but in the existing
[`CodeDiff`](https://github.com/astral-sh/ruff/blob/main/crates/ruff_linter/src/source_kind.rs#L259)
rendering for the `--diff` flag, we only do line-level diffing, whereas
for `Diagnostic`s we use `TextDiff::iter_inline_changes` to highlight
word-level changes too. Skipping the word diff for `Diagnostic`s closes
most of the gap:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.323 ± 0.015 | 3.297 | 3.341 | 1.00 |
| `preview-full` | 3.654 ± 0.019 | 3.618 | 3.682 | 1.10 ± 0.01 |

(In some repeated runs, I've seen as small as a ~5% difference, down
from 10% in the table)

This doesn't actually change any of our snapshots, but it would
obviously change the rendered result in a terminal since we wouldn't
highlight the specific words that changed within a line.

Another much smaller change that we can try is removing the deadline
from the `iter_inline_changes` call. It looks like there's a fair amount
of overhead from the default 500 ms deadline for computing these, and
using `iter_inline_changes(op, None)` (`None` for the optional deadline
argument) improves the runtime quite a bit:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.322 ± 0.013 | 3.298 | 3.341 | 1.00 |
| `preview-full` | 5.296 ± 0.030 | 5.251 | 5.366 | 1.59 ± 0.01 |

<hr>

<details><summary>hyperfine command</summary>

```shell
cargo build --release --bin ruff && hyperfine --ignore-failure --warmup 10 --export-markdown /tmp/table.md \
  -n stable -n preview-full -n preview-concise -n stable-diff \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=full" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=concise" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --diff"
```

</details>

</p>
</details> 

## Test Plan

Some new CLI tests and manual testing
2025-09-30 12:00:51 -04:00
..
examples crates: vendor annotate-snippets crate 2025-01-15 13:37:52 -05:00
src Display diffs for ruff format --check and add support for different output formats (#20443) 2025-09-30 12:00:51 -04:00
tests Update transitive dependencies (#20513) 2025-09-22 12:50:53 +02:00
Cargo.toml crates: vendor annotate-snippets crate 2025-01-15 13:37:52 -05:00
LICENSE-APACHE crates: vendor annotate-snippets crate 2025-01-15 13:37:52 -05:00
LICENSE-MIT crates: vendor annotate-snippets crate 2025-01-15 13:37:52 -05:00
README.md crates: vendor annotate-snippets crate 2025-01-15 13:37:52 -05:00

This is a fork of the annotate-snippets crate. The principle motivation for this fork, at the time of writing, is issue #167. Specifically, we wanted to upgrade our version of annotate-snippets, but do so without changing our diagnostic message format.

This copy of annotate-snippets is basically identical to upstream, but with an extra Level::None variant that permits skipping over a new non-optional header emitted by annotate-snippets.

More generally, it seems plausible that we may want to tweak other aspects of the output format in the future, so it might make sense to stick with our own copy so that we can be masters of our own destiny.