ruff/crates/ruff_annotate_snippets/src/snippet.rs
Brent Westbrook 2b1d3c60fa
Display diffs for ruff format --check and add support for different output formats (#20443)
## Summary

This PR uses the new `Diagnostic` type for rendering formatter
diagnostics. This allows the formatter to inherit all of the output
formats already implemented in the linter and ty. For example, here's
the new `full` output format, with the formatting diff displayed using
the same infrastructure as the linter:

<img width="592" height="364" alt="image"
src="https://github.com/user-attachments/assets/6d09817d-3f27-4960-aa8b-41ba47fb4dc0"
/>


<details><summary>Resolved TODOs</summary>
<p>

~~There are several limitations/todos here still, especially around the
`OutputFormat` type~~:
- [x] A few literal `todo!`s for the remaining `OutputFormat`s without
matching `DiagnosticFormat`s
- [x] The default output format is `full` instead of something more
concise like the current output
- [x] Some of the output formats (namely JSON) have information that
doesn't make much sense for these diagnostics

The first of these is definitely resolved, and I think the other two are
as well, based on discussion on the design document. In brief, we're
okay inheriting the default `OutputFormat` and can separate the global
option into `lint.output-format` and `format.output-format` in the
future, if needed; and we're okay including redundant information in the
non-human-readable output formats.

My last major concern is with the performance of the new code, as
discussed in the `Benchmarks` section below.

A smaller question is whether we should use `Diagnostic`s for formatting
errors too. I think the answer to this is yes, in line with changes
we're making in the linter too. I still need to implement that here.

</p>
</details> 

<details><summary>Benchmarks</summary>
<p>


The values in the table are from a large benchmark on the CPython 3.10
code base, which involves checking 2011 files, 1872 of which need to be
reformatted. `stable` corresponds to the same code used on `main`, while
`preview-full` and `preview-concise` use the new `Diagnostic` code gated
behind `--preview` for the `full` and `concise` output formats,
respectively. `stable-diff` uses the `--diff` flag to compare the two
diff rendering approaches. See the full hyperfine command below for more
details. For a sense of scale, the `stable` output format produces 1873
lines on stdout, compared to 855,278 for `preview-full` and 857,798 for
`stable-diff`.

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:------------------|--------------:|---------:|---------:|-------------:|
| `stable` | 201.2 ± 6.8 | 192.9 | 220.6 | 1.00 |
| `preview-full` | 9113.2 ± 31.2 | 9076.1 | 9152.0 | 45.29 ± 1.54 |
| `preview-concise` | 214.2 ± 1.4 | 212.0 | 217.6 | 1.06 ± 0.04 |
| `stable-diff` | 3308.6 ± 20.2 | 3278.6 | 3341.8 | 16.44 ± 0.56 |

In summary, the `preview-concise` diagnostics are ~6% slower than the
stable output format, increasing the average runtime from 201.2 ms to
214.2 ms. The `full` preview diagnostics are much more expensive,
averaging 9113.2 ms, which is ~2.8x slower even than the stable diffs
produced by the `--diff` flag.
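
For reference, the `Relative` column and the percentages quoted above are plain ratios of mean runtimes. The sketch below reproduces that arithmetic from the means in the table; the helper names `relative` and `slowdown_percent` are made up for illustration and are not part of ruff or hyperfine.

```rust
/// Ratio of a command's mean runtime to a baseline mean runtime.
/// (Hypothetical helper, only for checking the table by hand.)
fn relative(mean_ms: f64, baseline_ms: f64) -> f64 {
    mean_ms / baseline_ms
}

/// Slowdown over the baseline, expressed as a percentage.
/// (Hypothetical helper, only for checking the table by hand.)
fn slowdown_percent(mean_ms: f64, baseline_ms: f64) -> f64 {
    (relative(mean_ms, baseline_ms) - 1.0) * 100.0
}
```

With the means from the table, `relative(9113.2, 201.2)` is about 45.29, `slowdown_percent(214.2, 201.2)` is about 6.5, and `relative(9113.2, 3308.6)` is about 2.75 for `preview-full` against `stable-diff`.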

My main takeaways here are:
1. Rendering `Edit`s is much more expensive than rendering the diffs
from `--diff`
2. Constructing `Edit`s actually isn't too bad

### Constructing `Edit`s

I also took a closer look at `Edit` construction by modifying the code
and repeating the `preview-concise` benchmark and found that the main
issue is constructing a `SourceFile` for use in the `Edit` rendering.
Commenting out the `Edit` construction itself has basically no effect:

| Command   |   Mean [ms] | Min [ms] | Max [ms] |    Relative |
|:----------|------------:|---------:|---------:|------------:|
| `stable`  | 197.5 ± 1.6 |    195.0 |    200.3 |        1.00 |
| `no-edit` | 208.9 ± 2.2 |    204.8 |    212.2 | 1.06 ± 0.01 |

However, also omitting the source text from the `SourceFile`
construction resolves the slowdown compared to `stable`. So it seems
that copying the full source text into a `SourceFile` is the main cause
of the slowdown for non-`full` diagnostics.

| Command          |   Mean [ms] | Min [ms] | Max [ms] |    Relative |
|:-----------------|------------:|---------:|---------:|------------:|
| `stable`         | 202.4 ± 2.9 |    197.6 |    207.9 |        1.00 |
| `no-source-text` | 202.7 ± 3.3 |    196.3 |    209.1 | 1.00 ± 0.02 |

### Rendering diffs

The main difference between `stable-diff` and `preview-full` seems to be
the diffing strategy we use from `similar`. Both versions use the same
algorithm, but in the existing
[`CodeDiff`](https://github.com/astral-sh/ruff/blob/main/crates/ruff_linter/src/source_kind.rs#L259)
rendering for the `--diff` flag, we only do line-level diffing, whereas
for `Diagnostic`s we use `TextDiff::iter_inline_changes` to highlight
word-level changes too. Skipping the word diff for `Diagnostic`s closes
most of the gap:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.323 ± 0.015 | 3.297 | 3.341 | 1.00 |
| `preview-full` | 3.654 ± 0.019 | 3.618 | 3.682 | 1.10 ± 0.01 |

(In some repeated runs, I've seen differences as small as ~5%, down
from the 10% in the table)

This doesn't actually change any of our snapshots, but it would
obviously change the rendered result in a terminal since we wouldn't
highlight the specific words that changed within a line.

Another much smaller change that we can try is removing the deadline
from the `iter_inline_changes` call. It looks like there's a fair amount
of overhead from the default 500 ms deadline for computing these, and
using `iter_inline_changes(op, None)` (`None` for the optional deadline
argument) improves the runtime quite a bit:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.322 ± 0.013 | 3.298 | 3.341 | 1.00 |
| `preview-full` | 5.296 ± 0.030 | 5.251 | 5.366 | 1.59 ± 0.01 |

<hr>

<details><summary>hyperfine command</summary>

```shell
cargo build --release --bin ruff && hyperfine --ignore-failure --warmup 10 --export-markdown /tmp/table.md \
  -n stable -n preview-full -n preview-concise -n stable-diff \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=full" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=concise" \
  "./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --diff"
```

</details>

</p>
</details> 

## Test Plan

Some new CLI tests and manual testing
2025-09-30 12:00:51 -04:00


//! Structures used as an input for the library.
//!
//! Example:
//!
//! ```
//! use ruff_annotate_snippets::*;
//!
//! Level::Error.title("mismatched types")
//!     .snippet(Snippet::source("Foo").line_start(51).origin("src/format.rs"))
//!     .snippet(Snippet::source("Faa").line_start(129).origin("src/display.rs"));
//! ```

use std::ops::Range;

/// Primary structure provided for formatting
///
/// See [`Level::title`] to create a [`Message`]
#[derive(Debug)]
pub struct Message<'a> {
    pub(crate) level: Level,
    pub(crate) id: Option<&'a str>,
    pub(crate) title: &'a str,
    pub(crate) snippets: Vec<Snippet<'a>>,
    pub(crate) footer: Vec<Message<'a>>,
    pub(crate) is_fixable: bool,
    pub(crate) lineno_offset: usize,
}

impl<'a> Message<'a> {
    pub fn id(mut self, id: &'a str) -> Self {
        self.id = Some(id);
        self
    }

    pub fn snippet(mut self, slice: Snippet<'a>) -> Self {
        self.snippets.push(slice);
        self
    }

    pub fn snippets(mut self, slice: impl IntoIterator<Item = Snippet<'a>>) -> Self {
        self.snippets.extend(slice);
        self
    }

    pub fn footer(mut self, footer: Message<'a>) -> Self {
        self.footer.push(footer);
        self
    }

    pub fn footers(mut self, footer: impl IntoIterator<Item = Message<'a>>) -> Self {
        self.footer.extend(footer);
        self
    }

    /// Whether or not the diagnostic for this message is fixable.
    ///
    /// This is rendered as a `[*]` indicator after the `id` in an annotation header, if the
    /// annotation also has `Level::None`.
    pub fn is_fixable(mut self, yes: bool) -> Self {
        self.is_fixable = yes;
        self
    }

    /// Add an offset used for aligning the header sigil (`-->`) with the line number separators.
    ///
    /// For normal diagnostics this is computed automatically based on the lines to be rendered.
    /// This is intended only for use in the formatter, where we don't render a snippet directly but
    /// still want the header to align with the diff.
    pub fn lineno_offset(mut self, offset: usize) -> Self {
        self.lineno_offset = offset;
        self
    }
}

/// Structure containing the slice of text to be annotated and
/// basic information about the location of the slice.
///
/// One `Snippet` is meant to represent a single, continuous,
/// slice of source code that you want to annotate.
#[derive(Debug)]
pub struct Snippet<'a> {
    pub(crate) origin: Option<&'a str>,
    pub(crate) line_start: usize,
    pub(crate) source: &'a str,
    pub(crate) annotations: Vec<Annotation<'a>>,
    pub(crate) fold: bool,
    /// The optional cell index in a Jupyter notebook, used for reporting source locations along
    /// with the ranges on `annotations`.
    pub(crate) cell_index: Option<usize>,
}

impl<'a> Snippet<'a> {
    pub fn source(source: &'a str) -> Self {
        Self {
            origin: None,
            line_start: 1,
            source,
            annotations: vec![],
            fold: false,
            cell_index: None,
        }
    }

    pub fn line_start(mut self, line_start: usize) -> Self {
        self.line_start = line_start;
        self
    }

    pub fn origin(mut self, origin: &'a str) -> Self {
        self.origin = Some(origin);
        self
    }

    pub fn annotation(mut self, annotation: Annotation<'a>) -> Self {
        self.annotations.push(annotation);
        self
    }

    pub fn annotations(mut self, annotation: impl IntoIterator<Item = Annotation<'a>>) -> Self {
        self.annotations.extend(annotation);
        self
    }

    /// Hide lines without [`Annotation`]s
    pub fn fold(mut self, fold: bool) -> Self {
        self.fold = fold;
        self
    }

    /// Attach a Jupyter notebook cell index.
    pub fn cell_index(mut self, index: Option<usize>) -> Self {
        self.cell_index = index;
        self
    }
}

/// An annotation for a [`Snippet`].
///
/// See [`Level::span`] to create an [`Annotation`]
#[derive(Debug)]
pub struct Annotation<'a> {
    /// The byte range of the annotation in the `source` string
    pub(crate) range: Range<usize>,
    pub(crate) label: Option<&'a str>,
    pub(crate) level: Level,
    pub(crate) is_file_level: bool,
}

impl<'a> Annotation<'a> {
    pub fn label(mut self, label: &'a str) -> Self {
        self.label = Some(label);
        self
    }

    pub fn hide_snippet(mut self, yes: bool) -> Self {
        self.is_file_level = yes;
        self
    }
}

/// Types of annotations.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    /// Do not attach any annotation.
    None,
    /// Error annotations are displayed using red color and "^" character.
    Error,
    /// Warning annotations are displayed using blue color and "-" character.
    Warning,
    Info,
    Note,
    Help,
}

impl Level {
    pub fn title(self, title: &str) -> Message<'_> {
        Message {
            level: self,
            id: None,
            title,
            snippets: vec![],
            footer: vec![],
            is_fixable: false,
            lineno_offset: 0,
        }
    }

    /// Create an [`Annotation`] with the given span for a [`Snippet`]
    pub fn span<'a>(self, span: Range<usize>) -> Annotation<'a> {
        Annotation {
            range: span,
            label: None,
            level: self,
            is_file_level: false,
        }
    }
}
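
Since `Annotation::range` is documented as a byte range into the snippet's `source` string, callers need byte offsets rather than character counts. Below is a minimal, standard-library-only sketch of deriving such a range; `byte_range_of` is a hypothetical helper invented for this example and does not exist in this crate.

```rust
use std::ops::Range;

/// Returns the byte range of the first occurrence of `needle` in `source`,
/// or `None` if `needle` does not occur.
/// (Hypothetical helper; not part of `ruff_annotate_snippets`.)
fn byte_range_of(source: &str, needle: &str) -> Option<Range<usize>> {
    // `str::find` returns a byte offset, and `str::len` is a byte length,
    // so the resulting range is valid for slicing `source` directly.
    let start = source.find(needle)?;
    Some(start..start + needle.len())
}
```

A range obtained this way has the type that `Level::span` accepts. For source text containing multi-byte characters, the byte offsets differ from character indices, which is why they are taken from `str::find` here instead of counting `char`s.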