ruff/crates/ruff_source_file/src/lib.rs
Dhruv Manilawala 66179af4f1
Add cell field to JSON output format (#7664)
## Summary

This PR adds a new `cell` field to the JSON output format which
indicates the Notebook cell this diagnostic (and fix) belongs to. It
also updates the location for the diagnostic and fixes as per the
`NotebookIndex`. It will be used in the VSCode extension to display the
diagnostic in the correct cell.

The diagnostic and edit start and end source locations are translated
for the notebook as per the `NotebookIndex`. The end source location for
an edit needs some special handling.

### Edit end location

To understand this, the following context is required:

1. Visible lines in Jupyter Notebook vs JSON array strings: The newline
is part of the string in the JSON format. This means that if there are 3
visible lines in a cell where the last line is empty then the JSON would
contain 2 strings in the source array, both ending with a newline:

**JSON format:**
```json
[
	"# first line\n",
	"# second line\n",
]
```

**Notebook view:**
```python
1 # first line
2 # second line
3
```

2. If an edit needs to remove an entire line including the newline, then
the end location would be the start of the next row.

To remove a statement in the following code:
```python
import os
```

The edit would be:
```
start: row 1, col 1
end: row 2, col 1
```

Now, here's where the problem lies. The notebook index doesn't have any
information for row 2 because it doesn't exists in the actual notebook.
The newline was added by Ruff to concatenate the source code and it's
removed before writing back. But, the edit is computed looking at that
newline.

This means that while translating the end location for an edit belong to
a Notebook, we need to check if both the start and end location belongs
to the same cell. If not, then the end location should be the first
character of the next row and if so, translate that back to the last
character of the previous row. Taking the above example, the translated
location for Notebook would be:
```
start: row 1, col 1
end: row 1, col 10
```

## Test Plan

Add test cases for notebook output in the JSON format and update
existing snapshots.
2023-10-13 01:06:02 +00:00

254 lines
6.4 KiB
Rust

use std::cmp::Ordering;
use std::fmt::{Debug, Formatter};
use std::sync::Arc;
use ruff_text_size::{Ranged, TextRange, TextSize};
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
pub mod line_index;
mod locator;
pub mod newlines;
pub use crate::line_index::{LineIndex, OneIndexed};
pub use locator::Locator;
pub use newlines::{
find_newline, Line, LineEnding, NewlineWithTrailingNewline, UniversalNewlineIterator,
UniversalNewlines,
};
/// Gives access to the source code of a file and allows mapping between [`TextSize`] and [`SourceLocation`].
#[derive(Debug)]
pub struct SourceCode<'src, 'index> {
text: &'src str,
index: &'index LineIndex,
}
impl<'src, 'index> SourceCode<'src, 'index> {
pub fn new(content: &'src str, index: &'index LineIndex) -> Self {
Self {
text: content,
index,
}
}
/// Computes the one indexed row and column numbers for `offset`.
#[inline]
pub fn source_location(&self, offset: TextSize) -> SourceLocation {
self.index.source_location(offset, self.text)
}
#[inline]
pub fn line_index(&self, offset: TextSize) -> OneIndexed {
self.index.line_index(offset)
}
/// Take the source code up to the given [`TextSize`].
#[inline]
pub fn up_to(&self, offset: TextSize) -> &'src str {
&self.text[TextRange::up_to(offset)]
}
/// Take the source code after the given [`TextSize`].
#[inline]
pub fn after(&self, offset: TextSize) -> &'src str {
&self.text[usize::from(offset)..]
}
/// Take the source code between the given [`TextRange`].
pub fn slice<T: Ranged>(&self, ranged: T) -> &'src str {
&self.text[ranged.range()]
}
pub fn line_start(&self, line: OneIndexed) -> TextSize {
self.index.line_start(line, self.text)
}
pub fn line_end(&self, line: OneIndexed) -> TextSize {
self.index.line_end(line, self.text)
}
pub fn line_end_exclusive(&self, line: OneIndexed) -> TextSize {
self.index.line_end_exclusive(line, self.text)
}
pub fn line_range(&self, line: OneIndexed) -> TextRange {
self.index.line_range(line, self.text)
}
/// Returns the source text of the line with the given index
#[inline]
pub fn line_text(&self, index: OneIndexed) -> &'src str {
let range = self.index.line_range(index, self.text);
&self.text[range]
}
/// Returns the source text
pub fn text(&self) -> &'src str {
self.text
}
/// Returns the number of lines
#[inline]
pub fn line_count(&self) -> usize {
self.index.line_count()
}
}
impl PartialEq<Self> for SourceCode<'_, '_> {
fn eq(&self, other: &Self) -> bool {
self.text == other.text
}
}
impl Eq for SourceCode<'_, '_> {}
/// A Builder for constructing a [`SourceFile`]
pub struct SourceFileBuilder {
name: Box<str>,
code: Box<str>,
index: Option<LineIndex>,
}
impl SourceFileBuilder {
/// Creates a new builder for a file named `name`.
pub fn new<Name: Into<Box<str>>, Code: Into<Box<str>>>(name: Name, code: Code) -> Self {
Self {
name: name.into(),
code: code.into(),
index: None,
}
}
#[must_use]
pub fn line_index(mut self, index: LineIndex) -> Self {
self.index = Some(index);
self
}
pub fn set_line_index(&mut self, index: LineIndex) {
self.index = Some(index);
}
/// Consumes `self` and returns the [`SourceFile`].
pub fn finish(self) -> SourceFile {
let index = if let Some(index) = self.index {
once_cell::sync::OnceCell::with_value(index)
} else {
once_cell::sync::OnceCell::new()
};
SourceFile {
inner: Arc::new(SourceFileInner {
name: self.name,
code: self.code,
line_index: index,
}),
}
}
}
/// A source file that is identified by its name. Optionally stores the source code and [`LineIndex`].
///
/// Cloning a [`SourceFile`] is cheap, because it only requires bumping a reference count.
#[derive(Clone, Eq, PartialEq)]
pub struct SourceFile {
inner: Arc<SourceFileInner>,
}
impl Debug for SourceFile {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
f.debug_struct("SourceFile")
.field("name", &self.name())
.field("code", &self.source_text())
.finish()
}
}
impl SourceFile {
/// Returns the name of the source file (filename).
#[inline]
pub fn name(&self) -> &str {
&self.inner.name
}
#[inline]
pub fn slice(&self, range: TextRange) -> &str {
&self.source_text()[range]
}
pub fn to_source_code(&self) -> SourceCode {
SourceCode {
text: self.source_text(),
index: self.index(),
}
}
fn index(&self) -> &LineIndex {
self.inner
.line_index
.get_or_init(|| LineIndex::from_source_text(self.source_text()))
}
/// Returns the source code.
#[inline]
pub fn source_text(&self) -> &str {
&self.inner.code
}
}
impl PartialOrd for SourceFile {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl Ord for SourceFile {
fn cmp(&self, other: &Self) -> Ordering {
// Short circuit if these are the same source files
if Arc::ptr_eq(&self.inner, &other.inner) {
Ordering::Equal
} else {
self.inner.name.cmp(&other.inner.name)
}
}
}
struct SourceFileInner {
name: Box<str>,
code: Box<str>,
line_index: once_cell::sync::OnceCell<LineIndex>,
}
impl PartialEq for SourceFileInner {
fn eq(&self, other: &Self) -> bool {
self.name == other.name && self.code == other.code
}
}
impl Eq for SourceFileInner {}
#[derive(Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct SourceLocation {
pub row: OneIndexed,
pub column: OneIndexed,
}
impl Default for SourceLocation {
fn default() -> Self {
Self {
row: OneIndexed::MIN,
column: OneIndexed::MIN,
}
}
}
impl Debug for SourceLocation {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
f.debug_struct("SourceLocation")
.field("row", &self.row.get())
.field("column", &self.column.get())
.finish()
}
}