ruff server: Support Jupyter Notebook (*.ipynb) files (#11206)

## Summary

Closes https://github.com/astral-sh/ruff/issues/10858.

`ruff server` now supports `*.ipynb` (aka Jupyter Notebook) files.
Extensive internal changes were made to facilitate this, and I've
contextualized them with documentation and a pre-review that highlights
notable sections of the code.

`*.ipynb` cells should behave similarly to `*.py` documents, with one
major exception: the format command `ruff.applyFormat` will only apply
to the currently selected notebook cell. To format an entire notebook
document, use `Format Notebook` from the VS Code context menu.
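
To make the distinction concrete, here is a minimal, self-contained sketch (not the
server's actual code: `format_source` stands in for the real formatter entry point,
and the `(cell URL, cell source)` pairs are a simplification of the per-cell
`TextDocument`s this PR introduces):

```rust
/// Stand-in for the real formatter entry point.
fn format_source(source: &str) -> String {
    format!("{}\n", source.trim_end())
}

/// `ruff.applyFormat`: only the currently selected cell is reformatted.
fn format_selected_cell(cells: &mut [(String, String)], selected_url: &str) {
    if let Some((_, source)) = cells.iter_mut().find(|(url, _)| url == selected_url) {
        let formatted = format_source(source);
        *source = formatted;
    }
}

/// "Format Notebook" (VS Code context menu): every cell is reformatted.
fn format_whole_notebook(cells: &mut [(String, String)]) {
    for (_, source) in cells.iter_mut() {
        let formatted = format_source(source);
        *source = formatted;
    }
}
```

In the actual server, each cell's text lives in its own `TextDocument` (see the new
`NotebookDocument` type in the diff below), which is what makes the cell-scoped path
possible.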

## Test Plan

The VS Code extension does not yet have Jupyter Notebook support
enabled, so you'll first need to enable it manually. To do this,
check out the `pre-release` branch and modify `src/common/server.ts` as
follows:

Before:
![Screenshot 2024-05-13 at 10 59 06 PM](c6a3c604-c405-4968-b8a2-5d670de89172)

After:
![Screenshot 2024-05-13 at 10 58 24 PM](94ab2e3d-0609-448d-9c8c-cd07c69a513b)

I recommend testing this PR with large, complicated notebook files. I
used notebook files from [this popular
repository](https://github.com/jakevdp/PythonDataScienceHandbook/tree/master/notebooks)
in my preliminary testing.

The main thing to test is ensuring that notebook cells behave the same
as Python documents, aside from the aforementioned exception for
`ruff.applyFormat`. You should also test adding and deleting cells (in
particular, deleting all the code cells and ensuring that doesn't break
anything), changing the kind of a cell (i.e., from markup to code or
vice versa), and creating a new notebook file from scratch. Finally, you
should also test that source actions work as expected (and across the
entire notebook).

Note: `ruff.applyAutofix` and `ruff.applyOrganizeImports` are currently
broken for notebook files, and I suspect it has something to do with
https://github.com/astral-sh/ruff/issues/11248. Once this is fixed, I
will update the test plan accordingly.

---------

Co-authored-by: nolan <nolan.king90@gmail.com>
Jane Lewis 2024-05-21 15:29:30 -07:00 committed by GitHub
parent 84531d1644
commit b0731ef9cb
39 changed files with 1584 additions and 622 deletions


@@ -7,10 +7,10 @@ use super::RangeExt;
pub(crate) type DocumentVersion = i32;
/// The state for an individual document in the server. Stays up-to-date
/// The state of an individual document in the server. Stays up-to-date
/// with changes made by the user, including unsaved changes.
#[derive(Debug, Clone)]
pub struct Document {
pub struct TextDocument {
/// The string contents of the document.
contents: String,
/// A computed line index for the document. This should always reflect
@@ -22,7 +22,7 @@ pub struct Document {
version: DocumentVersion,
}
impl Document {
impl TextDocument {
pub fn new(contents: String, version: DocumentVersion) -> Self {
let index = LineIndex::from_source_text(&contents);
Self {

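
For orientation, here is a small hypothetical usage of the renamed type (the
constructor comes from the hunk above and `contents()` appears in the notebook code
below; the example text, version, and helper function are made up):

```rust
// Hypothetical, crate-internal helper, not part of this commit: each open file (and,
// after this PR, each notebook cell) is tracked as a `TextDocument` built from the
// text and version supplied by the client.
fn open_example_document() -> TextDocument {
    let document = TextDocument::new(String::from("import os\n"), 1);
    debug_assert_eq!(document.contents(), "import os\n");
    document
}
```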

@@ -0,0 +1,202 @@
use std::{collections::HashMap, hash::BuildHasherDefault};
use anyhow::Ok;
use lsp_types::{NotebookCellKind, Url};
use rustc_hash::FxHashMap;
use crate::{PositionEncoding, TextDocument};
use super::DocumentVersion;
pub(super) type CellId = usize;
/// The state of a notebook document in the server. Contains an array of cells whose
/// contents are internally represented by [`TextDocument`]s.
#[derive(Clone, Debug)]
pub(crate) struct NotebookDocument {
cells: Vec<NotebookCell>,
metadata: ruff_notebook::RawNotebookMetadata,
version: DocumentVersion,
// Used to quickly find the index of a cell for a given URL.
cell_index: FxHashMap<lsp_types::Url, CellId>,
}
/// A single cell within a notebook, which has text contents represented as a `TextDocument`.
#[derive(Clone, Debug)]
struct NotebookCell {
url: Url,
kind: NotebookCellKind,
document: TextDocument,
}
impl NotebookDocument {
pub(crate) fn new(
version: DocumentVersion,
cells: Vec<lsp_types::NotebookCell>,
metadata: serde_json::Map<String, serde_json::Value>,
cell_documents: Vec<lsp_types::TextDocumentItem>,
) -> crate::Result<Self> {
let mut cell_contents: FxHashMap<_, _> = cell_documents
.into_iter()
.map(|document| (document.uri, document.text))
.collect();
let cells: Vec<_> = cells
.into_iter()
.map(|cell| {
let contents = cell_contents.remove(&cell.document).unwrap_or_default();
NotebookCell::new(cell, contents, version)
})
.collect();
Ok(Self {
version,
cell_index: Self::make_cell_index(cells.as_slice()),
metadata: serde_json::from_value(serde_json::Value::Object(metadata))?,
cells,
})
}
/// Generates a pseudo-representation of a notebook that lacks per-cell metadata and contextual information
/// but should still work with Ruff's linter.
pub(crate) fn make_ruff_notebook(&self) -> ruff_notebook::Notebook {
let cells = self
.cells
.iter()
.map(|cell| match cell.kind {
NotebookCellKind::Code => ruff_notebook::Cell::Code(ruff_notebook::CodeCell {
execution_count: None,
id: None,
metadata: serde_json::Value::Null,
outputs: vec![],
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
}),
NotebookCellKind::Markup => {
ruff_notebook::Cell::Markdown(ruff_notebook::MarkdownCell {
attachments: None,
id: None,
metadata: serde_json::Value::Null,
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
})
}
})
.collect();
let raw_notebook = ruff_notebook::RawNotebook {
cells,
metadata: self.metadata.clone(),
nbformat: 4,
nbformat_minor: 5,
};
ruff_notebook::Notebook::from_raw_notebook(raw_notebook, false)
.unwrap_or_else(|err| panic!("Server notebook document could not be converted to Ruff's notebook document format: {err}"))
}
pub(crate) fn update(
&mut self,
cells: Option<lsp_types::NotebookDocumentCellChange>,
metadata_change: Option<serde_json::Map<String, serde_json::Value>>,
version: DocumentVersion,
encoding: PositionEncoding,
) -> crate::Result<()> {
self.version = version;
if let Some(lsp_types::NotebookDocumentCellChange {
structure,
data,
text_content,
}) = cells
{
if let Some(structure) = structure {
let start = structure.array.start as usize;
let delete = structure.array.delete_count as usize;
if delete > 0 {
for cell in self.cells.drain(start..start + delete) {
self.cell_index.remove(&cell.url);
}
}
for cell in structure.array.cells.into_iter().flatten().rev() {
self.cells
.insert(start, NotebookCell::new(cell, String::new(), version));
}
// register any new cells in the index and update existing ones that came after the insertion
for (i, cell) in self.cells.iter().enumerate().skip(start) {
self.cell_index.insert(cell.url.clone(), i);
}
}
if let Some(cell_data) = data {
for cell in cell_data {
if let Some(existing_cell) = self.cell_by_uri_mut(&cell.document) {
existing_cell.kind = cell.kind;
}
}
}
if let Some(content_changes) = text_content {
for content_change in content_changes {
if let Some(cell) = self.cell_by_uri_mut(&content_change.document.uri) {
cell.document
.apply_changes(content_change.changes, version, encoding);
}
}
}
}
if let Some(metadata_change) = metadata_change {
self.metadata = serde_json::from_value(serde_json::Value::Object(metadata_change))?;
}
Ok(())
}
/// Get the current version of the notebook document.
pub(crate) fn version(&self) -> DocumentVersion {
self.version
}
/// Get the URI for a cell by its index within the cell array.
pub(crate) fn cell_uri_by_index(&self, index: CellId) -> Option<&lsp_types::Url> {
self.cells.get(index).map(|cell| &cell.url)
}
/// Get the text document representing the contents of a cell by the cell URI.
pub(crate) fn cell_document_by_uri(&self, uri: &lsp_types::Url) -> Option<&TextDocument> {
self.cells
.get(*self.cell_index.get(uri)?)
.map(|cell| &cell.document)
}
/// Returns a list of cell URIs in the order they appear in the array.
pub(crate) fn urls(&self) -> impl Iterator<Item = &lsp_types::Url> {
self.cells.iter().map(|cell| &cell.url)
}
fn cell_by_uri_mut(&mut self, uri: &lsp_types::Url) -> Option<&mut NotebookCell> {
self.cells.get_mut(*self.cell_index.get(uri)?)
}
fn make_cell_index(cells: &[NotebookCell]) -> FxHashMap<lsp_types::Url, CellId> {
let mut index =
HashMap::with_capacity_and_hasher(cells.len(), BuildHasherDefault::default());
for (i, cell) in cells.iter().enumerate() {
index.insert(cell.url.clone(), i);
}
index
}
}
impl NotebookCell {
pub(crate) fn new(
cell: lsp_types::NotebookCell,
contents: String,
version: DocumentVersion,
) -> Self {
Self {
url: cell.document,
kind: cell.kind,
document: TextDocument::new(contents, version),
}
}
}
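
To show how these pieces fit together, here is a hedged, crate-internal sketch (the
helper itself is hypothetical and not part of this commit; `urls`,
`cell_document_by_uri`, and `contents` are the methods defined above) that walks a
notebook's cells in document order and collects each cell's source text by its URI:

```rust
/// Hypothetical helper, not part of this commit: gather `(cell URI, cell source)`
/// pairs in document order by routing each cell URI back to its backing `TextDocument`.
fn cell_sources(notebook: &NotebookDocument) -> Vec<(lsp_types::Url, String)> {
    notebook
        .urls()
        .filter_map(|url| {
            let document = notebook.cell_document_by_uri(url)?;
            Some((url.clone(), document.contents().to_string()))
        })
        .collect()
}
```

This URI-based lookup appears to be what lets cell-scoped requests reuse the existing
`TextDocument` machinery; `make_ruff_notebook`, meanwhile, flattens the cells back into
Ruff's own notebook representation for linting.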


@@ -1,9 +1,16 @@
use super::notebook;
use super::PositionEncoding;
use lsp_types as types;
use ruff_notebook::NotebookIndex;
use ruff_source_file::OneIndexed;
use ruff_source_file::{LineIndex, SourceLocation};
use ruff_text_size::{TextRange, TextSize};
pub(crate) struct NotebookRange {
pub(crate) cell: notebook::CellId,
pub(crate) range: types::Range,
}
pub(crate) trait RangeExt {
fn to_text_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding)
-> TextRange;
@@ -11,6 +18,13 @@ pub(crate) trait RangeExt {
pub(crate) trait ToRangeExt {
fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range;
fn to_notebook_range(
&self,
text: &str,
source_index: &LineIndex,
notebook_index: &NotebookIndex,
encoding: PositionEncoding,
) -> NotebookRange;
}
fn u32_index_to_usize(index: u32) -> usize {
@@ -83,8 +97,54 @@ impl RangeExt for lsp_types::Range {
impl ToRangeExt for TextRange {
fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range {
types::Range {
start: offset_to_position(self.start(), text, index, encoding),
end: offset_to_position(self.end(), text, index, encoding),
start: source_location_to_position(&offset_to_source_location(
self.start(),
text,
index,
encoding,
)),
end: source_location_to_position(&offset_to_source_location(
self.end(),
text,
index,
encoding,
)),
}
}
fn to_notebook_range(
&self,
text: &str,
source_index: &LineIndex,
notebook_index: &NotebookIndex,
encoding: PositionEncoding,
) -> NotebookRange {
let start = offset_to_source_location(self.start(), text, source_index, encoding);
let mut end = offset_to_source_location(self.end(), text, source_index, encoding);
let starting_cell = notebook_index.cell(start.row);
// weird edge case here - if the end of the range is where the newline after the cell got added (making it 'out of bounds')
// we need to move it one character back (which should place it at the end of the last line).
// we test this by checking if the ending offset is in a different (or nonexistent) cell compared to the cell of the starting offset.
if notebook_index.cell(end.row) != starting_cell {
end.row = end.row.saturating_sub(1);
end.column = offset_to_source_location(
self.end().checked_sub(1.into()).unwrap_or_default(),
text,
source_index,
encoding,
)
.column;
}
let start = source_location_to_position(&notebook_index.translate_location(&start));
let end = source_location_to_position(&notebook_index.translate_location(&end));
NotebookRange {
cell: starting_cell
.map(OneIndexed::to_zero_indexed)
.unwrap_or_default(),
range: types::Range { start, end },
}
}
}
@@ -111,13 +171,13 @@ fn utf8_column_offset(utf16_code_unit_offset: u32, line: &str) -> TextSize {
utf8_code_unit_offset
}
fn offset_to_position(
fn offset_to_source_location(
offset: TextSize,
text: &str,
index: &LineIndex,
encoding: PositionEncoding,
) -> types::Position {
let location = match encoding {
) -> SourceLocation {
match encoding {
PositionEncoding::UTF8 => {
let row = index.line_index(offset);
let column = offset - index.line_start(row, text);
@@ -143,8 +203,10 @@ fn offset_to_position(
}
}
PositionEncoding::UTF32 => index.source_location(offset, text),
};
}
}
fn source_location_to_position(location: &SourceLocation) -> types::Position {
types::Position {
line: u32::try_from(location.row.to_zero_indexed()).expect("row usize fits in u32"),
character: u32::try_from(location.column.to_zero_indexed())