ruff server: Support Jupyter Notebook (*.ipynb) files (#11206)

## Summary

Closes https://github.com/astral-sh/ruff/issues/10858.

`ruff server` now supports `*.ipynb` (aka Jupyter Notebook) files.
Extensive internal changes were made to facilitate this, and I've
contextualized them with documentation and a pre-review that highlights
notable sections of the code.

`*.ipynb` cells should behave similarly to `*.py` documents, with one
major exception: the format command `ruff.applyFormat` will only apply
to the currently selected notebook cell. To format an entire notebook
document, use `Format Notebook` from the VS Code context menu.
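
To make the distinction concrete, here is a minimal, self-contained sketch (not the
server's actual code: `format_source` stands in for the real formatter entry point,
and the `(cell URL, cell source)` pairs are a simplification of the per-cell
`TextDocument`s this PR introduces):

```rust
/// Stand-in for the real formatter entry point.
fn format_source(source: &str) -> String {
    format!("{}\n", source.trim_end())
}

/// `ruff.applyFormat`: only the currently selected cell is reformatted.
fn format_selected_cell(cells: &mut [(String, String)], selected_url: &str) {
    if let Some((_, source)) = cells.iter_mut().find(|(url, _)| url == selected_url) {
        let formatted = format_source(source);
        *source = formatted;
    }
}

/// "Format Notebook" (VS Code context menu): every cell is reformatted.
fn format_whole_notebook(cells: &mut [(String, String)]) {
    for (_, source) in cells.iter_mut() {
        let formatted = format_source(source);
        *source = formatted;
    }
}
```

In the actual server, each cell's text lives in its own `TextDocument` (see the new
`NotebookDocument` type in the diff below), which is what makes the cell-scoped path
possible.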

## Test Plan

The VS Code extension does not yet have Jupyter Notebook support
enabled, so you'll first need to enable it manually. To do this,
check out the `pre-release` branch and modify `src/common/server.ts` as
follows:

Before:
![Screenshot 2024-05-13 at 10 59 06 PM](c6a3c604-c405-4968-b8a2-5d670de89172)

After:
![Screenshot 2024-05-13 at 10 58 24 PM](94ab2e3d-0609-448d-9c8c-cd07c69a513b)

I recommend testing this PR with large, complicated notebook files. I
used notebook files from [this popular
repository](https://github.com/jakevdp/PythonDataScienceHandbook/tree/master/notebooks)
in my preliminary testing.

The main thing to test is ensuring that notebook cells behave the same
as Python documents, aside from the aforementioned exception for
`ruff.applyFormat`. You should also test adding and deleting cells (in
particular, deleting all the code cells and ensuring that doesn't break
anything), changing the kind of a cell (i.e., from markup to code or
vice versa), and creating a new notebook file from scratch. Finally, you
should also test that source actions work as expected (and across the
entire notebook).

Note: `ruff.applyAutofix` and `ruff.applyOrganizeImports` are currently
broken for notebook files, and I suspect it has something to do with
https://github.com/astral-sh/ruff/issues/11248. Once this is fixed, I
will update the test plan accordingly.

---------

Co-authored-by: nolan <nolan.king90@gmail.com>
Jane Lewis 2024-05-21 15:29:30 -07:00 committed by GitHub
parent 84531d1644
commit b0731ef9cb
39 changed files with 1584 additions and 622 deletions


@@ -7,10 +7,10 @@ use super::RangeExt;
pub(crate) type DocumentVersion = i32;
/// The state for an individual document in the server. Stays up-to-date
/// The state of an individual document in the server. Stays up-to-date
/// with changes made by the user, including unsaved changes.
#[derive(Debug, Clone)]
pub struct Document {
pub struct TextDocument {
/// The string contents of the document.
contents: String,
/// A computed line index for the document. This should always reflect
@@ -22,7 +22,7 @@ pub struct Document {
version: DocumentVersion,
}
impl Document {
impl TextDocument {
pub fn new(contents: String, version: DocumentVersion) -> Self {
let index = LineIndex::from_source_text(&contents);
Self {

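
For orientation, here is a small hypothetical usage of the renamed type (the
constructor comes from the hunk above and `contents()` appears in the notebook code
below; the example text, version, and helper function are made up):

```rust
// Hypothetical, crate-internal helper, not part of this commit: each open file (and,
// after this PR, each notebook cell) is tracked as a `TextDocument` built from the
// text and version supplied by the client.
fn open_example_document() -> TextDocument {
    let document = TextDocument::new(String::from("import os\n"), 1);
    debug_assert_eq!(document.contents(), "import os\n");
    document
}
```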

@@ -0,0 +1,202 @@
use std::{collections::HashMap, hash::BuildHasherDefault};
use anyhow::Ok;
use lsp_types::{NotebookCellKind, Url};
use rustc_hash::FxHashMap;
use crate::{PositionEncoding, TextDocument};
use super::DocumentVersion;
pub(super) type CellId = usize;
/// The state of a notebook document in the server. Contains an array of cells whose
/// contents are internally represented by [`TextDocument`]s.
#[derive(Clone, Debug)]
pub(crate) struct NotebookDocument {
cells: Vec<NotebookCell>,
metadata: ruff_notebook::RawNotebookMetadata,
version: DocumentVersion,
// Used to quickly find the index of a cell for a given URL.
cell_index: FxHashMap<lsp_types::Url, CellId>,
}
/// A single cell within a notebook, which has text contents represented as a `TextDocument`.
#[derive(Clone, Debug)]
struct NotebookCell {
url: Url,
kind: NotebookCellKind,
document: TextDocument,
}
impl NotebookDocument {
pub(crate) fn new(
version: DocumentVersion,
cells: Vec<lsp_types::NotebookCell>,
metadata: serde_json::Map<String, serde_json::Value>,
cell_documents: Vec<lsp_types::TextDocumentItem>,
) -> crate::Result<Self> {
let mut cell_contents: FxHashMap<_, _> = cell_documents
.into_iter()
.map(|document| (document.uri, document.text))
.collect();
let cells: Vec<_> = cells
.into_iter()
.map(|cell| {
let contents = cell_contents.remove(&cell.document).unwrap_or_default();
NotebookCell::new(cell, contents, version)
})
.collect();
Ok(Self {
version,
cell_index: Self::make_cell_index(cells.as_slice()),
metadata: serde_json::from_value(serde_json::Value::Object(metadata))?,
cells,
})
}
/// Generates a pseudo-representation of a notebook that lacks per-cell metadata and contextual information
/// but should still work with Ruff's linter.
pub(crate) fn make_ruff_notebook(&self) -> ruff_notebook::Notebook {
let cells = self
.cells
.iter()
.map(|cell| match cell.kind {
NotebookCellKind::Code => ruff_notebook::Cell::Code(ruff_notebook::CodeCell {
execution_count: None,
id: None,
metadata: serde_json::Value::Null,
outputs: vec![],
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
}),
NotebookCellKind::Markup => {
ruff_notebook::Cell::Markdown(ruff_notebook::MarkdownCell {
attachments: None,
id: None,
metadata: serde_json::Value::Null,
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
})
}
})
.collect();
let raw_notebook = ruff_notebook::RawNotebook {
cells,
metadata: self.metadata.clone(),
nbformat: 4,
nbformat_minor: 5,
};
ruff_notebook::Notebook::from_raw_notebook(raw_notebook, false)
.unwrap_or_else(|err| panic!("Server notebook document could not be converted to Ruff's notebook document format: {err}"))
}
pub(crate) fn update(
&mut self,
cells: Option<lsp_types::NotebookDocumentCellChange>,
metadata_change: Option<serde_json::Map<String, serde_json::Value>>,
version: DocumentVersion,
encoding: PositionEncoding,
) -> crate::Result<()> {
self.version = version;
if let Some(lsp_types::NotebookDocumentCellChange {
structure,
data,
text_content,
}) = cells
{
if let Some(structure) = structure {
let start = structure.array.start as usize;
let delete = structure.array.delete_count as usize;
if delete > 0 {
for cell in self.cells.drain(start..start + delete) {
self.cell_index.remove(&cell.url);
}
}
for cell in structure.array.cells.into_iter().flatten().rev() {
self.cells
.insert(start, NotebookCell::new(cell, String::new(), version));
}
// register any new cells in the index and update existing ones that came after the insertion
for (i, cell) in self.cells.iter().enumerate().skip(start) {
self.cell_index.insert(cell.url.clone(), i);
}
}
if let Some(cell_data) = data {
for cell in cell_data {
if let Some(existing_cell) = self.cell_by_uri_mut(&cell.document) {
existing_cell.kind = cell.kind;
}
}
}
if let Some(content_changes) = text_content {
for content_change in content_changes {
if let Some(cell) = self.cell_by_uri_mut(&content_change.document.uri) {
cell.document
.apply_changes(content_change.changes, version, encoding);
}
}
}
}
if let Some(metadata_change) = metadata_change {
self.metadata = serde_json::from_value(serde_json::Value::Object(metadata_change))?;
}
Ok(())
}
/// Get the current version of the notebook document.
pub(crate) fn version(&self) -> DocumentVersion {
self.version
}
/// Get the URI for a cell by its index within the cell array.
pub(crate) fn cell_uri_by_index(&self, index: CellId) -> Option<&lsp_types::Url> {
self.cells.get(index).map(|cell| &cell.url)
}
/// Get the text document representing the contents of a cell by the cell URI.
pub(crate) fn cell_document_by_uri(&self, uri: &lsp_types::Url) -> Option<&TextDocument> {
self.cells
.get(*self.cell_index.get(uri)?)
.map(|cell| &cell.document)
}
/// Returns a list of cell URIs in the order they appear in the array.
pub(crate) fn urls(&self) -> impl Iterator<Item = &lsp_types::Url> {
self.cells.iter().map(|cell| &cell.url)
}
fn cell_by_uri_mut(&mut self, uri: &lsp_types::Url) -> Option<&mut NotebookCell> {
self.cells.get_mut(*self.cell_index.get(uri)?)
}
fn make_cell_index(cells: &[NotebookCell]) -> FxHashMap<lsp_types::Url, CellId> {
let mut index =
HashMap::with_capacity_and_hasher(cells.len(), BuildHasherDefault::default());
for (i, cell) in cells.iter().enumerate() {
index.insert(cell.url.clone(), i);
}
index
}
}
impl NotebookCell {
pub(crate) fn new(
cell: lsp_types::NotebookCell,
contents: String,
version: DocumentVersion,
) -> Self {
Self {
url: cell.document,
kind: cell.kind,
document: TextDocument::new(contents, version),
}
}
}
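
To show how these pieces fit together, here is a hedged, crate-internal sketch (the
helper itself is hypothetical and not part of this commit; `urls`,
`cell_document_by_uri`, and `contents` are the methods defined above) that walks a
notebook's cells in document order and collects each cell's source text by its URI:

```rust
/// Hypothetical helper, not part of this commit: gather `(cell URI, cell source)`
/// pairs in document order by routing each cell URI back to its backing `TextDocument`.
fn cell_sources(notebook: &NotebookDocument) -> Vec<(lsp_types::Url, String)> {
    notebook
        .urls()
        .filter_map(|url| {
            let document = notebook.cell_document_by_uri(url)?;
            Some((url.clone(), document.contents().to_string()))
        })
        .collect()
}
```

This URI-based lookup appears to be what lets cell-scoped requests reuse the existing
`TextDocument` machinery; `make_ruff_notebook`, meanwhile, flattens the cells back into
Ruff's own notebook representation for linting.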


@@ -1,9 +1,16 @@
use super::notebook;
use super::PositionEncoding;
use lsp_types as types;
use ruff_notebook::NotebookIndex;
use ruff_source_file::OneIndexed;
use ruff_source_file::{LineIndex, SourceLocation};
use ruff_text_size::{TextRange, TextSize};
pub(crate) struct NotebookRange {
pub(crate) cell: notebook::CellId,
pub(crate) range: types::Range,
}
pub(crate) trait RangeExt {
fn to_text_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding)
-> TextRange;
@@ -11,6 +18,13 @@ pub(crate) trait RangeExt {
pub(crate) trait ToRangeExt {
fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range;
fn to_notebook_range(
&self,
text: &str,
source_index: &LineIndex,
notebook_index: &NotebookIndex,
encoding: PositionEncoding,
) -> NotebookRange;
}
fn u32_index_to_usize(index: u32) -> usize {
@@ -83,8 +97,54 @@ impl RangeExt for lsp_types::Range {
impl ToRangeExt for TextRange {
fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range {
types::Range {
start: offset_to_position(self.start(), text, index, encoding),
end: offset_to_position(self.end(), text, index, encoding),
start: source_location_to_position(&offset_to_source_location(
self.start(),
text,
index,
encoding,
)),
end: source_location_to_position(&offset_to_source_location(
self.end(),
text,
index,
encoding,
)),
}
}
fn to_notebook_range(
&self,
text: &str,
source_index: &LineIndex,
notebook_index: &NotebookIndex,
encoding: PositionEncoding,
) -> NotebookRange {
let start = offset_to_source_location(self.start(), text, source_index, encoding);
let mut end = offset_to_source_location(self.end(), text, source_index, encoding);
let starting_cell = notebook_index.cell(start.row);
// weird edge case here - if the end of the range is where the newline after the cell got added (making it 'out of bounds')
// we need to move it one character back (which should place it at the end of the last line).
// we test this by checking if the ending offset is in a different (or nonexistent) cell compared to the cell of the starting offset.
if notebook_index.cell(end.row) != starting_cell {
end.row = end.row.saturating_sub(1);
end.column = offset_to_source_location(
self.end().checked_sub(1.into()).unwrap_or_default(),
text,
source_index,
encoding,
)
.column;
}
let start = source_location_to_position(&notebook_index.translate_location(&start));
let end = source_location_to_position(&notebook_index.translate_location(&end));
NotebookRange {
cell: starting_cell
.map(OneIndexed::to_zero_indexed)
.unwrap_or_default(),
range: types::Range { start, end },
}
}
}
@@ -111,13 +171,13 @@ fn utf8_column_offset(utf16_code_unit_offset: u32, line: &str) -> TextSize {
utf8_code_unit_offset
}
fn offset_to_position(
fn offset_to_source_location(
offset: TextSize,
text: &str,
index: &LineIndex,
encoding: PositionEncoding,
) -> types::Position {
let location = match encoding {
) -> SourceLocation {
match encoding {
PositionEncoding::UTF8 => {
let row = index.line_index(offset);
let column = offset - index.line_start(row, text);
@@ -143,8 +203,10 @@ fn offset_to_position(
}
}
PositionEncoding::UTF32 => index.source_location(offset, text),
};
}
}
fn source_location_to_position(location: &SourceLocation) -> types::Position {
types::Position {
line: u32::try_from(location.row.to_zero_indexed()).expect("row usize fits in u32"),
character: u32::try_from(location.column.to_zero_indexed())