ruff server - A new built-in LSP for Ruff, written in Rust (#10158)

## Summary

This PR introduces the `ruff_server` crate and a new `ruff server` command. `ruff_server` is a re-implementation of [`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in Rust. It brings significant performance improvements, much tighter integration with Ruff, a foundation for supporting entirely new language server features, and more!

This PR is an early version of `ruff_server` that we're calling the **pre-release** version. Anyone is more than welcome to use it and submit bug reports for any issues they encounter - we'll have some documentation on how to set it up with a few common editors, and we'll also provide a pre-release VSCode extension for those interested.

This pre-release version supports:

- **Diagnostics for `.py` files**
- **Quick fixes**
- **Full-file formatting**
- **Range formatting**
- **Multiple workspace folders**
- **Automatic linter/formatter configuration** - taken from any `pyproject.toml` files in the workspace.

Many thanks to @MichaReiser for his [proof-of-concept work](https://github.com/astral-sh/ruff/pull/7262), which was important groundwork for making this PR possible.

## Architectural Decisions

I've made an executive choice to go with `lsp-server` as a base framework for the LSP, instead of `tower-lsp`. There were several reasons for this:

1. I would like to avoid `async` in our implementation. LSPs are mostly computationally bound rather than I/O bound, and `async` adds a lot of complexity to the API, while also making it harder to reason about execution order. This leads into the second reason, which is...
2. Any handlers that mutate state should be blocking and run in the event loop, and the state should be lock-free. This is the approach that `rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a framework), and it gives us assurances about data mutation and execution order. `tower-lsp` doesn't support this, which has caused some [issues](https://github.com/ebkalderon/tower-lsp/issues/284) around data races and out-of-order handler execution.
3. In general, I think it makes sense to have tight control over scheduling and the specifics of our implementation, in exchange for a slightly higher up-front cost of writing it ourselves. We'll be able to fine-tune it to our needs and support future LSP features without depending on an upstream maintainer.

## Test Plan

The pre-release of `ruff_server` will have snapshot tests for common document editing scenarios. An expanded test suite is on the roadmap for future versions of `ruff_server`.
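
To illustrate the second architectural point, here is a minimal sketch of a synchronous event loop in which mutating handlers block the loop while read-only work runs on a snapshot in a background thread. It uses only the standard library; `Message`, `Session`, and `run` are hypothetical names for illustration and are not the types used by `ruff_server` or `lsp-server`.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type standing in for LSP notifications and requests.
#[derive(Debug)]
enum Message {
    /// Mutates state, so it is handled synchronously on the event loop.
    DidChange { uri: String, text: String },
    /// Read-only, so it can run on a background thread against a snapshot.
    Lint { uri: String },
    Shutdown,
}

/// State owned exclusively by the event loop: no `Mutex`, no `RwLock`.
#[derive(Default)]
struct Session {
    documents: HashMap<String, String>,
}

/// Processes messages in order until `Shutdown`, then returns the results
/// of the background tasks in the order they were scheduled.
fn run(messages: Vec<Message>) -> Vec<String> {
    let (sender, receiver) = mpsc::channel();
    for message in messages {
        sender.send(message).expect("receiver is alive");
    }
    drop(sender);

    let mut session = Session::default();
    let mut tasks = Vec::new();

    // Messages are handled strictly in arrival order.
    for message in receiver {
        match message {
            Message::DidChange { uri, text } => {
                // Blocking mutation: no task can observe a half-applied edit.
                session.documents.insert(uri, text);
            }
            Message::Lint { uri } => {
                // Take an owned snapshot, then hand it to a worker thread.
                let snapshot = session.documents.get(&uri).cloned().unwrap_or_default();
                tasks.push(thread::spawn(move || {
                    format!("{uri}: {} line(s)", snapshot.lines().count())
                }));
            }
            Message::Shutdown => break,
        }
    }

    tasks
        .into_iter()
        .map(|task| task.join().expect("worker did not panic"))
        .collect()
}
```

Because each `Lint` task receives its own copy of the document, a later `DidChange` cannot race with it, which is the ordering guarantee the list above describes.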
2025-09-30 13:51:37 +00:00 · 2024-03-08 20:57:23 -08:00 · 2024-03-08 20:57:23 -08:00 · 0c84fbb6db
commit 0c84fbb6db
parent a892fc755d
45 changed files with 5425 additions and 2 deletions
--- a/crates/ruff_server/src/edit/document.rs
+++ b/crates/ruff_server/src/edit/document.rs
@@ -0,0 +1,123 @@
+use lsp_types::TextDocumentContentChangeEvent;
+use ruff_source_file::LineIndex;
+
+use crate::PositionEncoding;
+
+use super::RangeExt;
+
+pub(crate) type DocumentVersion = i32;
+
+/// The state for an individual document in the server. Stays up-to-date
+/// with changes made by the user, including unsaved changes.
+#[derive(Debug, Clone)]
+pub struct Document {
+    /// The string contents of the document.
+    contents: String,
+    /// A computed line index for the document. This should always reflect
+    /// the current version of `contents`. Using a function like [`Self::modify`]
+    /// will re-calculate the line index automatically when the `contents` value is updated.
+    index: LineIndex,
+    /// The latest version of the document, set by the LSP client. The server will panic in
+    /// debug mode if we attempt to update the document with an 'older' version.
+    version: DocumentVersion,
+}
+
+impl Document {
+    pub fn new(contents: String, version: DocumentVersion) -> Self {
+        let index = LineIndex::from_source_text(&contents);
+        Self {
+            contents,
+            index,
+            version,
+        }
+    }
+
+    pub fn contents(&self) -> &str {
+        &self.contents
+    }
+
+    pub fn index(&self) -> &LineIndex {
+        &self.index
+    }
+
+    pub fn version(&self) -> DocumentVersion {
+        self.version
+    }
+
+    pub fn apply_changes(
+        &mut self,
+        changes: Vec<lsp_types::TextDocumentContentChangeEvent>,
+        new_version: DocumentVersion,
+        encoding: PositionEncoding,
+    ) {
+        if let [lsp_types::TextDocumentContentChangeEvent {
+            range: None, text, ..
+        }] = changes.as_slice()
+        {
+            tracing::debug!("Fast path - replacing entire document");
+            self.modify(|contents, version| {
+                *contents = text.clone();
+                *version = new_version;
+            });
+            return;
+        }
+
+        let old_contents = self.contents().to_string();
+        let mut new_contents = self.contents().to_string();
+        let mut active_index = self.index().clone();
+
+        for TextDocumentContentChangeEvent {
+            range,
+            text: change,
+            ..
+        } in changes
+        {
+            if let Some(range) = range {
+                let range = range.to_text_range(&new_contents, &active_index, encoding);
+
+                new_contents.replace_range(
+                    usize::from(range.start())..usize::from(range.end()),
+                    &change,
+                );
+            } else {
+                new_contents = change;
+            }
+
+            if new_contents != old_contents {
+                active_index = LineIndex::from_source_text(&new_contents);
+            }
+        }
+
+        self.modify_with_manual_index(|contents, version, index| {
+            if contents != &new_contents {
+                *index = active_index;
+            }
+            *contents = new_contents;
+            *version = new_version;
+        });
+    }
+
+    pub fn update_version(&mut self, new_version: DocumentVersion) {
+        self.modify_with_manual_index(|_, version, _| {
+            *version = new_version;
+        });
+    }
+
+    // A private function for modifying the document's internal state
+    fn modify(&mut self, func: impl FnOnce(&mut String, &mut DocumentVersion)) {
+        self.modify_with_manual_index(|c, v, i| {
+            func(c, v);
+            *i = LineIndex::from_source_text(c);
+        });
+    }
+
+    // A private function for overriding how we update the line index by default.
+    fn modify_with_manual_index(
+        &mut self,
+        func: impl FnOnce(&mut String, &mut DocumentVersion, &mut LineIndex),
+    ) {
+        let old_version = self.version;
+        func(&mut self.contents, &mut self.version, &mut self.index);
+        debug_assert!(self.version >= old_version);
+    }
+}
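
The incremental path in `apply_changes` above can be sketched in isolation with standard-library types. This is a simplified illustration, not the crate's implementation: `Change`, `line_starts`, and `offset` are hypothetical stand-ins for `TextDocumentContentChangeEvent`, `LineIndex`, and `RangeExt::to_text_range`, and columns are assumed to be UTF-8 byte offsets that fall on character boundaries.

```rust
/// Hypothetical stand-in for `lsp_types::TextDocumentContentChangeEvent`,
/// with positions given as (line, UTF-8 byte column) pairs.
struct Change {
    /// `None` means "replace the whole document".
    range: Option<((usize, usize), (usize, usize))>,
    text: String,
}

/// Byte offset at which each line starts; a simplified stand-in for `LineIndex`.
fn line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
    starts
}

/// Converts a (line, column) pair to a byte offset, clamped to the text length.
fn offset(text: &str, starts: &[usize], (line, column): (usize, usize)) -> usize {
    let line_start = starts.get(line).copied().unwrap_or(text.len());
    (line_start + column).min(text.len())
}

/// Applies changes in order. Each range refers to the text produced by the
/// previous change, so the line index is rebuilt before every ranged edit.
fn apply_changes(contents: &str, changes: Vec<Change>) -> String {
    let mut new_contents = contents.to_string();
    for Change { range, text } in changes {
        match range {
            Some((start, end)) => {
                let starts = line_starts(&new_contents);
                let start = offset(&new_contents, &starts, start);
                // Guard against inverted ranges so `replace_range` cannot panic on them.
                let end = offset(&new_contents, &starts, end).max(start);
                new_contents.replace_range(start..end, &text);
            }
            // A change without a range replaces the entire document.
            None => new_contents = text,
        }
    }
    new_contents
}
```

Rebuilding the index for every ranged edit keeps the sketch short; the code above avoids redundant rebuilds by only recomputing the index when the contents actually differ.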
--- a/crates/ruff_server/src/edit/range.rs
+++ b/crates/ruff_server/src/edit/range.rs
@@ -0,0 +1,153 @@
+use super::PositionEncoding;
+use lsp_types as types;
+use ruff_source_file::OneIndexed;
+use ruff_source_file::{LineIndex, SourceLocation};
+use ruff_text_size::{TextRange, TextSize};
+
+pub(crate) trait RangeExt {
+    fn to_text_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding)
+        -> TextRange;
+}
+
+pub(crate) trait ToRangeExt {
+    fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range;
+}
+
+fn u32_index_to_usize(index: u32) -> usize {
+    usize::try_from(index).expect("u32 fits in usize")
+}
+
+impl RangeExt for lsp_types::Range {
+    fn to_text_range(
+        &self,
+        text: &str,
+        index: &LineIndex,
+        encoding: PositionEncoding,
+    ) -> TextRange {
+        let start_line = index.line_range(
+            OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.line)),
+            text,
+        );
+        let end_line = index.line_range(
+            OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.line)),
+            text,
+        );
+
+        let (start_column_offset, end_column_offset) = match encoding {
+            PositionEncoding::UTF8 => (
+                TextSize::new(self.start.character),
+                TextSize::new(self.end.character),
+            ),
+
+            PositionEncoding::UTF16 => {
+                // Fast path for ASCII-only documents
+                if index.is_ascii() {
+                    (
+                        TextSize::new(self.start.character),
+                        TextSize::new(self.end.character),
+                    )
+                } else {
+                    // UTF-16 encodes each character as either one or two 16-bit code units.
+                    // The position in `range` is a 16-bit code unit offset from the start of the line
+                    // (not a character offset), so convert it to a UTF-8 byte offset within that line.
+                    (
+                        utf8_column_offset(self.start.character, &text[start_line]),
+                        utf8_column_offset(self.end.character, &text[end_line]),
+                    )
+                }
+            }
+            PositionEncoding::UTF32 => {
+                // UTF-32 uses 4 bytes for each character, so the position in `range` is a character offset.
+                return TextRange::new(
+                    index.offset(
+                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.line)),
+                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.character)),
+                        text,
+                    ),
+                    index.offset(
+                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.line)),
+                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.character)),
+                        text,
+                    ),
+                );
+            }
+        };
+
+        TextRange::new(
+            start_line.start() + start_column_offset.clamp(TextSize::new(0), start_line.len()),
+            end_line.start() + end_column_offset.clamp(TextSize::new(0), end_line.len()),
+        )
+    }
+}
+
+impl ToRangeExt for TextRange {
+    fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range {
+        types::Range {
+            start: offset_to_position(self.start(), text, index, encoding),
+            end: offset_to_position(self.end(), text, index, encoding),
+        }
+    }
+}
+
+/// Converts a UTF-16 code unit offset within a given line into a UTF-8 byte offset within that line.
+fn utf8_column_offset(utf16_code_unit_offset: u32, line: &str) -> TextSize {
+    let mut utf8_code_unit_offset = TextSize::new(0);
+
+    let mut i = 0u32;
+
+    for c in line.chars() {
+        if i >= utf16_code_unit_offset {
+            break;
+        }
+
+        // A character encoded as two 16-bit code units advances the UTF-16 offset by 2.
+        {
+            utf8_code_unit_offset +=
+                TextSize::new(u32::try_from(c.len_utf8()).expect("utf8 len always <=4"));
+            i += u32::try_from(c.len_utf16()).expect("utf16 len always <=2");
+        }
+    }
+
+    utf8_code_unit_offset
+}
+
+fn offset_to_position(
+    offset: TextSize,
+    text: &str,
+    index: &LineIndex,
+    encoding: PositionEncoding,
+) -> types::Position {
+    let location = match encoding {
+        PositionEncoding::UTF8 => {
+            let row = index.line_index(offset);
+            let column = offset - index.line_start(row, text);
+
+            SourceLocation {
+                column: OneIndexed::from_zero_indexed(column.to_usize()),
+                row,
+            }
+        }
+        PositionEncoding::UTF16 => {
+            let row = index.line_index(offset);
+
+            let column = if index.is_ascii() {
+                (offset - index.line_start(row, text)).to_usize()
+            } else {
+                let up_to_line = &text[TextRange::new(index.line_start(row, text), offset)];
+                up_to_line.encode_utf16().count()
+            };
+
+            SourceLocation {
+                column: OneIndexed::from_zero_indexed(column),
+                row,
+            }
+        }
+        PositionEncoding::UTF32 => index.source_location(offset, text),
+    };
+
+    types::Position {
+        line: u32::try_from(location.row.to_zero_indexed()).expect("row usize fits in u32"),
+        character: u32::try_from(location.column.to_zero_indexed())
+            .expect("character usize fits in u32"),
+    }
+}
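
As a standalone illustration of the UTF-16 handling in `range.rs`, the sketch below converts between UTF-16 code unit offsets and UTF-8 byte offsets within a single line using only the standard library. The function names are hypothetical, and plain `usize` is used in place of `TextSize`.

```rust
/// Converts a UTF-16 code unit offset within `line` to a UTF-8 byte offset.
/// Offsets past the end of the line resolve to the line's byte length.
fn utf16_to_utf8_offset(utf16_offset: usize, line: &str) -> usize {
    let mut utf8 = 0;
    let mut utf16 = 0;
    for c in line.chars() {
        if utf16 >= utf16_offset {
            break;
        }
        // A char outside the Basic Multilingual Plane takes two UTF-16
        // code units and four UTF-8 bytes.
        utf8 += c.len_utf8();
        utf16 += c.len_utf16();
    }
    utf8
}

/// Converts a UTF-8 byte offset within `line` to a UTF-16 code unit offset.
/// Assumes `utf8_offset` lies on a character boundary; slicing panics otherwise.
fn utf8_to_utf16_offset(utf8_offset: usize, line: &str) -> usize {
    line[..utf8_offset].encode_utf16().count()
}
```

For the line `a😀b`, the emoji occupies two UTF-16 code units and four UTF-8 bytes, so the `b` sits at UTF-16 offset 3 and UTF-8 offset 5.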