ruff server - A new built-in LSP for Ruff, written in Rust (#10158)


## Summary

This PR introduces the `ruff_server` crate and a new `ruff server`
command. `ruff_server` is a re-implementation of
[`ruff-lsp`](https://github.com/astral-sh/ruff-lsp), written entirely in
Rust. It brings significant performance improvements, much tighter
integration with Ruff, a foundation for supporting entirely new language
server features, and more!

This PR is an early version of `ruff_server` that we're calling the
**pre-release** version. Anyone is more than welcome to use it and
submit bug reports for any issues they encounter. We'll have some
documentation on how to set it up with a few common editors, and we'll
also provide a pre-release VS Code extension for those interested.

This pre-release version supports:
- **Diagnostics for `.py` files**
- **Quick fixes**
- **Full-file formatting**
- **Range formatting**
- **Multiple workspace folders**
- **Automatic linter/formatter configuration** - taken from any
`pyproject.toml` files in the workspace.
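
With multiple workspace folders, each document needs the configuration that applies to it. One plausible way to pick it, sketched below under our own assumptions, is to walk upward from the document's directory to the nearest `pyproject.toml` without leaving the workspace folder. `find_pyproject` is a hypothetical name, and Ruff's real settings discovery handles more cases (for example `ruff.toml` files), so treat this as an illustration only.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: finds the `pyproject.toml` closest to `start`,
/// searching upward but never above `workspace_root`.
fn find_pyproject(start: &Path, workspace_root: &Path) -> Option<PathBuf> {
    // `ancestors()` yields `start` itself, then each parent directory in turn.
    for dir in start.ancestors() {
        // Stop once the search leaves the workspace folder.
        if !dir.starts_with(workspace_root) {
            break;
        }
        let candidate = dir.join("pyproject.toml");
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}
```

Stopping at the workspace root keeps settings from one workspace folder from leaking into documents that belong to another.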

Many thanks to @MichaReiser for his [proof-of-concept
work](https://github.com/astral-sh/ruff/pull/7262), which was important
groundwork for making this PR possible.

## Architectural Decisions

I've made an executive decision to use `lsp-server` as the base
framework for the LSP, rather than `tower-lsp`. There were several
reasons for this:

1. I would like to avoid `async` in our implementation. LSPs are mostly
compute-bound rather than I/O-bound, and `async` adds a lot of
complexity to the API while also making it harder to reason about
execution order. This leads into the second reason, which is...
2. Any handlers that mutate state should be blocking and run in the
event loop, and the state should be lock-free. This is the approach that
`rust-analyzer` uses (also with the `lsp-server`/`lsp-types` crates as a
framework), and it gives us assurances about data mutation and execution
order. `tower-lsp` doesn't support this, which has caused some
[issues](https://github.com/ebkalderon/tower-lsp/issues/284) around data
races and out-of-order handler execution.
3. In general, I think it makes sense to have tight control over
scheduling and the specifics of our implementation, in exchange for a
slightly higher up-front cost of writing it ourselves. We'll be able to
fine-tune it to our needs and support future LSP features without
depending on an upstream maintainer.
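
A minimal, standard-library-only sketch of the scheduling model described above follows: a single loop thread owns the mutable state and applies notifications in the order they arrive, while read-only requests receive an immutable snapshot and run on another thread. `Message`, `State`, and `run_event_loop` are hypothetical names; this is not `ruff_server`'s actual scheduler, nor the `lsp-server` API.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical message type: notifications mutate, requests only read.
enum Message {
    /// Replace the document text (a mutating notification).
    DidChange { text: String },
    /// Ask for the current line count (a read-only request).
    LineCount { reply: mpsc::Sender<usize> },
    Shutdown,
}

/// State owned exclusively by the event loop, so no locks are needed.
struct State {
    text: Arc<String>,
}

fn run_event_loop(receiver: mpsc::Receiver<Message>) {
    let mut state = State {
        text: Arc::new(String::new()),
    };
    // Messages are handled strictly in the order they were received.
    for message in receiver {
        match message {
            // Mutations run on the loop thread and block it until they finish.
            Message::DidChange { text } => state.text = Arc::new(text),
            Message::LineCount { reply } => {
                // Read-only work gets a cheap snapshot and runs elsewhere.
                let snapshot = Arc::clone(&state.text);
                thread::spawn(move || {
                    let _ = reply.send(snapshot.lines().count());
                });
            }
            Message::Shutdown => break,
        }
    }
}
```

Because only the loop thread writes to `state`, no lock is needed, and a change that arrives before a request is always visible to that request.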

## Test Plan

The pre-release of `ruff_server` will have snapshot tests for common
document editing scenarios. An expanded test suite is on the roadmap for
future versions of `ruff_server`.
Author: Jane Lewis (committed by GitHub), 2024-03-08 20:57:23 -08:00
Parent: a892fc755d
Commit: 0c84fbb6db
GPG key ID: B5690EEEBB952194 (no known key found for this signature in database)
45 changed files with 5425 additions and 2 deletions

@@ -0,0 +1,123 @@
use lsp_types::TextDocumentContentChangeEvent;
use ruff_source_file::LineIndex;

use crate::PositionEncoding;

use super::RangeExt;

pub(crate) type DocumentVersion = i32;

/// The state for an individual document in the server. Stays up-to-date
/// with changes made by the user, including unsaved changes.
#[derive(Debug, Clone)]
pub struct Document {
    /// The string contents of the document.
    contents: String,
    /// A computed line index for the document. This should always reflect
    /// the current version of `contents`. Using a function like [`Self::modify`]
    /// will re-calculate the line index automatically when the `contents` value is updated.
    index: LineIndex,
    /// The latest version of the document, set by the LSP client. The server will panic in
    /// debug mode if we attempt to update the document with an 'older' version.
    version: DocumentVersion,
}

impl Document {
    pub fn new(contents: String, version: DocumentVersion) -> Self {
        let index = LineIndex::from_source_text(&contents);
        Self {
            contents,
            index,
            version,
        }
    }

    pub fn contents(&self) -> &str {
        &self.contents
    }

    pub fn index(&self) -> &LineIndex {
        &self.index
    }

    pub fn version(&self) -> DocumentVersion {
        self.version
    }

    pub fn apply_changes(
        &mut self,
        changes: Vec<lsp_types::TextDocumentContentChangeEvent>,
        new_version: DocumentVersion,
        encoding: PositionEncoding,
    ) {
        if let [lsp_types::TextDocumentContentChangeEvent {
            range: None, text, ..
        }] = changes.as_slice()
        {
            tracing::debug!("Fast path - replacing entire document");
            self.modify(|contents, version| {
                *contents = text.clone();
                *version = new_version;
            });
            return;
        }

        let mut new_contents = self.contents().to_string();
        let mut active_index = self.index().clone();

        for TextDocumentContentChangeEvent {
            range,
            text: change,
            ..
        } in changes
        {
            if let Some(range) = range {
                let range = range.to_text_range(&new_contents, &active_index, encoding);

                new_contents.replace_range(
                    usize::from(range.start())..usize::from(range.end()),
                    &change,
                );
            } else {
                new_contents = change;
            }

            // Later ranges in the same batch refer to the document as modified by
            // this change, so the index has to be rebuilt after every edit.
            active_index = LineIndex::from_source_text(&new_contents);
        }

        self.modify_with_manual_index(|contents, version, index| {
            *index = active_index;
            *contents = new_contents;
            *version = new_version;
        });
    }

    pub fn update_version(&mut self, new_version: DocumentVersion) {
        self.modify_with_manual_index(|_, version, _| {
            *version = new_version;
        });
    }

    // A private function for modifying the document's internal state;
    // the line index is recomputed afterwards.
    fn modify(&mut self, func: impl FnOnce(&mut String, &mut DocumentVersion)) {
        self.modify_with_manual_index(|c, v, i| {
            func(c, v);
            *i = LineIndex::from_source_text(c);
        });
    }

    // A private function that lets the caller update the line index manually
    // instead of recomputing it from scratch.
    fn modify_with_manual_index(
        &mut self,
        func: impl FnOnce(&mut String, &mut DocumentVersion, &mut LineIndex),
    ) {
        let old_version = self.version;
        func(&mut self.contents, &mut self.version, &mut self.index);
        debug_assert!(self.version >= old_version);
    }
}
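
The LSP specification says that the content changes in a single notification are applied in order, each one to the document produced by the previous change. That is why `apply_changes` has to rebuild its line index after every edit. The self-contained sketch below demonstrates this with a toy line index; `ToyDocument`, `Change`, `Pos`, and the use of UTF-8 byte columns are simplifying assumptions, whereas the real code uses `LineIndex` from `ruff_source_file` and the negotiated `PositionEncoding`.

```rust
/// A zero-based (line, column) position; columns are UTF-8 byte offsets here.
#[derive(Clone, Copy, Debug)]
struct Pos {
    line: usize,
    column: usize,
}

/// Hypothetical change event; `range: None` replaces the whole document.
struct Change {
    range: Option<(Pos, Pos)>,
    text: String,
}

/// Toy stand-in for `Document`: contents, line starts, and a version.
struct ToyDocument {
    contents: String,
    /// Byte offset at which each line starts.
    line_starts: Vec<usize>,
    version: i32,
}

fn compute_line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
    starts
}

impl ToyDocument {
    fn new(contents: String, version: i32) -> Self {
        let line_starts = compute_line_starts(&contents);
        Self { contents, line_starts, version }
    }

    /// Resolves a position to a byte offset using the *current* line starts,
    /// clamping out-of-range lines and columns.
    fn offset(&self, pos: Pos) -> usize {
        let len = self.contents.len();
        let start = self.line_starts.get(pos.line).copied().unwrap_or(len);
        let end = self.line_starts.get(pos.line + 1).copied().unwrap_or(len);
        (start + pos.column).min(end)
    }

    fn apply_changes(&mut self, changes: Vec<Change>, new_version: i32) {
        assert!(new_version >= self.version, "document versions must not decrease");
        for change in changes {
            match change.range {
                Some((start, end)) => {
                    let (start, end) = (self.offset(start), self.offset(end));
                    self.contents.replace_range(start..end, &change.text);
                }
                None => self.contents = change.text,
            }
            // Later ranges in the batch refer to the text after this edit.
            self.line_starts = compute_line_starts(&self.contents);
        }
        self.version = new_version;
    }
}
```

For example, if one edit inserts a new line before line 1, a later range in the same batch that targets line 2 refers to what used to be line 1; resolving it against a stale index would edit the wrong text.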

@@ -0,0 +1,153 @@
use super::PositionEncoding;
use lsp_types as types;
use ruff_source_file::OneIndexed;
use ruff_source_file::{LineIndex, SourceLocation};
use ruff_text_size::{TextRange, TextSize};

pub(crate) trait RangeExt {
    fn to_text_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding)
        -> TextRange;
}

pub(crate) trait ToRangeExt {
    fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range;
}

fn u32_index_to_usize(index: u32) -> usize {
    usize::try_from(index).expect("u32 fits in usize")
}

impl RangeExt for lsp_types::Range {
    fn to_text_range(
        &self,
        text: &str,
        index: &LineIndex,
        encoding: PositionEncoding,
    ) -> TextRange {
        let start_line = index.line_range(
            OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.line)),
            text,
        );
        let end_line = index.line_range(
            OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.line)),
            text,
        );

        let (start_column_offset, end_column_offset) = match encoding {
            PositionEncoding::UTF8 => (
                TextSize::new(self.start.character),
                TextSize::new(self.end.character),
            ),

            PositionEncoding::UTF16 => {
                // Fast path for ASCII-only documents
                if index.is_ascii() {
                    (
                        TextSize::new(self.start.character),
                        TextSize::new(self.end.character),
                    )
                } else {
                    // UTF-16 encodes each character as one or two 16-bit code units.
                    // The position's `character` field is therefore a code-unit offset
                    // from the start of the line (not a character offset) and has to be
                    // converted to a UTF-8 byte offset.
                    (
                        utf8_column_offset(self.start.character, &text[start_line]),
                        utf8_column_offset(self.end.character, &text[end_line]),
                    )
                }
            }

            PositionEncoding::UTF32 => {
                // UTF-32 uses 4 bytes for every character, so the position's
                // `character` field is already a character offset.
                return TextRange::new(
                    index.offset(
                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.line)),
                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.start.character)),
                        text,
                    ),
                    index.offset(
                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.line)),
                        OneIndexed::from_zero_indexed(u32_index_to_usize(self.end.character)),
                        text,
                    ),
                );
            }
        };

        // Clamp each column to the length of its line so an out-of-range
        // column cannot spill into the following lines.
        TextRange::new(
            start_line.start() + start_column_offset.clamp(TextSize::new(0), start_line.len()),
            end_line.start() + end_column_offset.clamp(TextSize::new(0), end_line.len()),
        )
    }
}

impl ToRangeExt for TextRange {
    fn to_range(&self, text: &str, index: &LineIndex, encoding: PositionEncoding) -> types::Range {
        types::Range {
            start: offset_to_position(self.start(), text, index, encoding),
            end: offset_to_position(self.end(), text, index, encoding),
        }
    }
}

/// Converts a UTF-16 code unit offset within a given line into a UTF-8 byte offset.
fn utf8_column_offset(utf16_code_unit_offset: u32, line: &str) -> TextSize {
    let mut utf8_code_unit_offset = TextSize::new(0);

    let mut i = 0u32;

    for c in line.chars() {
        if i >= utf16_code_unit_offset {
            break;
        }

        // Characters encoded as two 16-bit code units advance `i` by 2.
        utf8_code_unit_offset +=
            TextSize::new(u32::try_from(c.len_utf8()).expect("utf8 len always <=4"));
        i += u32::try_from(c.len_utf16()).expect("utf16 len always <=2");
    }

    utf8_code_unit_offset
}

fn offset_to_position(
    offset: TextSize,
    text: &str,
    index: &LineIndex,
    encoding: PositionEncoding,
) -> types::Position {
    let location = match encoding {
        PositionEncoding::UTF8 => {
            let row = index.line_index(offset);
            let column = offset - index.line_start(row, text);

            SourceLocation {
                column: OneIndexed::from_zero_indexed(column.to_usize()),
                row,
            }
        }
        PositionEncoding::UTF16 => {
            let row = index.line_index(offset);

            let column = if index.is_ascii() {
                (offset - index.line_start(row, text)).to_usize()
            } else {
                let up_to_line = &text[TextRange::new(index.line_start(row, text), offset)];
                up_to_line.encode_utf16().count()
            };

            SourceLocation {
                column: OneIndexed::from_zero_indexed(column),
                row,
            }
        }
        PositionEncoding::UTF32 => index.source_location(offset, text),
    };

    types::Position {
        line: u32::try_from(location.row.to_zero_indexed()).expect("row usize fits in u32"),
        character: u32::try_from(location.column.to_zero_indexed())
            .expect("character usize fits in u32"),
    }
}
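
The UTF-16 branches above come down to converting between a UTF-16 code-unit column (the LSP's default position encoding) and a UTF-8 byte column within a single line. The standalone sketch below shows both directions using only `std`; `utf16_to_utf8_column` and `utf8_to_utf16_column` are illustrative names, not Ruff APIs.

```rust
/// Converts a UTF-16 code-unit column within `line` into a UTF-8 byte column.
/// Columns past the end of the line are clamped to the line's byte length.
fn utf16_to_utf8_column(line: &str, utf16_column: u32) -> usize {
    let mut utf16_seen = 0u32;
    for (byte_index, c) in line.char_indices() {
        if utf16_seen >= utf16_column {
            return byte_index;
        }
        // Characters outside the Basic Multilingual Plane take two code units.
        utf16_seen += u32::try_from(c.len_utf16()).expect("len_utf16 is 1 or 2");
    }
    line.len()
}

/// Converts a UTF-8 byte column into a UTF-16 code-unit column.
/// Panics if `utf8_column` is past the end or not on a `char` boundary.
fn utf8_to_utf16_column(line: &str, utf8_column: usize) -> u32 {
    let units = line[..utf8_column].encode_utf16().count();
    u32::try_from(units).expect("line too long for a u32 column")
}
```

For example, in the line `a😀b` the emoji is one `char`, two UTF-16 code units, and four UTF-8 bytes, so UTF-16 column 3 (just before `b`) corresponds to byte column 5.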