Extend ruff_dev formatter script to compute statistics and format a project (#5492)

## Summary

This extends the `ruff_dev` formatter script util. Instead of only doing stability checks, you can now choose different compatible options on the CLI and get statistics.

* It adds an option that formats all files that ruff would check, which allows looking at an entire black-formatted repository with `git diff`.
* It computes the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) as a measure of deviation between input and output, which is useful as a single-number metric for assessing our current deviations from black.
* It adds progress bars to both the single-project and the multi-project mode.
* It adds an option to write the multi-project output to a file.

Sample usage:

```
$ cargo run --bin ruff_dev -- format-dev --stability-check crates/ruff/resources/test/cpython
$ cargo run --bin ruff_dev -- format-dev --stability-check /home/konsti/projects/django
Syntax error in /home/konsti/projects/django/tests/test_runner_apps/tagged/tests_syntax_error.py: source contains syntax errors (parser error): BaseError { error: UnrecognizedToken(Name { name: "syntax_error" }, None), offset: 131, source_path: "<filename>" }
Found 0 stability errors in 2755 files (jaccard index 0.911) in 9.75s
$ cargo run --bin ruff_dev -- format-dev --write /home/konsti/projects/django
```

Options:

```
Several utils related to the formatter which can be run on one or more repositories. The selected set of files in a repository is the same as for `ruff check`.

* Check formatter stability: Format a repository twice and ensure that it looks that the first and second formatting look the same.
* Format: Format the files in a repository to be able to check them with `git diff`
* Statistics: The subcommand the Jaccard index between the (assumed to be black formatted) input and the ruff formatted output

Usage: ruff_dev format-dev [OPTIONS] [FILES]...

Arguments:
  [FILES]...
          Like `ruff check`'s files. See `--multi-project` if you want to format an ecosystem checkout

Options:
      --stability-check
          Check stability

          We want to ensure that once formatted content stays the same when formatted again, which is known as formatter stability or formatter idempotency, and that the formatter prints syntactically valid code. As our test cases cover only a limited amount of code, this allows checking entire repositories.

      --write
          Format the files. Without this flag, the python files are not modified

      --format <FORMAT>
          Control the verbosity of the output

          [default: default]

          Possible values:
          - minimal: Filenames only
          - default: Filenames and reduced diff
          - full:    Full diff and invalid code

  -x, --exit-first-error
          Print only the first error and exit, `-x` is same as pytest

      --multi-project
          Checks each project inside a directory, useful e.g. if you want to check all of the ecosystem checkouts

      --error-file <ERROR_FILE>
          Write all errors to this file in addition to stdout. Only used in multi-project mode
```

## Test Plan

I ran this on django (2755 files, jaccard index 0.911) and discovered a magic trailing comma problem and that we really needed to implement import formatting. I ran the script on cpython to identify https://github.com/astral-sh/ruff/pull/5558.
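
For readers unfamiliar with the Jaccard index mentioned in the summary, here is a minimal sketch of how such a similarity score between the input and the formatted output could be computed. It treats each file as a multiset of lines and uses only the standard library. This is an assumption made for illustration, not necessarily how the `ruff_dev` script computes its statistic, and the names `line_counts` and `jaccard_index` are hypothetical.

```rust
use std::collections::HashMap;

/// Count how often each line occurs in `text`.
/// (Hypothetical helper, not part of `ruff_dev`.)
fn line_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for line in text.lines() {
        *counts.entry(line).or_insert(0) += 1;
    }
    counts
}

/// Jaccard index over multisets of lines: |A ∩ B| / |A ∪ B|.
/// 1.0 means every line is shared, 0.0 means no line is shared.
/// Two empty inputs are treated as identical and yield 1.0.
fn jaccard_index(input: &str, output: &str) -> f64 {
    let a = line_counts(input);
    let b = line_counts(output);

    let mut intersection = 0usize;
    let mut union = 0usize;

    // Lines that occur in the input (and possibly in the output).
    for (line, &count_a) in &a {
        let count_b = b.get(line).copied().unwrap_or(0);
        intersection += count_a.min(count_b);
        union += count_a.max(count_b);
    }

    // Lines that occur only in the output.
    for (line, &count_b) in &b {
        if !a.contains_key(line) {
            union += count_b;
        }
    }

    if union == 0 {
        1.0
    } else {
        intersection as f64 / union as f64
    }
}
```

With this definition, an input of `a`, `b`, `c` and an output of `a`, `b`, `d` share two of four distinct lines, giving 0.5. A line-multiset comparison ignores line order, so a diff-based count of unchanged, removed, and inserted lines would be stricter; the multiset version is used here only because it needs no external crate.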
2025-07-22 04:25:11 +00:00 · 2023-07-07 13:30:12 +02:00 · b22e6c3d38
commit b22e6c3d38
parent 40ddc1604c
10 changed files with 726 additions and 499 deletions
--- a/crates/ruff_python_formatter/src/lib.rs
+++ b/crates/ruff_python_formatter/src/lib.rs
@@ -3,17 +3,23 @@ use crate::comments::{
 };
 use crate::context::PyFormatContext;
 pub use crate::options::{MagicTrailingComma, PyFormatOptions, QuoteStyle};
-use anyhow::{anyhow, Context, Result};
-use ruff_formatter::prelude::*;
-use ruff_formatter::{format, write};
+use ruff_formatter::format_element::tag;
+use ruff_formatter::prelude::{
+    dynamic_text, source_position, source_text_slice, text, ContainsNewlines, Formatter, Tag,
+};
+use ruff_formatter::{
+    format, normalize_newlines, write, Buffer, Format, FormatElement, FormatError, FormatResult,
+    PrintError,
+};
 use ruff_formatter::{Formatted, Printed, SourceCode};
 use ruff_python_ast::node::{AnyNodeRef, AstNode, NodeKind};
 use ruff_python_ast::source_code::{CommentRanges, CommentRangesBuilder, Locator};
 use ruff_text_size::{TextLen, TextRange};
 use rustpython_parser::ast::{Mod, Ranged};
-use rustpython_parser::lexer::lex;
-use rustpython_parser::{parse_tokens, Mode};
+use rustpython_parser::lexer::{lex, LexicalError};
+use rustpython_parser::{parse_tokens, Mode, ParseError};
 use std::borrow::Cow;
+use thiserror::Error;

 pub(crate) mod builders;
 pub mod cli;
@@ -84,16 +90,40 @@ where
    }
 }

-pub fn format_module(contents: &str, options: PyFormatOptions) -> Result<Printed> {
+#[derive(Error, Debug)]
+pub enum FormatModuleError {
+    #[error("source contains syntax errors (lexer error): {0:?}")]
+    LexError(LexicalError),
+    #[error("source contains syntax errors (parser error): {0:?}")]
+    ParseError(ParseError),
+    #[error(transparent)]
+    FormatError(#[from] FormatError),
+    #[error(transparent)]
+    PrintError(#[from] PrintError),
+}
+
+impl From<LexicalError> for FormatModuleError {
+    fn from(value: LexicalError) -> Self {
+        Self::LexError(value)
+    }
+}
+
+impl From<ParseError> for FormatModuleError {
+    fn from(value: ParseError) -> Self {
+        Self::ParseError(value)
+    }
+}
+
+pub fn format_module(
+    contents: &str,
+    options: PyFormatOptions,
+) -> Result<Printed, FormatModuleError> {
    // Tokenize once
    let mut tokens = Vec::new();
    let mut comment_ranges = CommentRangesBuilder::default();

    for result in lex(contents, Mode::Module) {
-        let (token, range) = match result {
-            Ok((token, range)) => (token, range),
-            Err(err) => return Err(anyhow!("Source contains syntax errors {err:?}")),
-        };
+        let (token, range) = result?;

        comment_ranges.visit_token(&token, range);
        tokens.push(Ok((token, range)));
@@ -102,14 +132,11 @@ pub fn format_module(contents: &str, options: PyFormatOptions) -> Result<Printed
    let comment_ranges = comment_ranges.finish();

    // Parse the AST.
-    let python_ast = parse_tokens(tokens, Mode::Module, "<filename>")
-        .with_context(|| "Syntax error in input")?;
+    let python_ast = parse_tokens(tokens, Mode::Module, "<filename>")?;

    let formatted = format_node(&python_ast, &comment_ranges, contents, options)?;

-    formatted
-        .print()
-        .with_context(|| "Failed to print the formatter IR")
+    Ok(formatted.print()?)
 }

 pub fn format_node<'a>(