mirror of
https://github.com/astral-sh/ruff.git
synced 2025-09-28 12:55:05 +00:00

# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create [`insta`](https://github.com/mitsuhiko/insta)-compatible snapshots, which now exist in the repo.

My approach thus far has been to try to improve coverage by tackling fixtures one by one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work; they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

  ```py
  # In some cases, if the end expression is "self-closing" (functions,
  # lists, dictionaries, sets, subscript accesses, and any length-two
  # boolean operations that end in these elements), Black
  # will wrap like this...
  if some_expression and f(
      b,
      c,
      d,
  ):
      pass

  # ...whereas we do this:
  if (
      some_expression
      and f(
          b,
          c,
          d,
      )
  ):
      pass

  # If function arguments can fit on a single line, then Black will
  # format them like this, rather than exploding them vertically.
  if f(
      a, b, c, d, e, f, g, ...
  ):
      pass
  ```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
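To make the attachment step from "Crate structure" more concrete, here is a minimal, self-contained sketch of the two-pass idea: first find a comment's enclosing, preceding, and following nodes, then pick one of them to attach to. Everything in it (`Node`, `Neighbors`, `Attachment`, `find_neighbors`, `attach`, `example_tree`) is a hypothetical stand-in invented for illustration; it is not the code in `attachment.rs` or `decorate_trivia`, and the placement rule is a simplified version of the Prettier-style approach rather than a faithful port.

```rust
/// A simplified CST node: a name, a half-open byte range, and child nodes.
/// (Hypothetical type; the real CST nodes carry far more information.)
struct Node {
    name: &'static str,
    start: usize,
    end: usize,
    children: Vec<Node>,
}

/// The nodes surrounding a trivia token, identified by name for brevity.
#[derive(Debug)]
struct Neighbors {
    enclosing: Option<&'static str>,
    preceding: Option<&'static str>,
    following: Option<&'static str>,
}

/// Where a trivia token ends up after the second pass.
#[derive(Debug, PartialEq)]
enum Attachment {
    Leading(&'static str),
    Trailing(&'static str),
    Dangling(&'static str),
}

/// First pass: descend into the innermost node containing `offset`, then
/// record the closest sibling before and after the offset at that level.
fn find_neighbors(nodes: &[Node], offset: usize, enclosing: Option<&'static str>) -> Neighbors {
    let mut preceding = None;
    let mut following = None;
    for node in nodes {
        if node.start <= offset && offset < node.end {
            // The token lies inside this node, so search its children.
            return find_neighbors(&node.children, offset, Some(node.name));
        }
        if node.end <= offset {
            preceding = Some(node.name);
        } else if following.is_none() {
            following = Some(node.name);
        }
    }
    Neighbors { enclosing, preceding, following }
}

/// Second pass (simplified rule): own-line comments prefer the following
/// node, end-of-line comments prefer the preceding node, and a comment with
/// no siblings dangles from its enclosing node.
fn attach(neighbors: &Neighbors, own_line: bool) -> Option<Attachment> {
    let leading = neighbors.following.map(Attachment::Leading);
    let trailing = neighbors.preceding.map(Attachment::Trailing);
    let dangling = neighbors.enclosing.map(Attachment::Dangling);
    if own_line {
        leading.or(trailing).or(dangling)
    } else {
        trailing.or(leading).or(dangling)
    }
}

/// Byte ranges for the source `f(a,  # comment\n  b)`; the comment starts at offset 6.
fn example_tree() -> Vec<Node> {
    let leaf = |name, start, end| Node { name, start, end, children: vec![] };
    vec![Node {
        name: "call",
        start: 0,
        end: 20,
        children: vec![leaf("f", 0, 1), leaf("a", 2, 3), leaf("b", 18, 19)],
    }]
}

fn main() {
    let tree = example_tree();
    let neighbors = find_neighbors(&tree, 6, None);
    println!("{neighbors:?}");
    println!("{:?}", attach(&neighbors, false));
}
```

Splitting the work into a lookup pass and a placement pass keeps the placement rule small and easy to change, which matters here because the list above notes that comments are often associated with the right node but not yet rendered in the right place.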
169 lines
6.5 KiB
Rust
use crate::core::visitor;
use crate::core::visitor::Visitor;
use crate::cst::{Expr, ExprKind, Stmt, StmtKind};
use crate::trivia::{Parenthesize, TriviaKind};

/// Modify an [`Expr`] to infer parentheses, rather than respecting any user-provided trivia.
fn use_inferred_parens(expr: &mut Expr) {
    // Remove parentheses, unless it's a generator expression, in which case, keep them.
    if !matches!(expr.node, ExprKind::GeneratorExp { .. }) {
        expr.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
    }

    // If it's a tuple, add parentheses if it's a singleton; otherwise, we only need parentheses
    // if the tuple expands.
    if let ExprKind::Tuple { elts, .. } = &expr.node {
        expr.parentheses = if elts.len() > 1 {
            Parenthesize::IfExpanded
        } else {
            Parenthesize::Always
        };
    }
}

struct ParenthesesNormalizer {}

impl<'a> Visitor<'a> for ParenthesesNormalizer {
    fn visit_stmt(&mut self, stmt: &'a mut Stmt) {
        // Always remove parentheses around statements, unless it's an expression statement,
        // in which case, remove parentheses around the expression.
        let before = stmt.trivia.len();
        stmt.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
        let after = stmt.trivia.len();
        if let StmtKind::Expr { value } = &mut stmt.node {
            if before != after {
                stmt.parentheses = Parenthesize::Always;
                value.parentheses = Parenthesize::Never;
            }
        }

        // In a variety of contexts, remove parentheses around sub-expressions. Right now, the
        // pattern is consistent (and repeated), but it may not end up that way.
        // https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#parentheses
        match &mut stmt.node {
            StmtKind::FunctionDef { .. } => {}
            StmtKind::AsyncFunctionDef { .. } => {}
            StmtKind::ClassDef { .. } => {}
            StmtKind::Return { value } => {
                if let Some(value) = value {
                    use_inferred_parens(value);
                }
            }
            StmtKind::Delete { .. } => {}
            StmtKind::Assign { targets, value, .. } => {
                for target in targets {
                    use_inferred_parens(target);
                }
                use_inferred_parens(value);
            }
            StmtKind::AugAssign { value, .. } => {
                use_inferred_parens(value);
            }
            StmtKind::AnnAssign { value, .. } => {
                if let Some(value) = value {
                    use_inferred_parens(value);
                }
            }
            StmtKind::For { target, iter, .. } | StmtKind::AsyncFor { target, iter, .. } => {
                use_inferred_parens(target);
                use_inferred_parens(iter);
            }
            StmtKind::While { test, .. } => {
                use_inferred_parens(test);
            }
            StmtKind::If { test, .. } => {
                use_inferred_parens(test);
            }
            StmtKind::With { .. } => {}
            StmtKind::AsyncWith { .. } => {}
            StmtKind::Match { .. } => {}
            StmtKind::Raise { .. } => {}
            StmtKind::Try { .. } => {}
            StmtKind::Assert { test, msg } => {
                use_inferred_parens(test);
                if let Some(msg) = msg {
                    use_inferred_parens(msg);
                }
            }
            StmtKind::Import { .. } => {}
            StmtKind::ImportFrom { .. } => {}
            StmtKind::Global { .. } => {}
            StmtKind::Nonlocal { .. } => {}
            StmtKind::Expr { .. } => {}
            StmtKind::Pass => {}
            StmtKind::Break => {}
            StmtKind::Continue => {}
        }

        visitor::walk_stmt(self, stmt);
    }

    fn visit_expr(&mut self, expr: &'a mut Expr) {
        // Always retain parentheses around expressions.
        let before = expr.trivia.len();
        expr.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
        let after = expr.trivia.len();
        if before != after {
            expr.parentheses = Parenthesize::Always;
        }

        match &mut expr.node {
            ExprKind::BoolOp { .. } => {}
            ExprKind::NamedExpr { .. } => {}
            ExprKind::BinOp { .. } => {}
            ExprKind::UnaryOp { .. } => {}
            ExprKind::Lambda { .. } => {}
            ExprKind::IfExp { .. } => {}
            ExprKind::Dict { .. } => {}
            ExprKind::Set { .. } => {}
            ExprKind::ListComp { .. } => {}
            ExprKind::SetComp { .. } => {}
            ExprKind::DictComp { .. } => {}
            ExprKind::GeneratorExp { .. } => {}
            ExprKind::Await { .. } => {}
            ExprKind::Yield { .. } => {}
            ExprKind::YieldFrom { .. } => {}
            ExprKind::Compare { .. } => {}
            ExprKind::Call { .. } => {}
            ExprKind::FormattedValue { .. } => {}
            ExprKind::JoinedStr { .. } => {}
            ExprKind::Constant { .. } => {}
            ExprKind::Attribute { .. } => {}
            ExprKind::Subscript { value, slice, .. } => {
                // If the slice isn't manually parenthesized, ensure that we _never_ parenthesize
                // the value.
                if !slice
                    .trivia
                    .iter()
                    .any(|trivia| matches!(trivia.kind, TriviaKind::Parentheses))
                {
                    value.parentheses = Parenthesize::Never;
                }
            }
            ExprKind::Starred { .. } => {}
            ExprKind::Name { .. } => {}
            ExprKind::List { .. } => {}
            ExprKind::Tuple { .. } => {}
            ExprKind::Slice { .. } => {}
        }

        visitor::walk_expr(self, expr);
    }
}

/// Normalize parentheses in a Python CST.
///
/// It's not always possible to determine the correct parentheses to use during formatting
/// from the node (and trivia) alone; sometimes, we need to know the parent node. This
/// visitor normalizes parentheses via a top-down traversal, which simplifies the formatting
/// code later on.
///
/// TODO(charlie): It's weird that we have both `TriviaKind::Parentheses` (which aren't used
/// during formatting) and `Parenthesize` (which are used during formatting).
pub fn normalize_parentheses(python_cst: &mut [Stmt]) {
    let mut normalizer = ParenthesesNormalizer {};
    normalizer.visit_body(python_cst);
}