ruff/crates/ruff_python_formatter/src/parentheses.rs
Charlie Marsh ca49b00e55
Add initial formatter implementation (#2883)
# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).
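
The attachment step described above (find each token's preceding, following, and enclosing nodes, then pick a node in a second pass) can be sketched roughly as follows. This is a minimal illustration, not the crate's `decorate_trivia`: the `Range`, `Node`, `Neighbors`, and `Attachment` types, and the `decorate` and `attach` functions, are hypothetical simplifications, and the fallback order in `attach` is an assumption loosely modeled on Prettier's defaults.

```rust
// Hypothetical, simplified types; the real CST and trivia types in the crate differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Range {
    start: usize,
    end: usize,
}

#[derive(Debug)]
struct Node {
    name: &'static str,
    range: Range,
    children: Vec<Node>,
}

/// The neighbors of a trivia token, identified by node name for brevity.
#[derive(Debug, Default, PartialEq, Eq)]
struct Neighbors {
    enclosing: Option<&'static str>,
    preceding: Option<&'static str>,
    following: Option<&'static str>,
}

#[derive(Debug, PartialEq, Eq)]
enum Attachment {
    Leading(&'static str),
    Trailing(&'static str),
    Dangling(&'static str),
}

/// First pass: descend into the innermost node that contains `token`, recording the
/// closest sibling that ends before it and the closest sibling that starts after it.
/// Assumes `nodes` is sorted by position and non-overlapping.
fn decorate(nodes: &[Node], token: Range, found: &mut Neighbors) {
    for node in nodes {
        if node.range.start <= token.start && token.end <= node.range.end {
            // The token lies inside this node, so it becomes the enclosing node and
            // the neighbors are recomputed among its children.
            *found = Neighbors {
                enclosing: Some(node.name),
                ..Neighbors::default()
            };
            decorate(&node.children, token, found);
            return;
        }
        if node.range.end <= token.start {
            found.preceding = Some(node.name);
        } else if token.end <= node.range.start {
            found.following = Some(node.name);
            return;
        }
    }
}

/// Second pass: choose where the token goes. Own-line tokens prefer the following
/// node; end-of-line tokens prefer the preceding node (an assumed policy).
fn attach(found: &Neighbors, own_line: bool) -> Option<Attachment> {
    let leading = found.following.map(Attachment::Leading);
    let trailing = found.preceding.map(Attachment::Trailing);
    let dangling = found.enclosing.map(Attachment::Dangling);
    if own_line {
        leading.or(trailing).or(dangling)
    } else {
        trailing.or(leading).or(dangling)
    }
}
```

Splitting the work into two passes keeps the position lookup independent of the attachment policy, so the policy can be adjusted per node kind without touching the traversal.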

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create [`insta`](https://github.com/mitsuhiko/insta)-compatible snapshots, which now exist in the repo.

My approach thus far has been to improve coverage by tackling fixtures one by one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at the time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elements), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
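
To make the parentheses handling more concrete: `parentheses.rs` in this commit records a `Parenthesize` value (`Always`, `Never`, or `IfExpanded`) on each node, and assigns tuples `Always` when they have fewer than two elements and `IfExpanded` otherwise. The sketch below shows one way a formatter could consume that flag. The `tuple_parentheses` and `render_tuple` functions and the width check are hypothetical illustrations, not the crate's actual formatting code.

```rust
/// Mirrors the three-way flag used in `parentheses.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Parenthesize {
    Always,
    Never,
    IfExpanded,
}

/// The tuple rule: tuples with fewer than two elements always get parentheses;
/// longer tuples only need them if they are split across lines.
fn tuple_parentheses(num_elements: usize) -> Parenthesize {
    if num_elements > 1 {
        Parenthesize::IfExpanded
    } else {
        Parenthesize::Always
    }
}

/// Hypothetical renderer: keep the tuple on one line if it fits within `width`,
/// otherwise put one element per line inside parentheses.
fn render_tuple(elements: &[&str], parentheses: Parenthesize, width: usize) -> String {
    let mut flat = elements.join(", ");
    if elements.len() == 1 {
        // A singleton tuple needs its trailing comma to remain a tuple.
        flat.push(',');
    }
    match parentheses {
        Parenthesize::Never => flat,
        Parenthesize::Always => format!("({})", flat),
        Parenthesize::IfExpanded if flat.len() <= width => flat,
        Parenthesize::IfExpanded => {
            let mut out = String::from("(\n");
            for element in elements {
                out.push_str("    ");
                out.push_str(element);
                out.push_str(",\n");
            }
            out.push(')');
            out
        }
    }
}
```

For example, `render_tuple(&["a"], tuple_parentheses(1), 80)` yields `(a,)`, while a two-element tuple that fits within the width is rendered without parentheses.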
2023-02-15 04:06:35 +00:00


use crate::core::visitor;
use crate::core::visitor::Visitor;
use crate::cst::{Expr, ExprKind, Stmt, StmtKind};
use crate::trivia::{Parenthesize, TriviaKind};

/// Modify an [`Expr`] to infer parentheses, rather than respecting any user-provided trivia.
fn use_inferred_parens(expr: &mut Expr) {
    // Remove parentheses, unless it's a generator expression, in which case, keep them.
    if !matches!(expr.node, ExprKind::GeneratorExp { .. }) {
        expr.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
    }

    // If it's a tuple, add parentheses if it's a singleton; otherwise, we only need parentheses
    // if the tuple expands.
    if let ExprKind::Tuple { elts, .. } = &expr.node {
        expr.parentheses = if elts.len() > 1 {
            Parenthesize::IfExpanded
        } else {
            Parenthesize::Always
        };
    }
}

struct ParenthesesNormalizer {}

impl<'a> Visitor<'a> for ParenthesesNormalizer {
    fn visit_stmt(&mut self, stmt: &'a mut Stmt) {
        // Always remove parentheses trivia from statements. If the statement is an
        // expression statement that was parenthesized, keep the parentheses on the
        // statement itself and suppress them on the inner expression.
        let before = stmt.trivia.len();
        stmt.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
        let after = stmt.trivia.len();
        if let StmtKind::Expr { value } = &mut stmt.node {
            if before != after {
                stmt.parentheses = Parenthesize::Always;
                value.parentheses = Parenthesize::Never;
            }
        }

        // In a variety of contexts, remove parentheses around sub-expressions. Right now, the
        // pattern is consistent (and repeated), but it may not end up that way.
        // https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#parentheses
        match &mut stmt.node {
            StmtKind::FunctionDef { .. } => {}
            StmtKind::AsyncFunctionDef { .. } => {}
            StmtKind::ClassDef { .. } => {}
            StmtKind::Return { value } => {
                if let Some(value) = value {
                    use_inferred_parens(value);
                }
            }
            StmtKind::Delete { .. } => {}
            StmtKind::Assign { targets, value, .. } => {
                for target in targets {
                    use_inferred_parens(target);
                }
                use_inferred_parens(value);
            }
            StmtKind::AugAssign { value, .. } => {
                use_inferred_parens(value);
            }
            StmtKind::AnnAssign { value, .. } => {
                if let Some(value) = value {
                    use_inferred_parens(value);
                }
            }
            StmtKind::For { target, iter, .. } | StmtKind::AsyncFor { target, iter, .. } => {
                use_inferred_parens(target);
                use_inferred_parens(iter);
            }
            StmtKind::While { test, .. } => {
                use_inferred_parens(test);
            }
            StmtKind::If { test, .. } => {
                use_inferred_parens(test);
            }
            StmtKind::With { .. } => {}
            StmtKind::AsyncWith { .. } => {}
            StmtKind::Match { .. } => {}
            StmtKind::Raise { .. } => {}
            StmtKind::Try { .. } => {}
            StmtKind::Assert { test, msg } => {
                use_inferred_parens(test);
                if let Some(msg) = msg {
                    use_inferred_parens(msg);
                }
            }
            StmtKind::Import { .. } => {}
            StmtKind::ImportFrom { .. } => {}
            StmtKind::Global { .. } => {}
            StmtKind::Nonlocal { .. } => {}
            StmtKind::Expr { .. } => {}
            StmtKind::Pass => {}
            StmtKind::Break => {}
            StmtKind::Continue => {}
        }

        visitor::walk_stmt(self, stmt);
    }

    fn visit_expr(&mut self, expr: &'a mut Expr) {
        // Always retain user-provided parentheses around expressions: drop the
        // `Parentheses` trivia, but mark the expression as `Parenthesize::Always`.
        let before = expr.trivia.len();
        expr.trivia
            .retain(|trivia| !matches!(trivia.kind, TriviaKind::Parentheses));
        let after = expr.trivia.len();
        if before != after {
            expr.parentheses = Parenthesize::Always;
        }

        match &mut expr.node {
            ExprKind::BoolOp { .. } => {}
            ExprKind::NamedExpr { .. } => {}
            ExprKind::BinOp { .. } => {}
            ExprKind::UnaryOp { .. } => {}
            ExprKind::Lambda { .. } => {}
            ExprKind::IfExp { .. } => {}
            ExprKind::Dict { .. } => {}
            ExprKind::Set { .. } => {}
            ExprKind::ListComp { .. } => {}
            ExprKind::SetComp { .. } => {}
            ExprKind::DictComp { .. } => {}
            ExprKind::GeneratorExp { .. } => {}
            ExprKind::Await { .. } => {}
            ExprKind::Yield { .. } => {}
            ExprKind::YieldFrom { .. } => {}
            ExprKind::Compare { .. } => {}
            ExprKind::Call { .. } => {}
            ExprKind::FormattedValue { .. } => {}
            ExprKind::JoinedStr { .. } => {}
            ExprKind::Constant { .. } => {}
            ExprKind::Attribute { .. } => {}
            ExprKind::Subscript { value, slice, .. } => {
                // If the slice isn't manually parenthesized, ensure that we _never_ parenthesize
                // the value.
                if !slice
                    .trivia
                    .iter()
                    .any(|trivia| matches!(trivia.kind, TriviaKind::Parentheses))
                {
                    value.parentheses = Parenthesize::Never;
                }
            }
            ExprKind::Starred { .. } => {}
            ExprKind::Name { .. } => {}
            ExprKind::List { .. } => {}
            ExprKind::Tuple { .. } => {}
            ExprKind::Slice { .. } => {}
        }

        visitor::walk_expr(self, expr);
    }
}

/// Normalize parentheses in a Python CST.
///
/// It's not always possible to determine the correct parentheses to use during formatting
/// from the node (and trivia) alone; sometimes, we need to know the parent node. This
/// visitor normalizes parentheses via a top-down traversal, which simplifies the formatting
/// code later on.
///
/// TODO(charlie): It's weird that we have both `TriviaKind::Parentheses` (which aren't used
/// during formatting) and `Parenthesize` (which are used during formatting).
pub fn normalize_parentheses(python_cst: &mut [Stmt]) {
    let mut normalizer = ParenthesesNormalizer {};
    normalizer.visit_body(python_cst);
}