Add initial formatter implementation (#2883)

# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. For now, the CST is nearly identical to the AST, except that every node carries a `trivia` vector, though we may want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).
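The two-pass attachment step described above, ported from Prettier, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in `attachment.rs`. The `Node` type, the `decorate` and `attach` functions, and the `own_line` flag are all hypothetical. Byte offsets stand in for real token ranges, and children are assumed to be sorted and non-overlapping. The first pass finds the enclosing, preceding, and following nodes. The second pass applies Prettier-style default placement: end-of-line comments prefer the preceding node, own-line comments prefer the following node, and a comment with no neighbors becomes dangling on its parent.

```rust
/// A simplified syntax-tree node (hypothetical stand-in for the formatter's CST):
/// a source range plus children, assumed sorted by position and non-overlapping.
#[derive(Debug)]
struct Node {
    id: usize,
    start: usize,
    end: usize,
    children: Vec<Node>,
}

/// Where a comment ends up after attachment; each variant holds a node id.
#[derive(Debug, PartialEq)]
enum Attachment {
    Leading(usize),
    Trailing(usize),
    Dangling(usize),
}

/// First pass: descend to the innermost node enclosing the comment `start..end`,
/// and record that node's children immediately before and after the comment.
fn decorate(root: &Node, start: usize, end: usize) -> (usize, Option<usize>, Option<usize>) {
    let mut enclosing = root;
    loop {
        let mut preceding = None;
        let mut following = None;
        let mut inner = None;
        for child in &enclosing.children {
            if child.start <= start && end <= child.end {
                // The comment sits inside this child: keep descending.
                inner = Some(child);
                break;
            } else if child.end <= start {
                // Children are sorted, so the last match is the closest predecessor.
                preceding = Some(child.id);
            } else if child.start >= end {
                // The first child after the comment is the closest successor.
                following = Some(child.id);
                break;
            }
        }
        match inner {
            Some(child) => enclosing = child,
            None => return (enclosing.id, preceding, following),
        }
    }
}

/// Second pass: choose a node for the comment. `own_line` is true when only
/// whitespace precedes the comment on its line (an assumed input here).
fn attach(root: &Node, start: usize, end: usize, own_line: bool) -> Attachment {
    let (enclosing, preceding, following) = decorate(root, start, end);
    match (own_line, preceding, following) {
        // `x = 1  # note`: an end-of-line comment trails the code before it.
        (false, Some(p), _) => Attachment::Trailing(p),
        // An own-line comment leads the next node when there is one...
        (true, _, Some(n)) => Attachment::Leading(n),
        // ...and otherwise trails the previous one.
        (_, Some(p), _) => Attachment::Trailing(p),
        (_, None, Some(n)) => Attachment::Leading(n),
        // No neighbors at all (e.g., an otherwise empty body): keep it on the parent.
        (_, None, None) => Attachment::Dangling(enclosing),
    }
}
```

For `x = 1  # note`, the comment has a preceding statement on the same line, so it becomes a trailing comment of `x = 1`. A comment on its own line between two statements becomes a leading comment of the second.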

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create [`insta`](https://github.com/mitsuhiko/insta)-compatible snapshots, which now exist in the repo.

So far, my approach has been to improve coverage by tackling the fixtures one by one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at the time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work: they work in a few cases, but only through one-off handling right now. (We’re probably associating them with the “right” nodes more often than we’re actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

```py
# In some cases, if the end expression is "self-closing" (function calls,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elements), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
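The comment in the wrapping example above defines "self-closing" informally. A minimal Rust sketch of that rule might look like the code below. It assumes a hypothetical, pared-down `Expr` enum and is not the `is_self_closing` helper imported from `crate::format::helpers` in this commit, whose exact rules may differ. It reads "end in these elements" as "the second operand is itself a call, list, dict, set, or subscript", so nested boolean operations do not count. The hypothetical `if_header_strategy` function names the two outcomes shown in the example.

```rust
/// A pared-down expression type for illustration only (not the crate's `ExprKind`).
#[allow(dead_code)]
enum Expr {
    Name(String),
    Call { func: Box<Expr>, args: Vec<Expr> },
    List(Vec<Expr>),
    Dict(Vec<(Expr, Expr)>),
    Set(Vec<Expr>),
    Subscript { value: Box<Expr>, slice: Box<Expr> },
    BoolOp { op: BoolOp, values: Vec<Expr> },
}

#[allow(dead_code)]
enum BoolOp {
    And,
    Or,
}

/// Expressions that end in their own closing bracket.
fn is_bracketed(expr: &Expr) -> bool {
    matches!(
        expr,
        Expr::Call { .. } | Expr::List(_) | Expr::Dict(_) | Expr::Set(_) | Expr::Subscript { .. }
    )
}

/// Bracketed expressions, plus two-operand boolean operations whose last
/// operand is bracketed (e.g. `some_expression and f(...)`).
fn is_self_closing(expr: &Expr) -> bool {
    match expr {
        Expr::BoolOp { values, .. } => values.len() == 2 && is_bracketed(&values[1]),
        _ => is_bracketed(expr),
    }
}

/// How an over-long `if` test would be split under this rule.
fn if_header_strategy(test: &Expr) -> &'static str {
    if is_self_closing(test) {
        "break inside the trailing brackets"
    } else {
        "wrap the whole test in parentheses"
    }
}
```

The diff's `format_if` and `format_while` branch on `is_self_closing` in the same way: they emit a self-closing test directly and wrap any other test in `if_group_breaks` parentheses.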
Charlie Marsh 2023-02-14 23:06:35 -05:00 committed by GitHub
parent f661c90bd7
commit ca49b00e55
GPG key ID: 4AEE18F83AFDEB23
134 changed files with 12044 additions and 18 deletions

@@ -0,0 +1,829 @@
#![allow(unused_variables, clippy::too_many_arguments)]

use ruff_formatter::prelude::*;
use ruff_formatter::{format_args, write};
use ruff_text_size::TextSize;

use crate::builders::literal;
use crate::context::ASTFormatContext;
use crate::cst::{Alias, Arguments, Expr, ExprKind, Keyword, Stmt, StmtKind, Withitem};
use crate::format::builders::{block, join_names};
use crate::format::helpers::is_self_closing;
use crate::shared_traits::AsFormat;
use crate::trivia::{Parenthesize, Relationship, TriviaKind};

fn format_break(f: &mut Formatter<ASTFormatContext<'_>>) -> FormatResult<()> {
    write!(f, [text("break")])
}

fn format_pass(f: &mut Formatter<ASTFormatContext<'_>>, stmt: &Stmt) -> FormatResult<()> {
    // Write the statement body.
    write!(f, [text("pass")])?;

    // Apply any inline comments.
    let mut first = true;
    for range in stmt.trivia.iter().filter_map(|trivia| {
        if matches!(trivia.relationship, Relationship::Trailing) {
            if let TriviaKind::InlineComment(range) = trivia.kind {
                Some(range)
            } else {
                None
            }
        } else {
            None
        }
    }) {
        if std::mem::take(&mut first) {
            write!(f, [text(" ")])?;
        }
        write!(f, [literal(range)])?;
    }

    Ok(())
}

fn format_continue(f: &mut Formatter<ASTFormatContext<'_>>) -> FormatResult<()> {
    write!(f, [text("continue")])
}

fn format_global(f: &mut Formatter<ASTFormatContext<'_>>, names: &[String]) -> FormatResult<()> {
    write!(f, [text("global")])?;
    if !names.is_empty() {
        write!(f, [space(), join_names(names)])?;
    }
    Ok(())
}

fn format_nonlocal(f: &mut Formatter<ASTFormatContext<'_>>, names: &[String]) -> FormatResult<()> {
    write!(f, [text("nonlocal")])?;
    if !names.is_empty() {
        write!(f, [space(), join_names(names)])?;
    }
    Ok(())
}

fn format_delete(f: &mut Formatter<ASTFormatContext<'_>>, targets: &[Expr]) -> FormatResult<()> {
    write!(f, [text("del")])?;

    match targets.len() {
        0 => Ok(()),
        1 => write!(f, [space(), targets[0].format()]),
        _ => {
            write!(
                f,
                [
                    space(),
                    group(&format_args![
                        if_group_breaks(&text("(")),
                        soft_block_indent(&format_with(|f| {
                            for (i, target) in targets.iter().enumerate() {
                                write!(f, [target.format()])?;
                                if i < targets.len() - 1 {
                                    write!(f, [text(","), soft_line_break_or_space()])?;
                                } else {
                                    write!(f, [if_group_breaks(&text(","))])?;
                                }
                            }
                            Ok(())
                        })),
                        if_group_breaks(&text(")")),
                    ])
                ]
            )
        }
    }
}

fn format_class_def(
    f: &mut Formatter<ASTFormatContext<'_>>,
    name: &str,
    bases: &[Expr],
    keywords: &[Keyword],
    body: &[Stmt],
    decorator_list: &[Expr],
) -> FormatResult<()> {
    for decorator in decorator_list {
        write!(f, [text("@"), decorator.format(), hard_line_break()])?;
    }

    write!(
        f,
        [
            text("class"),
            space(),
            dynamic_text(name, TextSize::default())
        ]
    )?;

    if !bases.is_empty() || !keywords.is_empty() {
        let format_bases = format_with(|f| {
            for (i, expr) in bases.iter().enumerate() {
                write!(f, [expr.format()])?;
                if i < bases.len() - 1 || !keywords.is_empty() {
                    write!(f, [text(","), soft_line_break_or_space()])?;
                } else {
                    write!(f, [if_group_breaks(&text(","))])?;
                }
            }

            // Keywords are written once, after all of the bases
            // (e.g., `class A(B, metaclass=M)`), and also when there are no bases.
            for (i, keyword) in keywords.iter().enumerate() {
                if let Some(arg) = &keyword.node.arg {
                    write!(
                        f,
                        [
                            dynamic_text(arg, TextSize::default()),
                            text("="),
                            keyword.node.value.format()
                        ]
                    )?;
                } else {
                    write!(f, [text("**"), keyword.node.value.format()])?;
                }
                if i < keywords.len() - 1 {
                    write!(f, [text(","), soft_line_break_or_space()])?;
                } else {
                    write!(f, [if_group_breaks(&text(","))])?;
                }
            }

            Ok(())
        });

        write!(
            f,
            [
                text("("),
                group(&soft_block_indent(&format_bases)),
                text(")")
            ]
        )?;
    }

    write!(f, [text(":"), block_indent(&block(body))])
}

fn format_func_def(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    name: &str,
    args: &Arguments,
    returns: Option<&Expr>,
    body: &[Stmt],
    decorator_list: &[Expr],
    async_: bool,
) -> FormatResult<()> {
    for decorator in decorator_list {
        write!(f, [text("@"), decorator.format(), hard_line_break()])?;
    }
    if async_ {
        write!(f, [text("async"), space()])?;
    }
    write!(
        f,
        [
            text("def"),
            space(),
            dynamic_text(name, TextSize::default()),
            text("("),
            group(&soft_block_indent(&format_with(|f| {
                if stmt
                    .trivia
                    .iter()
                    .any(|c| matches!(c.kind, TriviaKind::MagicTrailingComma))
                {
                    write!(f, [expand_parent()])?;
                }
                write!(f, [args.format()])
            }))),
            text(")")
        ]
    )?;
    if let Some(returns) = returns {
        write!(f, [text(" -> "), returns.format()])?;
    }
    write!(f, [text(":")])?;

    // Apply any inline comments.
    let mut first = true;
    for range in stmt.trivia.iter().filter_map(|trivia| {
        if matches!(trivia.relationship, Relationship::Trailing) {
            if let TriviaKind::InlineComment(range) = trivia.kind {
                Some(range)
            } else {
                None
            }
        } else {
            None
        }
    }) {
        if std::mem::take(&mut first) {
            write!(f, [text(" ")])?;
        }
        write!(f, [literal(range)])?;
    }

    write!(f, [block_indent(&format_args![block(body)])])
}

fn format_assign(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    targets: &[Expr],
    value: &Expr,
) -> FormatResult<()> {
    write!(f, [targets[0].format()])?;
    for target in &targets[1..] {
        // TODO(charlie): This doesn't match Black's behavior. We need to parenthesize
        // this expression sometimes.
        write!(f, [text(" = "), target.format()])?;
    }
    write!(f, [text(" = ")])?;
    if is_self_closing(value) {
        write!(f, [group(&value.format())])?;
    } else {
        write!(
            f,
            [group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&value.format()),
                if_group_breaks(&text(")")),
            ])]
        )?;
    }

    // Apply any inline comments.
    let mut first = true;
    for range in stmt.trivia.iter().filter_map(|trivia| {
        if matches!(trivia.relationship, Relationship::Trailing) {
            if let TriviaKind::InlineComment(range) = trivia.kind {
                Some(range)
            } else {
                None
            }
        } else {
            None
        }
    }) {
        if std::mem::take(&mut first) {
            write!(f, [text(" ")])?;
        }
        write!(f, [literal(range)])?;
    }

    Ok(())
}

fn format_ann_assign(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    target: &Expr,
    annotation: &Expr,
    value: Option<&Expr>,
    simple: usize,
) -> FormatResult<()> {
    let need_parens = matches!(target.node, ExprKind::Name { .. }) && simple == 0;
    if need_parens {
        write!(f, [text("(")])?;
    }
    write!(f, [target.format()])?;
    if need_parens {
        write!(f, [text(")")])?;
    }
    write!(f, [text(": "), annotation.format()])?;
    if let Some(value) = value {
        write!(
            f,
            [
                space(),
                text("="),
                space(),
                group(&format_args![
                    if_group_breaks(&text("(")),
                    soft_block_indent(&value.format()),
                    if_group_breaks(&text(")")),
                ])
            ]
        )?;
    }
    Ok(())
}
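
fn format_for(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    target: &Expr,
    iter: &Expr,
    body: &[Stmt],
    orelse: &[Stmt],
    _type_comment: Option<&str>,
) -> FormatResult<()> {
    write!(
        f,
        [
            text("for"),
            space(),
            group(&target.format()),
            space(),
            text("in"),
            space(),
            group(&iter.format()),
            text(":"),
            block_indent(&block(body))
        ]
    )?;

    // Emit the `else` clause, if any, as `format_while` does; otherwise the body
    // of a `for ... else` loop would be silently dropped.
    if !orelse.is_empty() {
        write!(f, [text("else:"), block_indent(&block(orelse))])?;
    }

    Ok(())
}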

fn format_while(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    test: &Expr,
    body: &[Stmt],
    orelse: &[Stmt],
) -> FormatResult<()> {
    write!(f, [text("while"), space()])?;
    if is_self_closing(test) {
        write!(f, [test.format()])?;
    } else {
        write!(
            f,
            [group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&test.format()),
                if_group_breaks(&text(")")),
            ])]
        )?;
    }
    write!(f, [text(":"), block_indent(&block(body))])?;
    if !orelse.is_empty() {
        write!(f, [text("else:"), block_indent(&block(orelse))])?;
    }
    Ok(())
}

fn format_if(
    f: &mut Formatter<ASTFormatContext<'_>>,
    test: &Expr,
    body: &[Stmt],
    orelse: &[Stmt],
) -> FormatResult<()> {
    write!(f, [text("if"), space()])?;
    if is_self_closing(test) {
        write!(f, [test.format()])?;
    } else {
        write!(
            f,
            [group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&test.format()),
                if_group_breaks(&text(")")),
            ])]
        )?;
    }
    write!(f, [text(":"), block_indent(&block(body))])?;
    if !orelse.is_empty() {
        if orelse.len() == 1 {
            if let StmtKind::If { test, body, orelse } = &orelse[0].node {
                write!(f, [text("el")])?;
                format_if(f, test, body, orelse)?;
            } else {
                write!(f, [text("else:"), block_indent(&block(orelse))])?;
            }
        } else {
            write!(f, [text("else:"), block_indent(&block(orelse))])?;
        }
    }
    Ok(())
}

fn format_raise(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    exc: Option<&Expr>,
    cause: Option<&Expr>,
) -> FormatResult<()> {
    write!(f, [text("raise")])?;
    if let Some(exc) = exc {
        write!(f, [space(), exc.format()])?;
        if let Some(cause) = cause {
            write!(f, [space(), text("from"), space(), cause.format()])?;
        }
    }
    Ok(())
}

fn format_return(
    f: &mut Formatter<ASTFormatContext<'_>>,
    value: Option<&Expr>,
) -> FormatResult<()> {
    write!(f, [text("return")])?;
    if let Some(value) = value {
        write!(f, [space(), value.format()])?;
    }
    Ok(())
}

fn format_assert(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    test: &Expr,
    msg: Option<&Expr>,
) -> FormatResult<()> {
    write!(f, [text("assert"), space()])?;
    write!(
        f,
        [group(&format_args![
            if_group_breaks(&text("(")),
            soft_block_indent(&test.format()),
            if_group_breaks(&text(")")),
        ])]
    )?;
    if let Some(msg) = msg {
        write!(
            f,
            [
                text(","),
                space(),
                group(&format_args![
                    if_group_breaks(&text("(")),
                    soft_block_indent(&msg.format()),
                    if_group_breaks(&text(")")),
                ])
            ]
        )?;
    }
    Ok(())
}

fn format_import(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    names: &[Alias],
) -> FormatResult<()> {
    write!(
        f,
        [
            text("import"),
            space(),
            group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&format_with(|f| {
                    for (i, name) in names.iter().enumerate() {
                        write!(f, [name.format()])?;
                        if i < names.len() - 1 {
                            write!(f, [text(","), soft_line_break_or_space()])?;
                        } else {
                            write!(f, [if_group_breaks(&text(","))])?;
                        }
                    }
                    Ok(())
                })),
                if_group_breaks(&text(")")),
            ])
        ]
    )
}

fn format_import_from(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    module: Option<&str>,
    names: &[Alias],
    level: Option<&usize>,
) -> FormatResult<()> {
    write!(f, [text("from")])?;
    write!(f, [space()])?;
    if let Some(level) = level {
        for _ in 0..*level {
            write!(f, [text(".")])?;
        }
    }
    if let Some(module) = module {
        write!(f, [dynamic_text(module, TextSize::default())])?;
    }
    write!(f, [space()])?;
    write!(f, [text("import")])?;
    write!(f, [space()])?;

    if names.iter().any(|name| name.node.name == "*") {
        write!(f, [text("*")])?;
    } else {
        let magic_trailing_comma = stmt
            .trivia
            .iter()
            .any(|c| matches!(c.kind, TriviaKind::MagicTrailingComma));
        write!(
            f,
            [group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&format_with(|f| {
                    if magic_trailing_comma {
                        write!(f, [expand_parent()])?;
                    }
                    for (i, name) in names.iter().enumerate() {
                        write!(f, [name.format()])?;
                        if i < names.len() - 1 {
                            write!(f, [text(",")])?;
                            write!(f, [soft_line_break_or_space()])?;
                        } else {
                            write!(f, [if_group_breaks(&text(","))])?;
                        }
                    }
                    Ok(())
                })),
                if_group_breaks(&text(")")),
            ])]
        )?;
    }

    // Apply any inline comments.
    let mut first = true;
    for range in stmt.trivia.iter().filter_map(|trivia| {
        if matches!(trivia.relationship, Relationship::Trailing) {
            if let TriviaKind::InlineComment(range) = trivia.kind {
                Some(range)
            } else {
                None
            }
        } else {
            None
        }
    }) {
        if std::mem::take(&mut first) {
            write!(f, [text(" ")])?;
        }
        write!(f, [literal(range)])?;
    }

    Ok(())
}

fn format_expr(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    expr: &Expr,
) -> FormatResult<()> {
    if matches!(stmt.parentheses, Parenthesize::Always) {
        write!(
            f,
            [group(&format_args![
                text("("),
                soft_block_indent(&format_args![expr.format()]),
                text(")"),
            ])]
        )?;
    } else if is_self_closing(expr) {
        write!(f, [group(&format_args![expr.format()])])?;
    } else {
        write!(
            f,
            [group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&format_args![expr.format()]),
                if_group_breaks(&text(")")),
            ])]
        )?;
    }

    // Apply any inline comments.
    let mut first = true;
    for range in stmt.trivia.iter().filter_map(|trivia| {
        if matches!(trivia.relationship, Relationship::Trailing) {
            if let TriviaKind::InlineComment(range) = trivia.kind {
                Some(range)
            } else {
                None
            }
        } else {
            None
        }
    }) {
        if std::mem::take(&mut first) {
            write!(f, [text(" ")])?;
        }
        write!(f, [literal(range)])?;
    }

    Ok(())
}

fn format_with_(
    f: &mut Formatter<ASTFormatContext<'_>>,
    stmt: &Stmt,
    items: &[Withitem],
    body: &[Stmt],
    type_comment: Option<&str>,
    async_: bool,
) -> FormatResult<()> {
    if async_ {
        write!(f, [text("async"), space()])?;
    }
    write!(
        f,
        [
            text("with"),
            space(),
            group(&format_args![
                if_group_breaks(&text("(")),
                soft_block_indent(&format_with(|f| {
                    for (i, item) in items.iter().enumerate() {
                        write!(f, [item.format()])?;
                        if i < items.len() - 1 {
                            write!(f, [text(","), soft_line_break_or_space()])?;
                        } else {
                            write!(f, [if_group_breaks(&text(","))])?;
                        }
                    }
                    Ok(())
                })),
                if_group_breaks(&text(")")),
            ]),
            text(":"),
            block_indent(&block(body))
        ]
    )
}

pub struct FormatStmt<'a> {
    item: &'a Stmt,
}

impl Format<ASTFormatContext<'_>> for FormatStmt<'_> {
    fn fmt(&self, f: &mut Formatter<ASTFormatContext<'_>>) -> FormatResult<()> {
        // Any leading comments come on the line before.
        for trivia in &self.item.trivia {
            if matches!(trivia.relationship, Relationship::Leading) {
                match trivia.kind {
                    TriviaKind::EmptyLine => {
                        write!(f, [empty_line()])?;
                    }
                    TriviaKind::StandaloneComment(range) => {
                        write!(f, [literal(range), hard_line_break()])?;
                    }
                    _ => {}
                }
            }
        }

        match &self.item.node {
            StmtKind::Pass => format_pass(f, self.item),
            StmtKind::Break => format_break(f),
            StmtKind::Continue => format_continue(f),
            StmtKind::Global { names } => format_global(f, names),
            StmtKind::Nonlocal { names } => format_nonlocal(f, names),
            StmtKind::FunctionDef {
                name,
                args,
                body,
                decorator_list,
                returns,
                ..
            } => format_func_def(
                f,
                self.item,
                name,
                args,
                returns.as_deref(),
                body,
                decorator_list,
                false,
            ),
            StmtKind::AsyncFunctionDef {
                name,
                args,
                body,
                decorator_list,
                returns,
                ..
            } => format_func_def(
                f,
                self.item,
                name,
                args,
                returns.as_deref(),
                body,
                decorator_list,
                true,
            ),
            StmtKind::ClassDef {
                name,
                bases,
                keywords,
                body,
                decorator_list,
            } => format_class_def(f, name, bases, keywords, body, decorator_list),
            StmtKind::Return { value } => format_return(f, value.as_ref()),
            StmtKind::Delete { targets } => format_delete(f, targets),
            StmtKind::Assign { targets, value, .. } => format_assign(f, self.item, targets, value),
            // StmtKind::AugAssign { .. } => {}
            StmtKind::AnnAssign {
                target,
                annotation,
                value,
                simple,
            } => format_ann_assign(f, self.item, target, annotation, value.as_deref(), *simple),
            StmtKind::For {
                target,
                iter,
                body,
                orelse,
                type_comment,
            } => format_for(
                f,
                self.item,
                target,
                iter,
                body,
                orelse,
                type_comment.as_deref(),
            ),
            // StmtKind::AsyncFor { .. } => {}
            StmtKind::While { test, body, orelse } => {
                format_while(f, self.item, test, body, orelse)
            }
            StmtKind::If { test, body, orelse } => format_if(f, test, body, orelse),
            StmtKind::With {
                items,
                body,
                type_comment,
            } => format_with_(f, self.item, items, body, type_comment.as_deref(), false),
            StmtKind::AsyncWith {
                items,
                body,
                type_comment,
            } => format_with_(f, self.item, items, body, type_comment.as_deref(), true),
            // StmtKind::Match { .. } => {}
            StmtKind::Raise { exc, cause } => {
                format_raise(f, self.item, exc.as_deref(), cause.as_deref())
            }
            // StmtKind::Try { .. } => {}
            StmtKind::Assert { test, msg } => format_assert(f, self.item, test, msg.as_deref()),
            StmtKind::Import { names } => format_import(f, self.item, names),
            StmtKind::ImportFrom {
                module,
                names,
                level,
            } => format_import_from(f, self.item, module.as_deref(), names, level.as_ref()),
            StmtKind::Expr { value } => format_expr(f, self.item, value),
            _ => {
                unimplemented!("Implement StmtKind: {:?}", self.item.node)
            }
        }?;

        // Any trailing comments come on the lines after.
        for trivia in &self.item.trivia {
            if matches!(trivia.relationship, Relationship::Trailing) {
                match trivia.kind {
                    TriviaKind::EmptyLine => {
                        write!(f, [empty_line()])?;
                    }
                    TriviaKind::StandaloneComment(range) => {
                        write!(f, [literal(range), hard_line_break()])?;
                    }
                    _ => {}
                }
            }
        }

        Ok(())
    }
}

impl AsFormat<ASTFormatContext<'_>> for Stmt {
    type Format<'a> = FormatStmt<'a>;

    fn format(&self) -> Self::Format<'_> {
        FormatStmt { item: self }
    }
}