## Summary
`ruff_newlines` becomes `ruff_python_whitespace`, and includes the
existing "universal newline" handlers alongside the Python
whitespace-specific utilities.
In hindsight, `ruff_python` is too general. A good giveaway is that it's actually a prefix of some other crates. The intent of this crate is to reimplement pieces of the Python standard library and CPython itself, so `ruff_python_stdlib` feels appropriate.
This PR enables us to apply the proper quotation marks, including support for escapes. There are some significant TODOs, especially around implicit concatenations like:
```py
(
"abc"
"def"
)
```
Which are represented as a single AST node, which requires us to tokenize _within_ the formatter to identify all the individual string parts.
This PR changes the testing infrastructure to run all black tests and:
* Pass if Ruff and Black generate the same formatting
* Fail and write a markdown snapshot that shows the input code, the differences between Black and Ruff, Ruffs output, and Blacks output
This is achieved by introducing a new `fixture` macro (open to better name suggestions) that "duplicates" the attributed test for every file that matches the specified glob pattern. Creating a new test for each file over having a test that iterates over all files has the advantage that you can run a single test, and that test failures indicate which case is failing.
The `fixture` macro also makes it straightforward to e.g. setup our own spec tests that test very specific formatting by creating a new folder and use insta to assert the formatted output.
# Summary
This PR contains the code for the autoformatter proof-of-concept.
## Crate structure
The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.
The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:
- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).
## Testing and iteration
The formatter is being developed against the Black test suite, which was copied over in-full to `crates/ruff_python_formatter/resources/test/fixtures/black`.
The Black fixtures had to be modified to create `[insta](https://github.com/mitsuhiko/insta)`-compatible snapshots, which now exist in the repo.
My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.
## What works, and what doesn’t
- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):
```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
b,
c,
d,
):
pass
# ...whereas we do this:
if (
some_expression
and f(
b,
c,
d,
)
):
pass
# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
a, b, c, d, e, f, g, ...
):
pass
```
- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.