Commit graph

27 commits

Author SHA1 Message Date
Charlie Marsh
5f3da9955a
Rename ruff_python_whitespace to ruff_python_trivia (#5886)
## Summary

This crate now contains utilities for dealing with trivia more broadly:
whitespace, newlines, "simple" trivia lexing, etc. So renaming it to
reflect its increased responsibilities.

To avoid conflicts, I've also renamed `Token` and `TokenKind` to
`SimpleToken` and `SimpleTokenKind`.
2023-07-19 11:48:27 -04:00
Charlie Marsh
4204fc002d
Remove exception-handler lexing from unused-bound-exception fix (#5851)
## Summary

The motivation here is that it will make this rule easier to rewrite as
a deferred check. Right now, we can't run this rule in the deferred
phase, because it depends on the `except_handler` to power its autofix.
Instead of lexing the `except_handler`, we can use the `SimpleTokenizer`
from the formatter, and just lex forwards and backwards.

For context, this rule detects the unused `e` in:

```python
try:
  pass
except ValueError as e:
  pass
```
2023-07-18 18:27:46 +00:00
Charlie Marsh
6dbc6d2e59
Use shared Cursor across crates (#5715)
## Summary

We have two `Cursor` implementations. This PR moves the implementation
from the formatter into `ruff_python_whitespace` (kind of a poorly-named
crate now) and uses it for both use-cases.
2023-07-12 21:09:27 +00:00
David Szotten
1e894f328c
formatter: multi char tokens in SimpleTokenizer (#5610) 2023-07-10 09:00:59 +01:00
konsti
b22e6c3d38
Extend ruff_dev formatter script to compute statistics and format a project (#5492)
## Summary

This extends the `ruff_dev` formatter script util. Instead of only doing
stability checks, you can now choose different compatible options on the
CLI and get statistics.

* It adds an option the formats all files that ruff would check to allow
looking at an entire black-formatted repository with `git diff`
* It computes the [Jaccard
index](https://en.wikipedia.org/wiki/Jaccard_index) as a measure of
deviation between input and output, which is useful as single number
metric for assessing our current deviations from black.
* It adds progress bars to both the single projects as well as the
multi-project mode.
* It adds an option to write the multi-project output to a file

Sample usage:

```
$ cargo run --bin ruff_dev -- format-dev --stability-check crates/ruff/resources/test/cpython
$ cargo run --bin ruff_dev -- format-dev --stability-check /home/konsti/projects/django
Syntax error in /home/konsti/projects/django/tests/test_runner_apps/tagged/tests_syntax_error.py: source contains syntax errors (parser error): BaseError { error: UnrecognizedToken(Name { name: "syntax_error" }, None), offset: 131, source_path: "<filename>" }
Found 0 stability errors in 2755 files (jaccard index 0.911) in 9.75s
$ cargo run --bin ruff_dev -- format-dev --write /home/konsti/projects/django
```

Options:

```
Several utils related to the formatter which can be run on one or more repositories. The selected set of files in a repository is the same as for `ruff check`.

* Check formatter stability: Format a repository twice and ensure that it looks that the first and second formatting look the same. * Format: Format the files in a repository to be able to check them with `git diff` * Statistics: The subcommand the Jaccard index between the (assumed to be black formatted) input and the ruff formatted output

Usage: ruff_dev format-dev [OPTIONS] [FILES]...

Arguments:
  [FILES]...
          Like `ruff check`'s files. See `--multi-project` if you want to format an ecosystem checkout

Options:
      --stability-check
          Check stability
          
          We want to ensure that once formatted content stays the same when formatted again, which is known as formatter stability or formatter idempotency, and that the formatter prints syntactically valid code. As our test cases cover only a limited amount of code, this allows checking entire repositories.

      --write
          Format the files. Without this flag, the python files are not modified

      --format <FORMAT>
          Control the verbosity of the output
          
          [default: default]

          Possible values:
          - minimal: Filenames only
          - default: Filenames and reduced diff
          - full:    Full diff and invalid code

  -x, --exit-first-error
          Print only the first error and exit, `-x` is same as pytest

      --multi-project
          Checks each project inside a directory, useful e.g. if you want to check all of the ecosystem checkouts

      --error-file <ERROR_FILE>
          Write all errors to this file in addition to stdout. Only used in multi-project mode
```

## Test Plan

I ran this on django (2755 files, jaccard index 0.911) and discovered a
magic trailing comma problem and that we really needed to implement
import formatting. I ran the script on cpython to identify
https://github.com/astral-sh/ruff/pull/5558.
2023-07-07 11:30:12 +00:00
Micha Reiser
955e9ef821
Fix invalid syntax for binary expression in unary op (#5370) 2023-06-29 08:09:26 +02:00
Micha Reiser
f18a1f70de
Add tests for skip magic trailing comma
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR adds tests that verify that the magic trailing comma is not respected if disabled in the formatter options. 

Our test setup now allows to create a `<fixture-name>.options.json` file that contains an array of configurations that should be tested. 

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

It's all about tests :) 

<!-- How was it tested? -->
2023-06-26 14:15:55 +02:00
Micha Reiser
8879927b9a
Use insta::glob instead of fixture macro (#5364) 2023-06-26 08:46:18 +00:00
Micha Reiser
c52aa8f065
Basic string formatting
<!--
Thank you for contributing to Ruff! To help us out with reviewing, please consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

This PR implements formatting for non-f-string Strings that do not use implicit concatenation. 

Docstring formatting is out of the scope of this PR.

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

I added a few tests for simple string literals. 

## Performance

Ouch. This is hitting performance somewhat hard. This is probably because we now iterate each string a couple of times:

1. To detect if it is an implicit string continuation
2. To detect if the string contains any new lines
3. To detect the preferred quote
4. To normalize the string

Edit: I integrated the detection of newlines into the preferred quote detection so that we only iterate the string three time.
We can probably do better by merging the implicit string continuation with the quote detection and new line detection by iterating till the end of the string part and returning the offset. We then use our simple tokenizer to skip over any comments or whitespace until we find the first non trivia token. From there we keep continue doing this in a loop until we reach the end o the string. I'll leave this improvement for later.
2023-06-23 09:46:05 +02:00
Charlie Marsh
68b6d30c46
Use consistent Cargo.toml metadata in all crates (#5015) 2023-06-12 00:02:40 +00:00
Charlie Marsh
1d756dc3a7
Move Python whitespace utilities into new ruff_python_whitespace crate (#4993)
## Summary

`ruff_newlines` becomes `ruff_python_whitespace`, and includes the
existing "universal newline" handlers alongside the Python
whitespace-specific utilities.
2023-06-10 00:59:57 +00:00
Charlie Marsh
9d0ffd33ca
Move universal newline handling into its own crate (#4729) 2023-05-31 12:00:47 -04:00
Micha Reiser
0cd453bdf0
Generic "comment to node" association logic (#4642) 2023-05-30 09:28:01 +00:00
Micha Reiser
6146b75dd0
Add MultiMap implementation for storing comments (#4639) 2023-05-30 09:51:25 +02:00
Jeong, YunWon
be6e00ef6e
Re-integrate RustPython parser repository (#4359)
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-05-11 07:47:17 +00:00
Micha Reiser
cab65b25da
Replace row/column based Location with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Charlie Marsh
d919adc13c
Introduce a ruff_python_semantic crate (#3865) 2023-04-04 16:50:47 +00:00
Charlie Marsh
ff2c0dd491
Use shared leading_quote implementation in ruff_python_formatter (#3396) 2023-03-08 18:21:59 +00:00
Charlie Marsh
d1c48016eb
Rename ruff_python crate to ruff_python_stdlib (#3354)
In hindsight, `ruff_python` is too general. A good giveaway is that it's actually a prefix of some other crates. The intent of this crate is to reimplement pieces of the Python standard library and CPython itself, so `ruff_python_stdlib` feels appropriate.
2023-03-06 13:43:22 +00:00
Jonathan Plasse
8828e12283
Bump dependencies and move more shared dependencies into workspace (#3340) 2023-03-04 12:36:26 -05:00
Charlie Marsh
061495a9eb
Make BoolOp its own located token (#3265) 2023-02-28 03:43:28 +00:00
Jeong YunWon
84e96cdcd9
More enum work (#3212) 2023-02-25 11:40:16 -05:00
Charlie Marsh
f967f344fc
Add support for basic Constant::Str formatting (#3173)
This PR enables us to apply the proper quotation marks, including support for escapes. There are some significant TODOs, especially around implicit concatenations like:

```py
(
  "abc"
  "def"
)
```

Which are represented as a single AST node, which requires us to tokenize _within_ the formatter to identify all the individual string parts.
2023-02-23 16:23:10 +00:00
Charlie Marsh
095f005bf4
Move RustPython vendored and helper code into its own crate (#3171) 2023-02-23 14:14:16 +00:00
Micha Reiser
ed33b75bad
test(ruff_python_formatter): Run all Black tests (#2993)
This PR changes the testing infrastructure to run all black tests and:

* Pass if Ruff and Black generate the same formatting
* Fail and write a markdown snapshot that shows the input code, the differences between Black and Ruff, Ruffs output, and Blacks output

This is achieved by introducing a new `fixture` macro (open to better name suggestions) that "duplicates" the attributed test for every file that matches the specified glob pattern. Creating a new test for each file over having a test that iterates over all files has the advantage that you can run a single test, and that test failures indicate which case is failing. 

The `fixture` macro also makes it straightforward to e.g. setup our own spec tests that test very specific formatting by creating a new folder and use insta to assert the formatted output.
2023-02-22 09:25:06 -05:00
Jonathan Plasse
b75663be6d
Add missing rust-version in crates (#3009) 2023-02-19 15:07:17 +00:00
Charlie Marsh
ca49b00e55
Add initial formatter implementation (#2883)
# Summary

This PR contains the code for the autoformatter proof-of-concept.

## Crate structure

The primary formatting hook is the `fmt` function in `crates/ruff_python_formatter/src/lib.rs`.

The current formatter approach is outlined in `crates/ruff_python_formatter/src/lib.rs`, and is structured as follows:

- Tokenize the code using the RustPython lexer.
- In `crates/ruff_python_formatter/src/trivia.rs`, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
- Generate the AST via the RustPython parser.
- In `crates/ruff_python_formatter/src/cst.rs`, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a `trivia` vector. But we might want to modify it further.
- In `crates/ruff_python_formatter/src/attachment.rs`, attach each trivia token to the corresponding CST node. The logic for this is mostly in `decorate_trivia` and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
- In `crates/ruff_python_formatter/src/newlines.rs`, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing `TriviaToken` values as we go.
- Call `format!` on the CST, which delegates to type-specific formatter implementations (e.g., `crates/ruff_python_formatter/src/format/stmt.rs` for `Stmt` nodes, and similar for `Expr` nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., `format_func_def`).

## Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in-full to `crates/ruff_python_formatter/resources/test/fixtures/black`.

The Black fixtures had to be modified to create `[insta](https://github.com/mitsuhiko/insta)`-compatible snapshots, which now exist in the repo.

My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.

## What works, and what doesn’t

- *Most* nodes are supported at a basic level (though there are a few stragglers at time of writing, like `StmtKind::Try`).
- Newlines are properly preserved in most cases.
- Magic trailing commas are properly preserved in some (but not all) cases.
- Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
- Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
- We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
- We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):

```py
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
```

- We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.
2023-02-15 04:06:35 +00:00