Document shrinking script (#5942)

**Summary** Document shrinking script: I think it's both in a good
enough state and valuable enough to document its usage.
konsti 2023-07-21 11:32:26 +02:00 committed by GitHub
parent b56e8ad696
commit f6b40a021f

@@ -267,15 +267,6 @@ git clone --branch 3.10 https://github.com/python/cpython.git crates/ruff/resour
cargo run --bin ruff_dev -- format-dev --stability-check crates/ruff/resources/test/cpython
```
It is also possible large number of repositories using ruff. This dataset is large (~60GB), so we
only do this occasionally:
```shell
curl https://raw.githubusercontent.com/akx/ruff-usage-aggregate/master/data/known-github-tomls-clean.jsonl> github_search.jsonl
python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true)
cargo run --bin ruff_dev -- format-dev --stability-check --multi-project target/checkouts
```
Compared to `ruff check`, `cargo run --bin ruff_dev -- format-dev` has 4 additional options:
- `--write`: Format the files and write them back to disk
@@ -284,6 +275,33 @@ Compared to `ruff check`, `cargo run --bin ruff_dev -- format-dev` has 4 additio
- `--error-file`: Use together with `--multi-project`, this writes all errors (but not status
messages) to a file.
It is also possible to check a large number of repositories. This dataset is large (~60GB), so we
only do this occasionally:
```shell
# Get the list of projects
curl https://raw.githubusercontent.com/akx/ruff-usage-aggregate/master/data/known-github-tomls-clean.jsonl > github_search.jsonl
# Repurpose this script to download the repositories for us
python scripts/check_ecosystem.py --checkouts target/checkouts --projects github_search.jsonl -v $(which true) $(which true)
# Check each project for formatter stability
cargo run --bin ruff_dev -- format-dev --stability-check --error-file target/formatter-ecosystem-errors.txt --multi-project target/checkouts
```
To shrink a formatter error from an entire file to a minimal reproducible example, you can use
`ruff_shrinking`:
```shell
cargo run --bin ruff_shrinking -- <your_file> target/shrinking.py "Unstable formatting" "target/release/ruff_dev format-dev --stability-check target/shrinking.py"
```
The first argument is the input file. The second is the output file, to which each candidate and
the final minimized version are written. The third argument is a regex matching the
error message, e.g. "Unstable formatting" or "Formatter error". The last argument is the command
that reproduces the error, e.g. running the stability check on the candidate file. The script tries
various strategies to remove parts of the code. If the output of the command still matches the
regex, it uses the smaller code as the starting point for the next iteration; otherwise it reverts
and tries a different strategy until all strategies are exhausted.
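The accept-or-revert loop described above can be illustrated with a minimal Python sketch. This is
not the `ruff_shrinking` implementation: it works on plain lines of text, and the names `shrink`,
`remove_line_pairs`, `remove_single_lines` and `fake_command` are hypothetical, chosen only for
this example. The real tool's strategies are not shown here.

```python
import re
from typing import Callable, Iterator


def remove_line_pairs(lines: list[str]) -> Iterator[list[str]]:
    """Hypothetical strategy: yield candidates with two adjacent lines removed."""
    for i in range(len(lines) - 1):
        yield lines[:i] + lines[i + 2 :]


def remove_single_lines(lines: list[str]) -> Iterator[list[str]]:
    """Hypothetical strategy: yield candidates with one line removed."""
    for i in range(len(lines)):
        yield lines[:i] + lines[i + 1 :]


def shrink(source: str, pattern: str, run_command: Callable[[str], str]) -> str:
    """Greedily shrink `source` while the output of `run_command` matches `pattern`."""
    error = re.compile(pattern)
    lines = source.splitlines()
    strategies = [remove_line_pairs, remove_single_lines]

    progress = True
    while progress:
        progress = False
        for strategy in strategies:
            for candidate in strategy(lines):
                output = run_command("\n".join(candidate))
                if error.search(output):
                    # The error still reproduces: keep the smaller code and
                    # start the next iteration from it.
                    lines = candidate
                    progress = True
                    break
                # Otherwise the candidate is discarded (the "revert") and
                # the next candidate or strategy is tried.
            if progress:
                break
    # No strategy produced a candidate that still fails: we are done.
    return "\n".join(lines)


def fake_command(code: str) -> str:
    """Stand-in for the real command: 'fails' whenever the code contains a lambda."""
    return "Unstable formatting" if "lambda" in code else "ok"


source = "import os\nx = 1\nf = lambda: x\nprint(f())"
print(shrink(source, "Unstable formatting", fake_command))  # prints: f = lambda: x
```

In this sketch every accepted candidate is strictly shorter than its predecessor, so the loop
always terminates. Trying the coarser strategy first is an assumption made here to reduce the
number of command invocations, not a statement about the order `ruff_shrinking` uses.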
## The orphan rules and trait structure
For the formatter, we would like to implement `Format` from the rust_formatter crate for all AST