Commit graph

27 commits

Author SHA1 Message Date
Dhruv Manilawala
d8f5d2d767
Add support for auto-fix in Jupyter notebooks (#4665)
## Summary

Add support for applying auto-fixes in Jupyter Notebook.

### Solution

Cell offsets are the boundaries for each cell in the concatenated source
code. They are represented using `TextSize`. It includes the start and
end offset as well, thus creating a range for each cell. These offsets
are updated using the `SourceMap` markers.

### SourceMap

`SourceMap` contains markers constructed from each edits which tracks
the original source code position to the transformed positions. The
following drawing might make it clear:

![SourceMap visualization](3c94e591-70a7-4b57-bd32-0baa91cc7858)

The center column where the dotted lines are present are the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you notice closely,
the destination takes into account all of the markers before it.

The index is constructed only when required as it's only used to render
the diagnostics. So, a `OnceCell` is used for this purpose. The cell
offsets, cell content and the index will be updated after each iteration
of linting in the mentioned order. The order is important here as the
content is updated as per the new offsets and index is updated as per
the new content.

## Limitations

### 1

Styling rules such as the ones in `pycodestyle` will not be applicable
everywhere in Jupyter notebook, especially at the cell boundaries. Let's
take an example where a rule suggests to have 2 blank lines before a
function and the cells contains the following code:

```python
import something
# ---
def first():
	pass

def second():
	pass
```

(Again, the comment is only to visualize cell boundaries.)

In the concatenated source code, the 2 blank lines will be added but it
shouldn't actually be added when we look in terms of Jupyter notebook.
It's as if the function `first` is at the start of a file.

`nbqa` solves this by recording newlines before and after running
`autopep8`, then running the tool and restoring the newlines at the end
(refer https://github.com/nbQA-dev/nbQA/pull/807).

## Test Plan

Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate which stage the problem is occurring:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)

### https://github.com/facebookresearch/segment-anything

```
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             3           98            0           94            4
 |- Python               3          513          468            4           41
 (Total)                            611          468           98           45
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```

### https://github.com/openai/openai-cookbook

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      65            0            0            0            0
 |- Markdown            64         3475           12         2507          956
 |- Python              65         9700         7362         1101         1237
 (Total)                          13175         7374         3608         2193
===============================================================================
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```

### https://github.com/tensorflow/docs

```
-------------------------------------------------------------------------------
 Jupyter Notebooks     150            0            0            0            0
 |- Markdown             1           55            0           46            9
 |- Python               1          402          289           60           53
 (Total)                            457          289          106           62
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```

### https://github.com/tensorflow/models

```
-------------------------------------------------------------------------------
 Jupyter Notebooks      46            0            0            0            0
 |- Markdown             1           11            0            6            5
 |- Python               1          328          249           19           60
 (Total)                            339          249           25           65
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```

resolves: #1218
fixes: #4556
2023-06-12 14:14:15 +00:00
Charlie Marsh
68b6d30c46
Use consistent Cargo.toml metadata in all crates (#5015) 2023-06-12 00:02:40 +00:00
Ryan Yang
ab3c02342b
Implement copyright notice detection (#4701)
## Summary

Add copyright notice detection to enforce the presence of copyright
headers in Python files.

Configurable settings include: the relevant regular expression, the
author name, and the minimum file size, similar to
[flake8-copyright](https://github.com/savoirfairelinux/flake8-copyright).

Closes https://github.com/charliermarsh/ruff/issues/3579

---------

Signed-off-by: ryan <ryang@waabi.ai>
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-06-11 02:17:58 +00:00
Jonathan Plasse
edadd7814f
Add pyflakes.extend-generics setting (#4677) 2023-06-01 22:19:37 +00:00
Charlie Marsh
a4f73ea8c7
Remove unused getrandom dependency (#4734) 2023-05-30 14:34:20 -04:00
Jonathan Plasse
4233f6ec91
Update to the new rule architecture (#4589) 2023-05-24 11:30:40 -04:00
Jonathan Plasse
c6a760e298
Introduce tab-size to correcly calculate the line length with tabulations (#4167) 2023-05-24 08:37:24 +02:00
Aaron Cunningham
41a681531d
Support new extend-per-file-ignores setting (#4265) 2023-05-19 12:24:04 -04:00
Charlie Marsh
15cb21a6f4
Implement --extend-fixable option (#4297) 2023-05-18 22:20:19 -04:00
Charlie Marsh
a69451ff46
[pyupgrade] Remove keep-runtime-typing setting (#4427) 2023-05-14 03:12:52 +00:00
Jonathan Plasse
c10a4535b9
Disallow unreachable_pub (#4314) 2023-05-11 18:00:00 -04:00
Micha Reiser
8969ad5879
Always generate fixes (#4239) 2023-05-10 07:06:14 +00:00
Charlie Marsh
a435c0df4b
Remove deprecated update-check setting (#4313) 2023-05-09 13:10:02 -04:00
Zanie Adkins
0801f14046
Refactor Fix and Edit API (#4198) 2023-05-08 11:57:03 +02:00
Micha Reiser
cab65b25da
Replace row/column based Location with byte-offsets. (#3931) 2023-04-26 18:11:02 +00:00
Micha Reiser
ba4f4f4672
Upgrade dependencies (#4064) 2023-04-22 18:04:01 +01:00
Charlie Marsh
b999e4b1e2
Allow users to extend the set of included files via include (#3914) 2023-04-11 23:39:43 -04:00
Micha Reiser
9209e57c5a
Extract message emitters from Printer (#3895) 2023-04-11 07:24:25 +00:00
Chris Chan
10504eb9ed
Generate ImportMap from module path to imported dependencies (#3243) 2023-04-04 03:31:37 +00:00
Leiser Fernández Gallo
224e85c6d7
Implement flake8-gettext (#3785) 2023-03-28 23:32:02 +00:00
Micha Reiser
f68c26a506
perf(pycodestyle): Initialize Stylist from tokens (#3757) 2023-03-28 11:53:35 +02:00
Charlie Marsh
c3917eab38
Revert "Implement flake8-i18n (#3741)" (#3765) 2023-03-27 21:14:38 +00:00
Leiser Fernández Gallo
5cb120327c
Implement flake8-i18n (#3741) 2023-03-27 18:03:39 +00:00
Charlie Marsh
e603382cf0
Allow diagnostics to generate multi-edit fixes (#3709) 2023-03-26 16:45:19 -04:00
Charlie Marsh
27903cdb11
Replace logical_lines feature with debug_assertions (#3648) 2023-03-21 12:16:41 -04:00
Micha Reiser
bd05a8a74d
fix: WASM tests (#3415) 2023-03-09 11:27:59 +01:00
Micha Reiser
229f1c34cb
refactor: Extract ruff_wasm (#3401) 2023-03-09 10:07:39 +00:00