Add support for auto-fix in Jupyter notebooks (#4665)

## Summary

Add support for applying auto-fixes in Jupyter Notebooks.

### Solution

Cell offsets mark the boundaries of each cell in the concatenated source
code and are represented using `TextSize`. The list includes the start and
end offsets as well, so each cell is covered by a range. These offsets
are updated using the `SourceMap` markers.
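
To illustrate, here's a minimal, self-contained sketch of how these offsets are derived (using plain `u32` in place of `TextSize`, not the PR's actual API; the real logic lives in `Notebook::read` in the diff below):

```rust
/// Record a boundary offset after every cell: each cell contributes its
/// byte length plus one byte for the newline appended as a separator.
fn cell_offsets(cells: &[&str]) -> Vec<u32> {
    let mut offsets = vec![0];
    let mut current = 0;
    for cell in cells {
        current += cell.len() as u32 + 1; // +1 for the trailing newline
        offsets.push(current);
    }
    offsets
}

fn main() {
    // Two cells of 9 and 10 bytes yield boundaries at 0, 10 and 21.
    assert_eq!(cell_offsets(&["import os", "import sys"]), vec![0, 10, 21]);
}
```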

### SourceMap

`SourceMap` contains markers, constructed from each edit, that map
positions in the original source code to the corresponding positions in
the transformed code. The following drawing might make this clearer:

![SourceMap visualization](3c94e591-70a7-4b57-bd32-0baa91cc7858)

The center column, where the dotted lines are, represents the markers
included in the `SourceMap`. The `Notebook` looks at these markers and
updates the cell offsets after each linter loop. If you look closely,
each destination position takes into account all of the markers before it.
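
As a concrete example, consider a single replacement edit. The following is a hedged sketch with a local `SourceMarker` type standing in for the one added in `autofix/source_map.rs` below, assuming no earlier edits in the map:

```rust
#[derive(Debug, PartialEq)]
struct SourceMarker {
    source: u32, // position in the original source
    dest: u32,   // corresponding position in the transformed output
}

/// For a lone replacement, the start marker is recorded just before the
/// patch text is written to the output and the end marker right after it.
fn markers_for_replacement(start: u32, end: u32, patch_len: u32) -> [SourceMarker; 2] {
    [
        SourceMarker { source: start, dest: start },
        SourceMarker { source: end, dest: start + patch_len },
    ]
}

fn main() {
    // `class A(object):` -> `class A(Bar):` replaces offsets 8..14 with a
    // 3-byte patch, so positions after the edit shift left by 3 (14 -> 11).
    assert_eq!(
        markers_for_replacement(8, 14, 3),
        [
            SourceMarker { source: 8, dest: 8 },
            SourceMarker { source: 14, dest: 11 },
        ]
    );
}
```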

The index is constructed only when required, as it's only used to render
the diagnostics, so it is stored in a `OnceCell`. The cell offsets, cell
contents, and the index are updated after each iteration of linting, in
that order. The order is important: the contents are updated according to
the new offsets, and the index is built from the new contents.
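
The offset update itself can be sketched as follows (a minimal, self-contained version of the `update_cell_offsets` logic in the diff below, using plain `(source, dest)` pairs instead of the real `SourceMarker` type):

```rust
use std::cmp::Ordering;

/// Shift one cell offset: find the closest marker at or before it and
/// apply the delta between that marker's source and dest positions.
fn shift_offset(offset: u32, markers: &[(u32, u32)]) -> u32 {
    let Some(&(source, dest)) = markers.iter().rev().find(|&&(source, _)| source <= offset)
    else {
        return offset; // no edits before this offset, nothing to adjust
    };
    match source.cmp(&dest) {
        Ordering::Less => offset + (dest - source),    // insertions grew the code
        Ordering::Greater => offset - (source - dest), // deletions shrank the code
        Ordering::Equal => offset,
    }
}

fn main() {
    // Markers from the replacement example above, as (source, dest) pairs.
    let markers = [(8, 8), (14, 11)];
    assert_eq!(shift_offset(21, &markers), 18); // a boundary after the edit moves left by 3
    assert_eq!(shift_offset(5, &markers), 5); // a boundary before any edit is untouched
}
```

The real implementation additionally walks the offsets in reverse and caches the last marker used, since consecutive cells without any edits share the same closest marker.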

## Limitations

### 1

Styling rules such as the ones in `pycodestyle` won't be applicable
everywhere in a Jupyter notebook, especially at cell boundaries. Take,
for example, a rule that suggests having 2 blank lines before a function
definition, with cells containing the following code:

```python
import something
# ---
def first():
	pass

def second():
	pass
```

(The `# ---` comment is only there to visualize the cell boundary.)

In the concatenated source code, the 2 blank lines would be added before
`first`, but they shouldn't be when viewed in terms of the Jupyter
notebook: it's as if the function `first` were at the start of its own file.

`nbqa` solves this by recording the newlines at the cell boundaries before
running `autopep8`, then running the tool and restoring the newlines at the
end (see https://github.com/nbQA-dev/nbQA/pull/807).

## Test Plan

Three commands were run in order with common flags (`--select=ALL
--no-cache --isolated`) to isolate the stage at which any problem occurs:
1. Only diagnostics
2. Fix with diff (`--fix --diff`)
3. Fix (`--fix`)

### https://github.com/facebookresearch/segment-anything

```
-------------------------------------------------------------------------------
 Language              Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks       3            0            0            0            0
 |- Markdown             3           98            0           94            4
 |- Python               3          513          468            4           41
 (Total)                            611          468           98           45
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix
...
Found 180 errors (89 fixed, 91 remaining).
```

### https://github.com/openai/openai-cookbook

```
-------------------------------------------------------------------------------
 Language              Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks      65            0            0            0            0
 |- Markdown            64         3475           12         2507          956
 |- Python              65         9700         7362         1101         1237
 (Total)                          13175         7374         3608         2193
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix
error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-'
...
Found 4227 errors (2165 fixed, 2062 remaining).
```

### https://github.com/tensorflow/docs

```
-------------------------------------------------------------------------------
 Language              Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks     150            0            0            0            0
 |- Markdown             1           55            0           46            9
 |- Python               1          402          289           60           53
 (Total)                            457          289          106           62
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent
error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level
error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent
...
Found 12726 errors (5140 fixed, 7586 remaining).
```

### https://github.com/tensorflow/models

```
-------------------------------------------------------------------------------
 Language              Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Jupyter Notebooks      46            0            0            0            0
 |- Markdown             1           11            0            6            5
 |- Python               1          328          249           19           60
 (Total)                            339          249           25           65
-------------------------------------------------------------------------------
```

```console
$ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix
...
Found 4856 errors (2690 fixed, 2166 remaining).
```

resolves: #1218
fixes: #4556
Dhruv Manilawala 2023-06-12 19:44:15 +05:30 committed by GitHub
parent e586c27590
commit d8f5d2d767
27 changed files with 1022 additions and 234 deletions

Cargo.lock generated
View file

@ -171,6 +171,12 @@ version = "0.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e1b586273c5702936fe7b7d6896644d8be71e6314cfe09d3167c95f712589e8"
[[package]]
name = "base64"
version = "0.21.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "604178f6c5c21f02dc555784810edfb88d34ac2c73b2eae109655649ee73ce3d"
[[package]]
name = "bincode"
version = "1.3.3"
@ -256,7 +262,8 @@ dependencies = [
"iana-time-zone",
"js-sys",
"num-traits",
"time",
"serde",
"time 0.1.45",
"wasm-bindgen",
"winapi",
]
@ -555,6 +562,41 @@ dependencies = [
"syn 1.0.109",
]
[[package]]
name = "darling"
version = "0.20.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0558d22a7b463ed0241e993f76f09f30b126687447751a8638587b864e4b3944"
dependencies = [
"darling_core",
"darling_macro",
]
[[package]]
name = "darling_core"
version = "0.20.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab8bfa2e259f8ee1ce5e97824a3c55ec4404a0d772ca7fa96bf19f0752a046eb"
dependencies = [
"fnv",
"ident_case",
"proc-macro2",
"quote",
"strsim",
"syn 2.0.18",
]
[[package]]
name = "darling_macro"
version = "0.20.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "29a358ff9f12ec09c3e61fef9b5a9902623a695a46a917b07f269bff1445611a"
dependencies = [
"darling_core",
"quote",
"syn 2.0.18",
]
[[package]]
name = "diff"
version = "0.1.13"
@ -814,6 +856,12 @@ version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fed44880c466736ef9a5c5b5facefb5ed0785676d0c02d612db14e54f0d84286"
[[package]]
name = "hex"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f24254aa9a54b5c858eaee2f5bccdb46aaf0e486a595ed5fd8f86ba55232a70"
[[package]]
name = "hexf-parse"
version = "0.2.1"
@ -843,6 +891,12 @@ dependencies = [
"cc",
]
[[package]]
name = "ident_case"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9e0384b61958566e926dc50660321d12159025e767c18e043daf26b70104c39"
[[package]]
name = "idna"
version = "0.3.0"
@ -1789,6 +1843,7 @@ dependencies = [
"semver",
"serde",
"serde_json",
"serde_with",
"shellexpand",
"similar",
"smallvec",
@ -2323,6 +2378,34 @@ dependencies = [
"serde",
]
[[package]]
name = "serde_with"
version = "3.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f02d8aa6e3c385bf084924f660ce2a3a6bd333ba55b35e8590b321f35d88513"
dependencies = [
"base64 0.21.2",
"chrono",
"hex",
"indexmap",
"serde",
"serde_json",
"serde_with_macros",
"time 0.3.21",
]
[[package]]
name = "serde_with_macros"
version = "3.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "edc7d5d3932fb12ce722ee5e64dd38c504efba37567f0c402f6ca728c3b8b070"
dependencies = [
"darling",
"proc-macro2",
"quote",
"syn 2.0.18",
]
[[package]]
name = "shellexpand"
version = "3.1.0"
@ -2549,6 +2632,33 @@ dependencies = [
"winapi",
]
[[package]]
name = "time"
version = "0.3.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f3403384eaacbca9923fa06940178ac13e4edb725486d70e8e15881d0c836cc"
dependencies = [
"itoa",
"serde",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7300fbefb4dadc1af235a9cef3737cea692a9d97e1b9cbcd4ebdae6f8868e6fb"
[[package]]
name = "time-macros"
version = "0.2.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "372950940a5f07bf38dbe211d7283c9e6d7327df53794992d293e534c733d09b"
dependencies = [
"time-core",
]
[[package]]
name = "tiny-keccak"
version = "2.0.2"
@ -2767,7 +2877,7 @@ version = "2.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "338b31dd1314f68f3aabf3ed57ab922df95ffcd902476ca7ba3c4ce7b908c46d"
dependencies = [
"base64",
"base64 0.13.1",
"flate2",
"log",
"once_cell",

View file

@ -65,6 +65,7 @@ schemars = { workspace = true, optional = true }
semver = { version = "1.0.16" }
serde = { workspace = true }
serde_json = { workspace = true }
serde_with = { version = "3.0.0" }
similar = { workspace = true, features = ["inline"] }
shellexpand = { workspace = true }
smallvec = { workspace = true }

View file

@ -0,0 +1,5 @@
{
"cell_type": "code",
"metadata": {},
"source": ["def foo():\n", " pass\n", "\n", "%timeit foo()"]
}

View file

@ -0,0 +1,5 @@
{
"cell_type": "markdown",
"metadata": {},
"source": ["This is a markdown cell\n", "Some more content"]
}

View file

@ -0,0 +1,5 @@
{
"cell_type": "code",
"metadata": {},
"source": ["def foo():\n", " pass"]
}

View file

@ -0,0 +1,5 @@
{
"cell_type": "code",
"metadata": {},
"source": "%timeit print('hello world')"
}

View file

@ -3,6 +3,16 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2023-03-08T23:01:09.782916Z",
"start_time": "2023-03-08T23:01:09.705831Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
@ -19,32 +29,26 @@
" print(f\"cell one: {y}\")\n",
"\n",
"unused_variable()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2023-03-08T23:01:09.705831Z",
"end_time": "2023-03-08T23:01:09.782916Z"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's do another mistake"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"ExecuteTime": {
"start_time": "2023-03-08T23:01:09.733809Z",
"end_time": "2023-03-08T23:01:09.915760Z"
"end_time": "2023-03-08T23:01:09.915760Z",
"start_time": "2023-03-08T23:01:09.733809Z"
},
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [
@ -62,27 +66,66 @@
"\n",
"mutable_argument()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create an empty cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multi-line empty cell!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"after empty cells\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python (ruff)",
"language": "python",
"name": "python3"
"name": "ruff"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View file

@ -2,23 +2,31 @@ use std::collections::BTreeSet;
use itertools::Itertools;
use nohash_hasher::IntSet;
use ruff_text_size::{TextRange, TextSize};
use ruff_text_size::{TextLen, TextRange, TextSize};
use rustc_hash::FxHashMap;
use ruff_diagnostics::{Diagnostic, Edit, Fix, IsolationLevel};
use ruff_python_ast::source_code::Locator;
use crate::autofix::source_map::SourceMap;
use crate::linter::FixTable;
use crate::registry::{AsRule, Rule};
pub(crate) mod codemods;
pub(crate) mod edits;
pub(crate) mod source_map;
pub(crate) struct FixResult {
/// The resulting source code, after applying all fixes.
pub(crate) code: String,
/// The number of fixes applied for each [`Rule`].
pub(crate) fixes: FixTable,
/// Source map for the fixed source code.
pub(crate) source_map: SourceMap,
}
/// Auto-fix errors in a file, and write the fixed source code to disk.
pub(crate) fn fix_file(
diagnostics: &[Diagnostic],
locator: &Locator,
) -> Option<(String, FixTable)> {
pub(crate) fn fix_file(diagnostics: &[Diagnostic], locator: &Locator) -> Option<FixResult> {
let mut with_fixes = diagnostics
.iter()
.filter(|diag| diag.fix.is_some())
@ -35,12 +43,13 @@ pub(crate) fn fix_file(
fn apply_fixes<'a>(
diagnostics: impl Iterator<Item = &'a Diagnostic>,
locator: &'a Locator<'a>,
) -> (String, FixTable) {
) -> FixResult {
let mut output = String::with_capacity(locator.len());
let mut last_pos: Option<TextSize> = None;
let mut applied: BTreeSet<&Edit> = BTreeSet::default();
let mut isolated: IntSet<u32> = IntSet::default();
let mut fixed = FxHashMap::default();
let mut source_map = SourceMap::default();
for (rule, fix) in diagnostics
.filter_map(|diagnostic| {
@ -84,9 +93,15 @@ fn apply_fixes<'a>(
let slice = locator.slice(TextRange::new(last_pos.unwrap_or_default(), edit.start()));
output.push_str(slice);
// Add the start source marker for the patch.
source_map.push_start_marker(edit, output.text_len());
// Add the patch itself.
output.push_str(edit.content().unwrap_or_default());
// Add the end source marker for the added patch.
source_map.push_end_marker(edit, output.text_len());
// Track that the edit was applied.
last_pos = Some(edit.end());
applied.insert(edit);
@ -99,7 +114,11 @@ fn apply_fixes<'a>(
let slice = locator.after(last_pos.unwrap_or_default());
output.push_str(slice);
(output, fixed)
FixResult {
code: output,
fixes: fixed,
source_map,
}
}
/// Compare two fixes.
@ -130,7 +149,8 @@ mod tests {
use ruff_diagnostics::Fix;
use ruff_python_ast::source_code::Locator;
use crate::autofix::apply_fixes;
use crate::autofix::source_map::SourceMarker;
use crate::autofix::{apply_fixes, FixResult};
use crate::rules::pycodestyle::rules::MissingNewlineAtEndOfFile;
#[allow(deprecated)]
@ -150,9 +170,59 @@ mod tests {
fn empty_file() {
let locator = Locator::new(r#""#);
let diagnostics = create_diagnostics([]);
let (contents, fixed) = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(contents, "");
assert_eq!(fixed.values().sum::<usize>(), 0);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(code, "");
assert_eq!(fixes.values().sum::<usize>(), 0);
assert!(source_map.markers().is_empty());
}
#[test]
fn apply_one_insertion() {
let locator = Locator::new(
r#"
import os
print("hello world")
"#
.trim(),
);
let diagnostics = create_diagnostics([Edit::insertion(
"import sys\n".to_string(),
TextSize::new(10),
)]);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(
code,
r#"
import os
import sys
print("hello world")
"#
.trim()
);
assert_eq!(fixes.values().sum::<usize>(), 1);
assert_eq!(
source_map.markers(),
&[
SourceMarker {
source: 10.into(),
dest: 10.into(),
},
SourceMarker {
source: 10.into(),
dest: 21.into(),
},
]
);
}
#[test]
@ -169,16 +239,33 @@ class A(object):
TextSize::new(8),
TextSize::new(14),
)]);
let (contents, fixed) = apply_fixes(diagnostics.iter(), &locator);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(
contents,
code,
r#"
class A(Bar):
...
"#
.trim(),
);
assert_eq!(fixed.values().sum::<usize>(), 1);
assert_eq!(fixes.values().sum::<usize>(), 1);
assert_eq!(
source_map.markers(),
&[
SourceMarker {
source: 8.into(),
dest: 8.into(),
},
SourceMarker {
source: 14.into(),
dest: 11.into(),
},
]
);
}
#[test]
@ -191,16 +278,33 @@ class A(object):
.trim(),
);
let diagnostics = create_diagnostics([Edit::deletion(TextSize::new(7), TextSize::new(15))]);
let (contents, fixed) = apply_fixes(diagnostics.iter(), &locator);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(
contents,
code,
r#"
class A:
...
"#
.trim()
);
assert_eq!(fixed.values().sum::<usize>(), 1);
assert_eq!(fixes.values().sum::<usize>(), 1);
assert_eq!(
source_map.markers(),
&[
SourceMarker {
source: 7.into(),
dest: 7.into()
},
SourceMarker {
source: 15.into(),
dest: 7.into()
}
]
);
}
#[test]
@ -216,17 +320,42 @@ class A(object, object, object):
Edit::deletion(TextSize::from(8), TextSize::from(16)),
Edit::deletion(TextSize::from(22), TextSize::from(30)),
]);
let (contents, fixed) = apply_fixes(diagnostics.iter(), &locator);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(
contents,
code,
r#"
class A(object):
...
"#
.trim()
);
assert_eq!(fixed.values().sum::<usize>(), 2);
assert_eq!(fixes.values().sum::<usize>(), 2);
assert_eq!(
source_map.markers(),
&[
SourceMarker {
source: 8.into(),
dest: 8.into()
},
SourceMarker {
source: 16.into(),
dest: 8.into()
},
SourceMarker {
source: 22.into(),
dest: 14.into(),
},
SourceMarker {
source: 30.into(),
dest: 14.into(),
}
]
);
}
#[test]
@ -242,15 +371,32 @@ class A(object):
Edit::deletion(TextSize::from(7), TextSize::from(15)),
Edit::replacement("ignored".to_string(), TextSize::from(9), TextSize::from(11)),
]);
let (contents, fixed) = apply_fixes(diagnostics.iter(), &locator);
let FixResult {
code,
fixes,
source_map,
} = apply_fixes(diagnostics.iter(), &locator);
assert_eq!(
contents,
code,
r#"
class A:
...
"#
.trim(),
);
assert_eq!(fixed.values().sum::<usize>(), 1);
assert_eq!(fixes.values().sum::<usize>(), 1);
assert_eq!(
source_map.markers(),
&[
SourceMarker {
source: 7.into(),
dest: 7.into(),
},
SourceMarker {
source: 15.into(),
dest: 7.into(),
}
]
);
}
}

View file

@ -0,0 +1,59 @@
use ruff_text_size::TextSize;
use ruff_diagnostics::Edit;
/// Lightweight sourcemap marker representing the source and destination
/// position for an [`Edit`].
#[derive(Debug, PartialEq, Eq)]
pub(crate) struct SourceMarker {
/// Position of the marker in the original source.
pub(crate) source: TextSize,
/// Position of the marker in the transformed code.
pub(crate) dest: TextSize,
}
/// A collection of [`SourceMarker`].
///
/// Sourcemaps are used to map positions in the original source to positions in
/// the transformed code. Here, only the boundaries of edits are tracked instead
/// of every single character.
#[derive(Default, PartialEq, Eq)]
pub(crate) struct SourceMap(Vec<SourceMarker>);
impl SourceMap {
/// Returns a slice of all the markers in the sourcemap in the order they
/// were added.
pub(crate) fn markers(&self) -> &[SourceMarker] {
&self.0
}
/// Push the start marker for an [`Edit`].
///
/// The `output_length` is the length of the transformed string before the
/// edit is applied.
pub(crate) fn push_start_marker(&mut self, edit: &Edit, output_length: TextSize) {
self.0.push(SourceMarker {
source: edit.start(),
dest: output_length,
});
}
/// Push the end marker for an [`Edit`].
///
/// The `output_length` is the length of the transformed string after the
/// edit has been applied.
pub(crate) fn push_end_marker(&mut self, edit: &Edit, output_length: TextSize) {
if edit.is_insertion() {
self.0.push(SourceMarker {
source: edit.start(),
dest: output_length,
});
} else {
// Deletion or replacement
self.0.push(SourceMarker {
source: edit.end(),
dest: output_length,
});
}
}
}

View file

@ -16,6 +16,7 @@ use crate::registry::Rule;
use crate::rules::isort;
use crate::rules::isort::block::{Block, BlockBuilder};
use crate::settings::Settings;
use crate::source_kind::SourceKind;
fn extract_import_map(path: &Path, package: Option<&Path>, blocks: &[&Block]) -> Option<ImportMap> {
let Some(package) = package else {
@ -83,12 +84,13 @@ pub(crate) fn check_imports(
stylist: &Stylist,
path: &Path,
package: Option<&Path>,
source_kind: Option<&SourceKind>,
) -> (Vec<Diagnostic>, Option<ImportMap>) {
let is_stub = is_python_stub_file(path);
// Extract all import blocks from the AST.
let tracker = {
let mut tracker = BlockBuilder::new(locator, directives, is_stub);
let mut tracker = BlockBuilder::new(locator, directives, is_stub, source_kind);
tracker.visit_body(python_ast);
tracker
};

View file

@ -0,0 +1,24 @@
/// Jupyter Notebook indexing table
///
/// When we lint a jupyter notebook, we have to translate the row/column based on
/// [`ruff_text_size::TextSize`] to jupyter notebook cell/row/column.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct JupyterIndex {
/// Enter a row (1-based), get back the cell (1-based)
pub(super) row_to_cell: Vec<u32>,
/// Enter a row (1-based), get back the row in cell (1-based)
pub(super) row_to_row_in_cell: Vec<u32>,
}
impl JupyterIndex {
/// Returns the cell number (1-based) for the given row (1-based).
pub fn cell(&self, row: usize) -> Option<u32> {
self.row_to_cell.get(row).copied()
}
/// Returns the row number (1-based) in the cell (1-based) for the
/// given row (1-based).
pub fn cell_row(&self, row: usize) -> Option<u32> {
self.row_to_row_in_cell.get(row).copied()
}
}

View file

@ -1,7 +1,9 @@
//! Utils for reading and writing jupyter notebooks
pub use index::*;
pub use notebook::*;
pub use schema::*;
mod index;
mod notebook;
mod schema;

View file

@ -1,32 +1,27 @@
use std::cmp::Ordering;
use std::fs::File;
use std::io::{BufReader, BufWriter};
use std::iter;
use std::path::Path;
use ruff_text_size::TextRange;
use itertools::Itertools;
use once_cell::sync::OnceCell;
use serde::Serialize;
use serde_json::error::Category;
use ruff_diagnostics::Diagnostic;
use ruff_python_whitespace::NewlineWithTrailingNewline;
use ruff_text_size::{TextRange, TextSize};
use crate::jupyter::{CellType, JupyterNotebook, SourceValue};
use crate::autofix::source_map::{SourceMap, SourceMarker};
use crate::jupyter::index::JupyterIndex;
use crate::jupyter::{Cell, CellType, RawNotebook, SourceValue};
use crate::rules::pycodestyle::rules::SyntaxError;
use crate::IOError;
pub const JUPYTER_NOTEBOOK_EXT: &str = "ipynb";
/// Jupyter Notebook indexing table
///
/// When we lint a jupyter notebook, we have to translate the row/column based on
/// [`ruff_text_size::TextSize`]
/// to jupyter notebook cell/row/column.
#[derive(Debug, Eq, PartialEq)]
pub struct JupyterIndex {
/// Enter a row (1-based), get back the cell (1-based)
pub row_to_cell: Vec<u32>,
/// Enter a row (1-based), get back the cell (1-based)
pub row_to_row_in_cell: Vec<u32>,
}
const MAGIC_PREFIX: [&str; 3] = ["%", "!", "?"];
/// Return `true` if the [`Path`] appears to be that of a jupyter notebook file (`.ipynb`).
pub fn is_jupyter_notebook(path: &Path) -> bool {
@ -37,7 +32,55 @@ pub fn is_jupyter_notebook(path: &Path) -> bool {
&& cfg!(feature = "jupyter_notebook")
}
impl JupyterNotebook {
impl Cell {
/// Return `true` if it's a valid code cell.
///
/// A valid code cell is a cell where the type is [`CellType::Code`] and the
/// source doesn't contain a magic, shell or help command.
fn is_valid_code_cell(&self) -> bool {
if self.cell_type != CellType::Code {
return false;
}
// Ignore a cell if it contains a magic command. There could be valid
// Python code as well, but we'll ignore that for now.
// TODO(dhruvmanila): https://github.com/psf/black/blob/main/src/black/handle_ipynb_magics.py
!match &self.source {
SourceValue::String(string) => string.lines().any(|line| {
MAGIC_PREFIX
.iter()
.any(|prefix| line.trim_start().starts_with(prefix))
}),
SourceValue::StringArray(string_array) => string_array.iter().any(|line| {
MAGIC_PREFIX
.iter()
.any(|prefix| line.trim_start().starts_with(prefix))
}),
}
}
}
#[derive(Debug, PartialEq)]
pub struct Notebook {
/// Python source code of the notebook.
///
/// This is the concatenation of all valid code cells in the notebook
/// separated by a newline and a trailing newline. The trailing newline
/// is added to make sure that each cell ends with a newline which will
/// be removed when updating the cell content.
content: String,
/// The index of the notebook. This is used to map between the concatenated
/// source code and the original notebook.
index: OnceCell<JupyterIndex>,
/// The raw notebook i.e., the deserialized version of JSON string.
raw: RawNotebook,
/// The offsets of each cell in the concatenated source code. This includes
/// the first and last character offsets as well.
cell_offsets: Vec<TextSize>,
/// The cell numbers of all valid code cells in the notebook.
valid_code_cells: Vec<u32>,
}
impl Notebook {
/// See also the black implementation
/// <https://github.com/psf/black/blob/69ca0a4c7a365c5f5eea519a90980bab72cab764/src/black/__init__.py#L1017-L1046>
pub fn read(path: &Path) -> Result<Self, Box<Diagnostic>> {
@ -49,7 +92,7 @@ impl JupyterNotebook {
TextRange::default(),
)
})?);
let notebook: JupyterNotebook = match serde_json::from_reader(reader) {
let notebook: RawNotebook = match serde_json::from_reader(reader) {
Ok(notebook) => notebook,
Err(err) => {
// Translate the error into a diagnostic
@ -130,60 +173,201 @@ impl JupyterNotebook {
)));
}
Ok(notebook)
}
/// Concatenates all cells into a single virtual file and builds an index that maps the content
/// to notebook cell locations
pub fn index(&self) -> (String, JupyterIndex) {
let mut jupyter_index = JupyterIndex {
// Enter a line number (1-based), get back the cell (1-based)
// 0 index is just padding
row_to_cell: vec![0],
// Enter a line number (1-based), get back the row number in the cell (1-based)
// 0 index is just padding
row_to_row_in_cell: vec![0],
};
let size_hint = self
.cells
.iter()
.filter(|cell| cell.cell_type == CellType::Code)
.count();
let mut contents = Vec::with_capacity(size_hint);
for (pos, cell) in self
let valid_code_cells = notebook
.cells
.iter()
.enumerate()
.filter(|(_pos, cell)| cell.cell_type == CellType::Code)
{
let cell_contents = match &cell.source {
SourceValue::String(string) => {
// TODO(konstin): is or isn't there a trailing newline per cell?
// i've only seen these as array and never as string
let line_count = u32::try_from(string.lines().count()).unwrap();
jupyter_index.row_to_cell.extend(
iter::repeat(u32::try_from(pos + 1).unwrap()).take(line_count as usize),
);
jupyter_index.row_to_row_in_cell.extend(1..=line_count);
string.clone()
}
SourceValue::StringArray(string_array) => {
jupyter_index.row_to_cell.extend(
iter::repeat(u32::try_from(pos + 1).unwrap()).take(string_array.len()),
);
jupyter_index
.row_to_row_in_cell
.extend(1..=u32::try_from(string_array.len()).unwrap());
// lines already end in a newline character
string_array.join("")
.filter(|(_, cell)| cell.is_valid_code_cell())
.map(|(pos, _)| u32::try_from(pos).unwrap())
.collect::<Vec<_>>();
let mut contents = Vec::with_capacity(valid_code_cells.len());
let mut current_offset = TextSize::from(0);
let mut cell_offsets = Vec::with_capacity(notebook.cells.len());
cell_offsets.push(TextSize::from(0));
for &pos in &valid_code_cells {
let cell_contents = match &notebook.cells[pos as usize].source {
SourceValue::String(string) => string.clone(),
SourceValue::StringArray(string_array) => string_array.join(""),
};
current_offset += TextSize::of(&cell_contents) + TextSize::new(1);
contents.push(cell_contents);
cell_offsets.push(current_offset);
}
Ok(Self {
raw: notebook,
index: OnceCell::new(),
// The additional newline at the end is to maintain consistency for
// all cells. These newlines will be removed before updating the
// source code with the transformed content. Refer `update_cell_content`.
content: contents.join("\n") + "\n",
cell_offsets,
valid_code_cells,
})
}
/// Update the cell offsets as per the given [`SourceMap`].
fn update_cell_offsets(&mut self, source_map: &SourceMap) {
// When there are multiple cells without any edits, the offsets of those
// cells will be updated using the same marker. So, we can keep track of
// the last marker used to update the offsets and check if it's still
// the closest marker to the current offset.
let mut last_marker: Option<&SourceMarker> = None;
// The first offset is always going to be at 0, so skip it.
for offset in self.cell_offsets.iter_mut().skip(1).rev() {
let closest_marker = match last_marker {
Some(marker) if marker.source <= *offset => marker,
_ => {
let Some(marker) = source_map
.markers()
.iter()
.rev()
.find(|m| m.source <= *offset) else {
// There are no markers above the current offset, so we can
// stop here.
break;
};
last_marker = Some(marker);
marker
}
};
contents.push(cell_contents);
match closest_marker.source.cmp(&closest_marker.dest) {
Ordering::Less => *offset += closest_marker.dest - closest_marker.source,
Ordering::Greater => *offset -= closest_marker.source - closest_marker.dest,
Ordering::Equal => (),
}
}
// The last line doesn't end in a newline character
(contents.join("\n"), jupyter_index)
}
/// Update the cell contents with the transformed content.
///
/// ## Panics
///
/// Panics if the transformed content is out of bounds for any cell. This
/// can happen only if the cell offsets were not updated before calling
/// this method or the offsets were updated incorrectly.
fn update_cell_content(&mut self, transformed: &str) {
for (&pos, (start, end)) in self
.valid_code_cells
.iter()
.zip(self.cell_offsets.iter().tuple_windows::<(_, _)>())
{
let cell_content = transformed
.get(start.to_usize()..end.to_usize())
.unwrap_or_else(|| {
panic!("Transformed content out of bounds ({start:?}..{end:?}) for cell {pos}");
});
self.raw.cells[pos as usize].source = SourceValue::String(
cell_content
// We only need to strip the trailing newline which we added
// while concatenating the cell contents.
.strip_suffix('\n')
.unwrap_or(cell_content)
.to_string(),
);
}
}
/// Build and return the [`JupyterIndex`].
///
/// # Notes
///
/// Empty cells don't have any newlines, but there's a single visible line
/// in the UI. That single line needs to be accounted for.
///
/// In case of [`SourceValue::StringArray`], newlines are part of the strings.
/// So, to get the actual count of lines, we need to check for any trailing
/// newline for the last line.
///
/// For example, consider the following cell:
/// ```python
/// [
/// "import os\n",
/// "import sys\n",
/// ]
/// ```
///
/// Here, the array suggests that there are two lines, but the actual number
/// of lines visible in the UI is three. The same goes for [`SourceValue::String`]
/// where we need to check for the trailing newline.
///
/// The index building is expensive as it needs to go through the content of
/// every valid code cell.
fn build_index(&self) -> JupyterIndex {
let mut row_to_cell = vec![0];
let mut row_to_row_in_cell = vec![0];
for &pos in &self.valid_code_cells {
let line_count = match &self.raw.cells[pos as usize].source {
SourceValue::String(string) => {
if string.is_empty() {
1
} else {
u32::try_from(NewlineWithTrailingNewline::from(string).count()).unwrap()
}
}
SourceValue::StringArray(string_array) => {
if string_array.is_empty() {
1
} else {
let trailing_newline =
usize::from(string_array.last().map_or(false, |s| s.ends_with('\n')));
u32::try_from(string_array.len() + trailing_newline).unwrap()
}
}
};
row_to_cell.extend(iter::repeat(pos + 1).take(line_count as usize));
row_to_row_in_cell.extend(1..=line_count);
}
JupyterIndex {
row_to_cell,
row_to_row_in_cell,
}
}
/// Return the notebook content.
///
/// This is the concatenation of all Python code cells.
pub(crate) fn content(&self) -> &str {
&self.content
}
/// Return the Jupyter notebook index.
///
/// The index is built only once when required. This is only used to
/// report diagnostics, so by that time all of the autofixes must have
/// been applied if `--fix` was passed.
pub(crate) fn index(&self) -> &JupyterIndex {
self.index.get_or_init(|| self.build_index())
}
/// Return the cell offsets for the concatenated source code corresponding
/// the Jupyter notebook.
pub(crate) fn cell_offsets(&self) -> &[TextSize] {
&self.cell_offsets
}
/// Update the notebook with the given sourcemap and transformed content.
pub(crate) fn update(&mut self, source_map: &SourceMap, transformed: &str) {
// Cell offsets must be updated before updating the cell content as
// it depends on the offsets to extract the cell content.
self.update_cell_offsets(source_map);
self.update_cell_content(transformed);
self.content = transformed.to_string();
}
/// Return `true` if the notebook is a Python notebook, `false` otherwise.
pub fn is_python_notebook(&self) -> bool {
self.raw
.metadata
.language_info
.as_ref()
.map_or(true, |language| language.name == "python")
}
/// Write back with an indent of 1, just like black
@ -192,7 +376,7 @@ impl JupyterNotebook {
// https://github.com/psf/black/blob/69ca0a4c7a365c5f5eea519a90980bab72cab764/src/black/__init__.py#LL1041
let formatter = serde_json::ser::PrettyFormatter::with_indent(b" ");
let mut ser = serde_json::Serializer::with_formatter(&mut writer, formatter);
self.serialize(&mut ser)?;
self.raw.serialize(&mut ser)?;
Ok(())
}
}
@ -201,28 +385,42 @@ impl JupyterNotebook {
mod test {
use std::path::Path;
use anyhow::Result;
use test_case::test_case;
use crate::jupyter::index::JupyterIndex;
#[cfg(feature = "jupyter_notebook")]
use crate::jupyter::is_jupyter_notebook;
use crate::jupyter::{JupyterIndex, JupyterNotebook};
use crate::jupyter::schema::Cell;
use crate::jupyter::Notebook;
use crate::test::test_resource_path;
/// Read a Jupyter cell from the `resources/test/fixtures/jupyter/cell` directory.
fn read_jupyter_cell(path: impl AsRef<Path>) -> Result<Cell> {
let path = test_resource_path("fixtures/jupyter/cell").join(path);
let contents = std::fs::read_to_string(path)?;
Ok(serde_json::from_str(&contents)?)
}
#[test]
fn test_valid() {
let path = Path::new("resources/test/fixtures/jupyter/valid.ipynb");
assert!(JupyterNotebook::read(path).is_ok());
assert!(Notebook::read(path).is_ok());
}
#[test]
fn test_r() {
// We can load this, it will be filtered out later
let path = Path::new("resources/test/fixtures/jupyter/R.ipynb");
assert!(JupyterNotebook::read(path).is_ok());
assert!(Notebook::read(path).is_ok());
}
#[test]
fn test_invalid() {
let path = Path::new("resources/test/fixtures/jupyter/invalid_extension.ipynb");
assert_eq!(
JupyterNotebook::read(path).unwrap_err().kind.body,
Notebook::read(path).unwrap_err().kind.body,
"SyntaxError: Expected a Jupyter Notebook (.ipynb extension), \
which must be internally stored as JSON, \
but found a Python source file: \
@ -230,19 +428,28 @@ mod test {
);
let path = Path::new("resources/test/fixtures/jupyter/not_json.ipynb");
assert_eq!(
JupyterNotebook::read(path).unwrap_err().kind.body,
Notebook::read(path).unwrap_err().kind.body,
"SyntaxError: A Jupyter Notebook (.ipynb) must internally be JSON, \
but this file isn't valid JSON: \
expected value at line 1 column 1"
);
let path = Path::new("resources/test/fixtures/jupyter/wrong_schema.ipynb");
assert_eq!(
JupyterNotebook::read(path).unwrap_err().kind.body,
Notebook::read(path).unwrap_err().kind.body,
"SyntaxError: This file does not match the schema expected of Jupyter Notebooks: \
missing field `cells` at line 1 column 2"
);
}
#[test_case(Path::new("markdown.json"), false; "markdown")]
#[test_case(Path::new("only_magic.json"), false; "only_magic")]
#[test_case(Path::new("code_and_magic.json"), false; "code_and_magic")]
#[test_case(Path::new("only_code.json"), true; "only_code")]
fn test_is_valid_code_cell(path: &Path, expected: bool) -> Result<()> {
assert_eq!(read_jupyter_cell(path)?.is_valid_code_cell(), expected);
Ok(())
}
#[test]
#[cfg(feature = "jupyter_notebook")]
fn inclusions() {
@ -256,10 +463,9 @@ mod test {
#[test]
fn test_concat_notebook() {
let path = Path::new("resources/test/fixtures/jupyter/valid.ipynb");
let notebook = JupyterNotebook::read(path).unwrap();
let (contents, index) = notebook.index();
let notebook = Notebook::read(path).unwrap();
assert_eq!(
contents,
notebook.content,
r#"def unused_variable():
x = 1
y = 2
@ -270,14 +476,30 @@ def mutable_argument(z=set()):
print(f"cell two: {z}")
mutable_argument()
print("after empty cells")
"#
);
assert_eq!(
index,
JupyterIndex {
row_to_cell: vec![0, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3],
row_to_row_in_cell: vec![0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4],
notebook.index(),
&JupyterIndex {
row_to_cell: vec![0, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 5, 7, 7, 8],
row_to_row_in_cell: vec![0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 1, 1, 2, 1],
}
);
assert_eq!(
notebook.cell_offsets(),
&[
0.into(),
90.into(),
168.into(),
169.into(),
171.into(),
198.into()
]
);
}
}

View file

@ -1,26 +1,33 @@
//! The JSON schema of a Jupyter Notebook, entrypoint is [`JupyterNotebook`]
//! The JSON schema of a Jupyter Notebook, entrypoint is [`RawNotebook`]
//!
//! Generated by <https://app.quicktype.io/> from
//! <https://github.com/jupyter/nbformat/blob/16b53251aabf472ad9406ddb1f78b0421c014eeb/nbformat/v4/nbformat.v4.schema.json>
//! Jupyter Notebook v4.5 JSON schema.
//!
//! The following changes were made to the generated version: `Cell::id` is optional because it
//! wasn't required <v4.5, `#[serde(deny_unknown_fields)]` was added where the schema had
//! `"additionalProperties": false` and `#[serde(flatten)] pub other: BTreeMap<String, Value>`
//! for `"additionalProperties": true` as preparation for round-trip support.
//! The following changes were made to the generated version:
//! * `Cell::id` is optional because it wasn't required <v4.5
//! * `#[serde(deny_unknown_fields)]` was added where the schema had
//! `"additionalProperties": false`
//! * `#[serde(flatten)] pub other: BTreeMap<String, Value>` for
//! `"additionalProperties": true` as preparation for round-trip support.
//! * `#[serde(skip_serializing_none)]` was added to all structs where one or
//! more fields were optional to avoid serializing `null` values.
//! * `Output::data` & `Cell::attachments` were changed to `Value` because
//! the scheme had `patternProperties`.
use std::collections::{BTreeMap, HashMap};
use serde::{Deserialize, Serialize};
use serde_json::Value;
use serde_with::skip_serializing_none;
/// The root of the JSON of a Jupyter Notebook
///
/// Generated by <https://app.quicktype.io/> from
/// <https://github.com/jupyter/nbformat/blob/16b53251aabf472ad9406ddb1f78b0421c014eeb/nbformat/v4/nbformat.v4.schema.json>
/// Jupyter Notebook v4.5 JSON schema.
#[derive(Debug, Serialize, Deserialize)]
pub struct JupyterNotebook {
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct RawNotebook {
/// Array of cells of the current notebook.
pub cells: Vec<Cell>,
/// Notebook root-level metadata.
@ -38,10 +45,11 @@ pub struct JupyterNotebook {
/// Notebook markdown cell.
///
/// Notebook code cell.
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(deny_unknown_fields)]
pub struct Cell {
pub attachments: Option<HashMap<String, HashMap<String, SourceValue>>>,
pub attachments: Option<HashMap<String, HashMap<String, Value>>>,
/// String identifying the type of cell.
pub cell_type: CellType,
/// Technically, id isn't required (it's not even present) in schema v4.0 through v4.4, but
@ -58,7 +66,8 @@ pub struct Cell {
}
/// Cell-level metadata.
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct CellMetadata {
/// Raw cell metadata format for nbconvert.
pub format: Option<String>,
@ -84,7 +93,8 @@ pub struct CellMetadata {
/// Execution time for the code in the cell. This tracks time at which messages are received
/// from iopub or shell channels
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(deny_unknown_fields)]
pub struct Execution {
/// header.date (in ISO 8601 format) of iopub channel's execute_input message. It indicates
@ -113,10 +123,11 @@ pub struct Execution {
/// Stream output from a code cell.
///
/// Output of an error that occurred during code cell execution.
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(deny_unknown_fields)]
pub struct Output {
pub data: Option<HashMap<String, SourceValue>>,
pub data: Option<HashMap<String, Value>>,
/// A result's prompt number.
pub execution_count: Option<i64>,
pub metadata: Option<HashMap<String, Option<Value>>>,
@ -135,7 +146,8 @@ pub struct Output {
}
/// Notebook root-level metadata.
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct JupyterNotebookMetadata {
/// The author(s) of the notebook document
pub authors: Option<Vec<Option<Value>>>,
@ -154,7 +166,7 @@ pub struct JupyterNotebookMetadata {
}
/// Kernel information.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct Kernelspec {
/// Name to display in UI.
pub display_name: String,
@ -166,7 +178,8 @@ pub struct Kernelspec {
}
/// Kernel information.
#[derive(Debug, Serialize, Deserialize)]
#[skip_serializing_none]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct LanguageInfo {
/// The codemirror mode to use for code in this language.
pub codemirror_mode: Option<CodemirrorMode>,
@ -189,7 +202,7 @@ pub struct LanguageInfo {
/// Contents of the cell, represented as an array of lines.
///
/// The stream's text output, represented as an array of strings.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum SourceValue {
String(String),
@ -197,7 +210,7 @@ pub enum SourceValue {
}
/// Whether the cell's output is scrolled, unscrolled, or autoscrolled.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum ScrolledUnion {
Bool(bool),
@ -210,7 +223,7 @@ pub enum ScrolledUnion {
/// Contents of the cell, represented as an array of lines.
///
/// The stream's text output, represented as an array of strings.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum TextUnion {
String(String),
@ -218,7 +231,7 @@ pub enum TextUnion {
}
/// The codemirror mode to use for code in this language.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum CodemirrorMode {
AnythingMap(HashMap<String, Option<Value>>),
@ -236,14 +249,14 @@ pub enum CellType {
Raw,
}
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
#[derive(Debug, Serialize, Deserialize, Copy, Clone, PartialEq)]
pub enum ScrolledEnum {
#[serde(rename = "auto")]
Auto,
}
/// Type of cell output.
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
#[derive(Debug, Serialize, Deserialize, Copy, Clone, PartialEq)]
pub enum OutputType {
#[serde(rename = "display_data")]
DisplayData,

View file

@ -34,6 +34,7 @@ mod rule_redirects;
mod rule_selector;
pub mod rules;
pub mod settings;
pub mod source_kind;
#[cfg(any(test, fuzzing))]
pub mod test;

View file

@ -15,7 +15,7 @@ use ruff_python_ast::imports::ImportMap;
use ruff_python_ast::source_code::{Indexer, Locator, SourceFileBuilder, Stylist};
use ruff_python_stdlib::path::is_python_stub_file;
use crate::autofix::fix_file;
use crate::autofix::{fix_file, FixResult};
use crate::checkers::ast::check_ast;
use crate::checkers::filesystem::check_file_path;
use crate::checkers::imports::check_imports;
@ -30,6 +30,7 @@ use crate::noqa::add_noqa;
use crate::registry::{AsRule, Rule};
use crate::rules::pycodestyle;
use crate::settings::{flags, Settings};
use crate::source_kind::SourceKind;
use crate::{directives, fs};
const CARGO_PKG_NAME: &str = env!("CARGO_PKG_NAME");
@ -77,6 +78,7 @@ pub fn check_path(
directives: &Directives,
settings: &Settings,
noqa: flags::Noqa,
source_kind: Option<&SourceKind>,
) -> LinterResult<(Vec<Diagnostic>, Option<ImportMap>)> {
// Aggregate all diagnostics.
let mut diagnostics = vec![];
@ -157,6 +159,7 @@ pub fn check_path(
stylist,
path,
package,
source_kind,
);
imports = module_imports;
diagnostics.extend(import_diagnostics);
@ -285,11 +288,17 @@ pub fn add_noqa_to_path(path: &Path, package: Option<&Path>, settings: &Settings
&directives,
settings,
flags::Noqa::Disabled,
None,
);
// Log any parse errors.
if let Some(err) = error {
error!("{}", DisplayParseError::new(err, locator.to_source_code()));
// TODO(dhruvmanila): This should use `SourceKind`, update when
// `--add-noqa` is supported for Jupyter notebooks.
error!(
"{}",
DisplayParseError::new(err, locator.to_source_code(), None)
);
}
// Add any missing `# noqa` pragmas.
@ -343,6 +352,7 @@ pub fn lint_only(
&directives,
settings,
noqa,
None,
);
result.map(|(diagnostics, imports)| {
@ -388,6 +398,7 @@ pub fn lint_fix<'a>(
package: Option<&Path>,
noqa: flags::Noqa,
settings: &Settings,
source_kind: &mut SourceKind,
) -> Result<FixerResult<'a>> {
let mut transformed = Cow::Borrowed(contents);
@ -433,6 +444,7 @@ pub fn lint_fix<'a>(
&directives,
settings,
noqa,
Some(source_kind),
);
if iterations == 0 {
@ -453,13 +465,22 @@ pub fn lint_fix<'a>(
}
// Apply autofix.
if let Some((fixed_contents, applied)) = fix_file(&result.data.0, &locator) {
if let Some(FixResult {
code: fixed_contents,
fixes: applied,
source_map,
}) = fix_file(&result.data.0, &locator)
{
if iterations < MAX_ITERATIONS {
// Count the number of fixed errors.
for (rule, count) in applied {
*fixed.entry(rule).or_default() += count;
}
if let SourceKind::Jupyter(notebook) = source_kind {
notebook.update(&source_map, &fixed_contents);
}
// Store the fixed contents.
transformed = Cow::Owned(fixed_contents);

View file

@ -9,9 +9,11 @@ use log::Level;
use once_cell::sync::Lazy;
use rustpython_parser::{ParseError, ParseErrorType};
use ruff_python_ast::source_code::SourceCode;
use ruff_python_ast::source_code::{OneIndexed, SourceCode, SourceLocation};
use crate::fs;
use crate::jupyter::Notebook;
use crate::source_kind::SourceKind;
pub(crate) static WARNINGS: Lazy<Mutex<Vec<&'static str>>> = Lazy::new(Mutex::default);
@ -137,25 +139,70 @@ pub fn set_up_logging(level: &LogLevel) -> Result<()> {
pub struct DisplayParseError<'a> {
error: ParseError,
source_code: SourceCode<'a, 'a>,
source_kind: Option<&'a SourceKind>,
}
impl<'a> DisplayParseError<'a> {
pub fn new(error: ParseError, source_code: SourceCode<'a, 'a>) -> Self {
Self { error, source_code }
pub fn new(
error: ParseError,
source_code: SourceCode<'a, 'a>,
source_kind: Option<&'a SourceKind>,
) -> Self {
Self {
error,
source_code,
source_kind,
}
}
}
impl Display for DisplayParseError<'_> {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"{header} {path}{colon}",
header = "Failed to parse".bold(),
path = fs::relativize_path(Path::new(&self.error.source_path)).bold(),
colon = ":".cyan(),
)?;
let source_location = self.source_code.source_location(self.error.offset);
// If we're working on a Jupyter notebook, translate the positions
// with respect to the cell and row in the cell. This is the same
// format as the `TextEmitter`.
let error_location = if let Some(jupyter_index) = self
.source_kind
.and_then(SourceKind::notebook)
.map(Notebook::index)
{
write!(
f,
"cell {cell}{colon}",
cell = jupyter_index
.cell(source_location.row.get())
.unwrap_or_default(),
colon = ":".cyan(),
)?;
SourceLocation {
row: OneIndexed::new(
jupyter_index
.cell_row(source_location.row.get())
.unwrap_or(1) as usize,
)
.unwrap(),
column: source_location.column,
}
} else {
source_location
};
write!(
f,
"{header} {path}{colon}{row}{colon}{column}{colon} {inner}",
header = "Failed to parse".bold(),
path = fs::relativize_path(Path::new(&self.error.source_path)).bold(),
row = source_location.row,
column = source_location.column,
"{row}{colon}{column}{colon} {inner}",
row = error_location.row,
column = error_location.column,
colon = ":".cyan(),
inner = &DisplayParseErrorType(&self.error.error)
)

View file

@ -7,12 +7,13 @@ use colored::Colorize;
use ruff_python_ast::source_code::OneIndexed;
use crate::fs::relativize_path;
use crate::jupyter::JupyterIndex;
use crate::jupyter::{JupyterIndex, Notebook};
use crate::message::diff::calculate_print_width;
use crate::message::text::{MessageCodeFrame, RuleCodeAndBody};
use crate::message::{
group_messages_by_filename, Emitter, EmitterContext, Message, MessageWithLocation,
};
use crate::source_kind::SourceKind;
#[derive(Default)]
pub struct GroupedEmitter {
@ -65,7 +66,10 @@ impl Emitter for GroupedEmitter {
writer,
"{}",
DisplayGroupedMessage {
jupyter_index: context.jupyter_index(message.filename()),
jupyter_index: context
.source_kind(message.filename())
.and_then(SourceKind::notebook)
.map(Notebook::index),
message,
show_fix_status: self.show_fix_status,
show_source: self.show_source,
@ -114,11 +118,15 @@ impl Display for DisplayGroupedMessage<'_> {
write!(
f,
"cell {cell}{sep}",
cell = jupyter_index.row_to_cell[start_location.row.get()],
cell = jupyter_index
.cell(start_location.row.get())
.unwrap_or_default(),
sep = ":".cyan()
)?;
(
jupyter_index.row_to_row_in_cell[start_location.row.get()] as usize,
jupyter_index
.cell_row(start_location.row.get())
.unwrap_or(1) as usize,
start_location.column.get(),
)
} else {

View file

@ -6,6 +6,7 @@ use std::ops::Deref;
use ruff_text_size::{TextRange, TextSize};
use rustc_hash::FxHashMap;
use crate::source_kind::SourceKind;
pub use azure::AzureEmitter;
pub use github::GithubEmitter;
pub use gitlab::GitlabEmitter;
@ -17,8 +18,6 @@ use ruff_diagnostics::{Diagnostic, DiagnosticKind, Fix};
use ruff_python_ast::source_code::{SourceFile, SourceLocation};
pub use text::TextEmitter;
use crate::jupyter::JupyterIndex;
mod azure;
mod diff;
mod github;
@ -127,22 +126,23 @@ pub trait Emitter {
/// Context passed to [`Emitter`].
pub struct EmitterContext<'a> {
jupyter_indices: &'a FxHashMap<String, JupyterIndex>,
source_kind: &'a FxHashMap<String, SourceKind>,
}
impl<'a> EmitterContext<'a> {
pub fn new(jupyter_indices: &'a FxHashMap<String, JupyterIndex>) -> Self {
Self { jupyter_indices }
pub fn new(source_kind: &'a FxHashMap<String, SourceKind>) -> Self {
Self { source_kind }
}
/// Tests if the file with `name` is a jupyter notebook.
pub fn is_jupyter_notebook(&self, name: &str) -> bool {
self.jupyter_indices.contains_key(name)
self.source_kind
.get(name)
.map_or(false, SourceKind::is_jupyter)
}
/// Returns the file's [`JupyterIndex`] if the file `name` is a jupyter notebook.
pub fn jupyter_index(&self, name: &str) -> Option<&JupyterIndex> {
self.jupyter_indices.get(name)
pub fn source_kind(&self, name: &str) -> Option<&SourceKind> {
self.source_kind.get(name)
}
}
@ -226,8 +226,8 @@ def fibonacci(n):
emitter: &mut dyn Emitter,
messages: &[Message],
) -> String {
let indices = FxHashMap::default();
let context = EmitterContext::new(&indices);
let source_kinds = FxHashMap::default();
let context = EmitterContext::new(&source_kinds);
let mut output: Vec<u8> = Vec::new();
emitter.emit(&mut output, messages, &context).unwrap();

View file

@ -11,10 +11,12 @@ use ruff_text_size::{TextRange, TextSize};
use ruff_python_ast::source_code::{OneIndexed, SourceLocation};
use crate::fs::relativize_path;
use crate::jupyter::Notebook;
use crate::line_width::{LineWidth, TabSize};
use crate::message::diff::Diff;
use crate::message::{Emitter, EmitterContext, Message};
use crate::registry::AsRule;
use crate::source_kind::SourceKind;
bitflags! {
#[derive(Default)]
@ -72,25 +74,32 @@ impl Emitter for TextEmitter {
let start_location = message.compute_start_location();
// Check if we're working on a jupyter notebook and translate positions with cell accordingly
let diagnostic_location =
if let Some(jupyter_index) = context.jupyter_index(message.filename()) {
write!(
writer,
"cell {cell}{sep}",
cell = jupyter_index.row_to_cell[start_location.row.get()],
sep = ":".cyan(),
)?;
let diagnostic_location = if let Some(jupyter_index) = context
.source_kind(message.filename())
.and_then(SourceKind::notebook)
.map(Notebook::index)
{
write!(
writer,
"cell {cell}{sep}",
cell = jupyter_index
.cell(start_location.row.get())
.unwrap_or_default(),
sep = ":".cyan(),
)?;
SourceLocation {
row: OneIndexed::new(
jupyter_index.row_to_row_in_cell[start_location.row.get()] as usize,
)
.unwrap(),
column: start_location.column,
}
} else {
start_location
};
SourceLocation {
row: OneIndexed::new(
jupyter_index
.cell_row(start_location.row.get())
.unwrap_or(1) as usize,
)
.unwrap(),
column: start_location.column,
}
} else {
start_location
};
writeln!(
writer,

View file

@ -5,7 +5,9 @@ use ruff_python_ast::source_code::Locator;
use ruff_python_ast::statement_visitor::StatementVisitor;
use crate::directives::IsortDirectives;
use crate::jupyter::Notebook;
use crate::rules::isort::helpers;
use crate::source_kind::SourceKind;
/// A block of imports within a Python module.
#[derive(Debug, Default)]
@ -29,6 +31,7 @@ pub(crate) struct BlockBuilder<'a> {
is_stub: bool,
blocks: Vec<Block<'a>>,
splits: &'a [TextSize],
cell_offsets: Option<&'a [TextSize]>,
exclusions: &'a [TextRange],
nested: bool,
}
@ -38,6 +41,7 @@ impl<'a> BlockBuilder<'a> {
locator: &'a Locator<'a>,
directives: &'a IsortDirectives,
is_stub: bool,
source_kind: Option<&'a SourceKind>,
) -> Self {
Self {
locator,
@ -46,6 +50,9 @@ impl<'a> BlockBuilder<'a> {
splits: &directives.splits,
exclusions: &directives.exclusions,
nested: false,
cell_offsets: source_kind
.and_then(SourceKind::notebook)
.map(Notebook::cell_offsets),
}
}
@ -129,6 +136,22 @@ where
}
}
// Track Jupyter notebook cell offsets as splits. This will make sure
// that each cell is considered as an individual block to organize the
// imports in. Thus, not creating an edit which spans across multiple
// cells.
if let Some(cell_offsets) = self.cell_offsets {
for (index, split) in cell_offsets.iter().enumerate() {
if stmt.start() >= *split {
// We don't want any extra newlines between cells.
self.finalize(None);
self.cell_offsets = Some(&cell_offsets[index + 1..]);
} else {
break;
}
}
}
// Test if the statement is in an excluded range
let mut is_excluded = false;
for (index, exclusion) in self.exclusions.iter().enumerate() {

View file

@ -344,6 +344,7 @@ mod tests {
&directives,
&settings,
flags::Noqa::Enabled,
None,
);
diagnostics.sort_by_key(Diagnostic::start);
let actual = diagnostics

View file

@ -0,0 +1,26 @@
use crate::jupyter::Notebook;
#[derive(Debug, PartialEq, is_macro::Is)]
pub enum SourceKind {
Python(String),
Jupyter(Notebook),
}
impl SourceKind {
/// Return the source content.
pub fn content(&self) -> &str {
match self {
SourceKind::Python(content) => content,
SourceKind::Jupyter(notebook) => notebook.content(),
}
}
/// Return the [`Notebook`] if the source kind is [`SourceKind::Jupyter`].
pub fn notebook(&self) -> Option<&Notebook> {
if let Self::Jupyter(notebook) = self {
Some(notebook)
} else {
None
}
}
}

View file

@ -13,7 +13,7 @@ use rustpython_parser::lexer::LexResult;
use ruff_diagnostics::{AutofixKind, Diagnostic};
use ruff_python_ast::source_code::{Indexer, Locator, SourceFileBuilder, Stylist};
use crate::autofix::fix_file;
use crate::autofix::{fix_file, FixResult};
use crate::directives;
use crate::linter::{check_path, LinterResult};
use crate::message::{Emitter, EmitterContext, Message, TextEmitter};
@ -81,6 +81,7 @@ fn test_contents(contents: &str, path: &Path, settings: &Settings) -> Vec<Messag
&directives,
settings,
flags::Noqa::Enabled,
None,
);
let source_has_errors = error.is_some();
@ -95,7 +96,11 @@ fn test_contents(contents: &str, path: &Path, settings: &Settings) -> Vec<Messag
let mut diagnostics = diagnostics.clone();
let mut contents = contents.to_string();
while let Some((fixed_contents, _)) = fix_file(&diagnostics, &Locator::new(&contents)) {
while let Some(FixResult {
code: fixed_contents,
..
}) = fix_file(&diagnostics, &Locator::new(&contents))
{
if iterations < max_iterations() {
iterations += 1;
} else {
@ -133,6 +138,7 @@ fn test_contents(contents: &str, path: &Path, settings: &Settings) -> Vec<Messag
&directives,
settings,
flags::Noqa::Enabled,
None,
);
if let Some(fixed_error) = fixed_error {

View file

@ -14,12 +14,13 @@ use rustc_hash::FxHashMap;
use similar::TextDiff;
use ruff::fs;
use ruff::jupyter::{is_jupyter_notebook, JupyterIndex, JupyterNotebook};
use ruff::jupyter::{is_jupyter_notebook, Notebook};
use ruff::linter::{lint_fix, lint_only, FixTable, FixerResult, LinterResult};
use ruff::logging::DisplayParseError;
use ruff::message::Message;
use ruff::pyproject_toml::lint_pyproject_toml;
use ruff::settings::{flags, AllSettings, Settings};
use ruff::source_kind::SourceKind;
use ruff_python_ast::imports::ImportMap;
use ruff_python_ast::source_code::{LineIndex, SourceCode, SourceFileBuilder};
use ruff_python_stdlib::path::is_project_toml;
@ -31,9 +32,7 @@ pub(crate) struct Diagnostics {
pub(crate) messages: Vec<Message>,
pub(crate) fixed: FxHashMap<String, FixTable>,
pub(crate) imports: ImportMap,
/// Jupyter notebook indexing table for each input file that is a jupyter notebook
/// so we can rewrite the diagnostics in the end
pub(crate) jupyter_index: FxHashMap<String, JupyterIndex>,
pub(crate) source_kind: FxHashMap<String, SourceKind>,
}
impl Diagnostics {
@ -42,7 +41,7 @@ impl Diagnostics {
messages,
fixed: FxHashMap::default(),
imports,
jupyter_index: FxHashMap::default(),
source_kind: FxHashMap::default(),
}
}
}
@ -62,20 +61,15 @@ impl AddAssign for Diagnostics {
}
}
}
self.jupyter_index.extend(other.jupyter_index);
self.source_kind.extend(other.source_kind);
}
}
/// Returns either an indexed python jupyter notebook or a diagnostic (which is empty if we skip)
fn load_jupyter_notebook(path: &Path) -> Result<(String, JupyterIndex), Box<Diagnostics>> {
let notebook = match JupyterNotebook::read(path) {
fn load_jupyter_notebook(path: &Path) -> Result<Notebook, Box<Diagnostics>> {
let notebook = match Notebook::read(path) {
Ok(notebook) => {
if !notebook
.metadata
.language_info
.as_ref()
.map_or(true, |language| language.name == "python")
{
if !notebook.is_python_notebook() {
// Not a python notebook, this could e.g. be an R notebook which we want to just skip
debug!(
"Skipping {} because it's not a Python notebook",
@ -98,7 +92,7 @@ fn load_jupyter_notebook(path: &Path) -> Result<(String, JupyterIndex), Box<Diag
}
};
Ok(notebook.index())
Ok(notebook)
}
/// Lint the source code at the given `Path`.
@ -144,15 +138,17 @@ pub(crate) fn lint_path(
}
// Read the file from disk
let (contents, jupyter_index) = if is_jupyter_notebook(path) {
let mut source_kind = if is_jupyter_notebook(path) {
match load_jupyter_notebook(path) {
Ok((contents, jupyter_index)) => (contents, Some(jupyter_index)),
Err(diagnostics) => return Ok(*diagnostics),
Ok(notebook) => SourceKind::Jupyter(notebook),
Err(diagnostic) => return Ok(*diagnostic),
}
} else {
(std::fs::read_to_string(path)?, None)
SourceKind::Python(std::fs::read_to_string(path)?)
};
let contents = source_kind.content().to_string();
// Lint the file.
let (
LinterResult {
@ -165,11 +161,24 @@ pub(crate) fn lint_path(
result,
transformed,
fixed,
}) = lint_fix(&contents, path, package, noqa, &settings.lib)
{
}) = lint_fix(
&contents,
path,
package,
noqa,
&settings.lib,
&mut source_kind,
) {
if !fixed.is_empty() {
if matches!(autofix, flags::FixMode::Apply) {
write(path, transformed.as_bytes())?;
match &source_kind {
SourceKind::Python(_) => {
write(path, transformed.as_bytes())?;
}
SourceKind::Jupyter(notebook) => {
notebook.write(path)?;
}
}
} else if matches!(autofix, flags::FixMode::Diff) {
let mut stdout = io::stdout().lock();
TextDiff::from_lines(contents.as_str(), &transformed)
@ -200,7 +209,8 @@ pub(crate) fn lint_path(
"{}",
DisplayParseError::new(
err,
SourceCode::new(&contents, &LineIndex::from_source_text(&contents))
SourceCode::new(&contents, &LineIndex::from_source_text(&contents)),
Some(&source_kind),
)
);
@ -215,25 +225,16 @@ pub(crate) fn lint_path(
}
}
let jupyter_index = match jupyter_index {
None => FxHashMap::default(),
Some(jupyter_index) => {
let mut index = FxHashMap::default();
index.insert(
path.to_str()
.ok_or_else(|| anyhow!("Unable to parse filename: {:?}", path))?
.to_string(),
jupyter_index,
);
index
}
};
Ok(Diagnostics {
messages,
fixed: FxHashMap::from_iter([(fs::relativize_path(path), fixed)]),
imports,
jupyter_index,
source_kind: FxHashMap::from_iter([(
path.to_str()
.ok_or_else(|| anyhow!("Unable to parse filename: {:?}", path))?
.to_string(),
source_kind,
)]),
})
}
@ -247,6 +248,7 @@ pub(crate) fn lint_stdin(
noqa: flags::Noqa,
autofix: flags::FixMode,
) -> Result<Diagnostics> {
let mut source_kind = SourceKind::Python(contents.to_string());
// Lint the inputs.
let (
LinterResult {
@ -265,6 +267,7 @@ pub(crate) fn lint_stdin(
package,
noqa,
settings,
&mut source_kind,
) {
if matches!(autofix, flags::FixMode::Apply) {
// Write the contents to stdout, regardless of whether any errors were fixed.
@ -332,7 +335,7 @@ pub(crate) fn lint_stdin(
fixed,
)]),
imports,
jupyter_index: FxHashMap::default(),
source_kind: FxHashMap::default(),
})
}

View file

@ -178,7 +178,7 @@ impl Printer {
return Ok(());
}
let context = EmitterContext::new(&diagnostics.jupyter_index);
let context = EmitterContext::new(&diagnostics.source_kind);
match self.format {
SerializationFormat::Json => {
@ -356,7 +356,7 @@ impl Printer {
writeln!(stdout)?;
}
let context = EmitterContext::new(&diagnostics.jupyter_index);
let context = EmitterContext::new(&diagnostics.source_kind);
TextEmitter::default()
.with_show_fix_status(show_fix_status(self.autofix_level))
.with_show_source(self.flags.contains(Flags::SHOW_SOURCE))

View file

@ -204,6 +204,7 @@ pub fn check(contents: &str, options: JsValue) -> Result<JsValue, JsValue> {
&directives,
&settings,
flags::Noqa::Enabled,
None,
);
let source_code = locator.to_source_code();