Consider VS Code cell metadata to determine valid code cells (#12864)

## Summary

This PR makes Ruff consider VS Code-specific cell metadata when collecting
valid code cells.

For context, Ruff only runs on valid code cells, which are the code cells
that don't contain cell magics. Previously, Ruff only used the notebook's
metadata to determine whether it's a Python notebook. However, in VS Code,
a notebook's preferred language might be Python while it still contains
code cells written in other languages. A cell's language can be determined
from the `metadata.vscode.languageId` field.
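As a rough illustration of how the cell-level field takes precedence, here is a minimal std-only sketch rather than Ruff's code. The helpers `effective_cell_language` and `is_python_cell` are hypothetical, and the sketch assumes the notebook-level language has already been read from the notebook's metadata. Per the `CellMetadata` doc comment in this commit, VS Code writes `languageId` only when a cell's language differs from the notebook's preferred language, so an absent value means the cell inherits the notebook-level language.

```rust
/// Hypothetical helper (not part of Ruff): resolves the language a cell is
/// written in from the notebook-level language and the optional cell-level
/// `metadata.vscode.languageId` value.
fn effective_cell_language<'a>(
    notebook_language: &'a str,
    vscode_language_id: Option<&'a str>,
) -> &'a str {
    // VS Code only records `languageId` on a cell whose language differs from
    // the notebook's preferred language, so a missing value means the cell
    // inherits the notebook-level language.
    vscode_language_id.unwrap_or(notebook_language)
}

/// Hypothetical predicate: only cells that resolve to Python are candidates
/// for linting.
fn is_python_cell(notebook_language: &str, vscode_language_id: Option<&str>) -> bool {
    effective_cell_language(notebook_language, vscode_language_id) == "python"
}
```

For example, in a Python notebook, `is_python_cell("python", Some("javascript"))` is `false`, while `is_python_cell("python", None)` is `true`.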

### References:
* https://code.visualstudio.com/docs/languages/identifiers
* e6c009a3d4/extensions/ipynb/src/serializers.ts (L104-L107)
* e6c009a3d4/extensions/ipynb/src/serializers.ts (L117-L122)

This brings us one step closer to fixing #12281.

## Test Plan

Add test cases for `is_valid_python_code_cell` and an integration test
case that runs it end to end. The test notebook contains a JavaScript
code cell and a Python code cell.
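The test-plan scenario can be sketched in std-only Rust. Everything here is hypothetical and simplified: `SketchCell`, `starts_with_cell_magic`, `is_valid_python_code_cell_sketch`, and `collect_valid_code_cells` are illustrative names rather than Ruff's API, the notebook itself is assumed to be Python, and a cell is assumed to be a cell-magic cell if its first non-blank line starts with `%%` (Ruff's real check may differ). Given a JavaScript cell followed by a Python cell, only index `1` is collected.

```rust
/// Hypothetical, simplified stand-in for a notebook cell. Ruff's real cell
/// types carry more fields (ids, outputs, the full metadata map, ...).
enum SketchCell<'a> {
    Markdown,
    Code {
        /// Value of `metadata.vscode.languageId`, if VS Code wrote one.
        vscode_language_id: Option<&'a str>,
        source: &'a str,
    },
}

/// Simplified assumption: a cell magic is a first non-blank line starting with `%%`.
fn starts_with_cell_magic(source: &str) -> bool {
    source
        .lines()
        .map(str::trim_start)
        .find(|line| !line.is_empty())
        .is_some_and(|line| line.starts_with("%%"))
}

/// Sketch of the validity check, assuming the notebook itself is Python:
/// a code cell is valid if it has no non-Python `languageId` and no cell magic.
fn is_valid_python_code_cell_sketch(cell: &SketchCell) -> bool {
    match cell {
        SketchCell::Markdown => false,
        SketchCell::Code { vscode_language_id, source } => {
            // A missing `languageId` means the cell uses the notebook's (Python) language.
            vscode_language_id.map_or(true, |id| id == "python")
                && !starts_with_cell_magic(source)
        }
    }
}

/// Returns the indices of the cells that should be linted.
fn collect_valid_code_cells(cells: &[SketchCell]) -> Vec<usize> {
    cells
        .iter()
        .enumerate()
        .filter(|(_, cell)| is_valid_python_code_cell_sketch(cell))
        .map(|(index, _)| index)
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn skips_javascript_and_cell_magic_cells() {
        let cells = [
            SketchCell::Code { vscode_language_id: Some("javascript"), source: "console.log(1);" },
            SketchCell::Code { vscode_language_id: None, source: "print(1)" },
            SketchCell::Code { vscode_language_id: None, source: "%%bash\necho hi" },
            SketchCell::Markdown,
        ];
        assert_eq!(collect_valid_code_cells(&cells), vec![1]);
    }
}
```

Returning indices rather than cloned cells keeps the sketch cheap and lets a caller map diagnostics back to the original cell positions.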
Dhruv Manilawala 2024-08-13 22:09:56 +05:30 committed by GitHub
parent 899a52390b
commit ff53db3d99
GPG key ID: B5690EEEBB952194
11 changed files with 226 additions and 21 deletions

@@ -18,7 +18,7 @@
 //! a code cell or not without looking at the `cell_type` property, which
 //! would require a custom serializer.

-use std::collections::BTreeMap;
+use std::collections::{BTreeMap, HashMap};

 use serde::{Deserialize, Serialize};
 use serde_json::Value;
@@ -122,7 +122,7 @@ pub struct RawCell {
     /// <https://youtrack.jetbrains.com/issue/PY-59438/Jupyter-notebooks-created-with-PyCharm-are-missing-the-id-field-in-cells-in-the-.ipynb-json>
     pub id: Option<String>,
     /// Cell-level metadata.
-    pub metadata: Value,
+    pub metadata: CellMetadata,
     pub source: SourceValue,
 }

@@ -137,7 +137,7 @@ pub struct MarkdownCell {
     /// <https://youtrack.jetbrains.com/issue/PY-59438/Jupyter-notebooks-created-with-PyCharm-are-missing-the-id-field-in-cells-in-the-.ipynb-json>
     pub id: Option<String>,
     /// Cell-level metadata.
-    pub metadata: Value,
+    pub metadata: CellMetadata,
     pub source: SourceValue,
 }

@@ -153,12 +153,36 @@ pub struct CodeCell {
     #[serde(skip_serializing_if = "Option::is_none")]
     pub id: Option<String>,
     /// Cell-level metadata.
-    pub metadata: Value,
+    pub metadata: CellMetadata,
     /// Execution, display, or stream outputs.
     pub outputs: Vec<Value>,
     pub source: SourceValue,
 }

+/// Cell-level metadata.
+#[skip_serializing_none]
+#[derive(Clone, Debug, Default, Serialize, Deserialize, PartialEq)]
+pub struct CellMetadata {
+    /// VS Code specific cell metadata.
+    ///
+    /// This is [`Some`] only if the cell's preferred language is different from the notebook's
+    /// preferred language.
+    /// <https://github.com/microsoft/vscode/blob/e6c009a3d4ee60f352212b978934f52c4689fbd9/extensions/ipynb/src/serializers.ts#L117-L122>
+    pub vscode: Option<CodeCellMetadataVSCode>,
+    /// Catch-all for metadata that isn't required by Ruff.
+    #[serde(flatten)]
+    pub extra: HashMap<String, Value>,
+}
+
+/// VS Code specific cell metadata.
+/// <https://github.com/microsoft/vscode/blob/e6c009a3d4ee60f352212b978934f52c4689fbd9/extensions/ipynb/src/serializers.ts#L104-L107>
+#[derive(Clone, Debug, Serialize, Deserialize, PartialEq)]
+#[serde(rename_all = "camelCase")]
+pub struct CodeCellMetadataVSCode {
+    /// <https://code.visualstudio.com/docs/languages/identifiers>
+    pub language_id: String,
+}
+
 /// Notebook root-level metadata.
 #[skip_serializing_none]
 #[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Default)]