ruff/crates/ruff_python_formatter/tests/normalizer.rs
Andrew Gallant c48ba690eb
add support for formatting reStructuredText code snippets (#9003)
(This is not possible to actually use until
https://github.com/astral-sh/ruff/pull/8854 is merged.)

ruff_python_formatter: add reStructuredText docstring formatting support

This commit makes use of the refactoring done in prior commits to slot
in reStructuredText support. Essentially, we add a new type of code
example and look for *both* literal blocks and code block directives.
Literal blocks are treated as Python by default because it seems to be a
[common
practice](https://github.com/adamchainz/blacken-docs/issues/195).

That is, literal blocks like this:

```
def example():
    """
    Here's an example::

        foo( 1 )

    All done.
    """
    pass
```

Will get reformatted. And code blocks (via reStructuredText directives)
will also get reformatted:


```
def example():
    """
    Here's an example:

    .. code-block:: python

        foo( 1 )

    All done.
    """
    pass
```

When looking for a code block, it is possible for it to become invalid.
In which case, we back out of looking for a code example and print the
lines out as they are. As with doctest formatting, if reformatting the
code would result in invalid Python or if the code collected from the
block is invalid, then formatting is also skipped.

A number of tests have been added to check both the formatting and
resetting behavior. Mixed indentation is also tested a fair bit, since
one of my initial attempts at dealing with mixed indentation ended up
not working.

I recommend working through this PR commit-by-commit. There is in
particular a somewhat gnarly refactoring before reST support is added.

Closes #8859
2023-12-05 14:14:44 -05:00

112 lines
4.2 KiB
Rust

use {
itertools::Either::{Left, Right},
once_cell::sync::Lazy,
regex::Regex,
};
use ruff_python_ast::visitor::transformer;
use ruff_python_ast::visitor::transformer::Transformer;
use ruff_python_ast::{self as ast, Expr, Stmt};
/// A struct to normalize AST nodes for the purpose of comparing formatted representations for
/// semantic equivalence.
///
/// Vis-à-vis comparing ASTs, comparing these normalized representations does the following:
/// - Ignores non-abstraction information that we've encoded into the AST, e.g., the difference
/// between `class C: ...` and `class C(): ...`, which is part of our AST but not `CPython`'s.
/// - Normalize strings. The formatter can re-indent docstrings, so we need to compare string
/// contents ignoring whitespace. (Black does the same.)
/// - The formatter can also reformat code snippets when they're Python code, which can of
/// course change the string in arbitrary ways. Black itself does not reformat code snippets,
/// so we carve our own path here by stripping everything that looks like code snippets from
/// string literals.
/// - Ignores nested tuples in deletions. (Black does the same.)
pub(crate) struct Normalizer;
impl Normalizer {
/// Transform an AST module into a normalized representation.
#[allow(dead_code)]
pub(crate) fn visit_module(&self, module: &mut ast::Mod) {
match module {
ast::Mod::Module(module) => {
self.visit_body(&mut module.body);
}
ast::Mod::Expression(expression) => {
self.visit_expr(&mut expression.body);
}
}
}
}
impl Transformer for Normalizer {
fn visit_stmt(&self, stmt: &mut Stmt) {
if let Stmt::Delete(delete) = stmt {
// Treat `del a, b` and `del (a, b)` equivalently.
delete.targets = delete
.targets
.clone()
.into_iter()
.flat_map(|target| {
if let Expr::Tuple(tuple) = target {
Left(tuple.elts.into_iter())
} else {
Right(std::iter::once(target))
}
})
.collect();
}
transformer::walk_stmt(self, stmt);
}
fn visit_string_literal(&self, string_literal: &mut ast::StringLiteral) {
static STRIP_DOC_TESTS: Lazy<Regex> = Lazy::new(|| {
Regex::new(
r#"(?mx)
(
# strip doctest PS1 prompt lines
^\s*>>>\s.*(\n|$)
|
# strip doctest PS2 prompt lines
# Also handles the case of an empty ... line.
^\s*\.\.\.((\n|$)|\s.*(\n|$))
)+
"#,
)
.unwrap()
});
static STRIP_RST_BLOCKS: Lazy<Regex> = Lazy::new(|| {
// This is kind of unfortunate, but it's pretty tricky (likely
// impossible) to detect a reStructuredText block with a simple
// regex. So we just look for the start of a block and remove
// everything after it. Talk about a hammer.
Regex::new(r#"::(?s:.*)"#).unwrap()
});
// Start by (1) stripping everything that looks like a code
// snippet, since code snippets may be completely reformatted if
// they are Python code.
string_literal.value = STRIP_DOC_TESTS
.replace_all(
&string_literal.value,
"<DOCTEST-CODE-SNIPPET: Removed by normalizer>\n",
)
.into_owned();
string_literal.value = STRIP_RST_BLOCKS
.replace_all(
&string_literal.value,
"<RSTBLOCK-CODE-SNIPPET: Removed by normalizer>\n",
)
.into_owned();
// Normalize a string by (2) stripping any leading and trailing space from each
// line, and (3) removing any blank lines from the start and end of the string.
string_literal.value = string_literal
.value
.lines()
.map(str::trim)
.collect::<Vec<_>>()
.join("\n")
.trim()
.to_owned();
}
}