Add general support for parenthesized comments on expressions (#6485)

## Summary

This PR adds support for parenthesized comments. A parenthesized comment
is a comment that appears within a pair of parentheses, but outside the
range of the expression that the parentheses enclose. For example, the
comment here is a parenthesized comment:

```python
if (
    # comment
    True
):
    ...
```

The parentheses enclose the `True`, but the range of `True` doesn’t
include the `# comment`.
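
To make "outside the range" concrete, here is a minimal sketch that uses CPython's standard `ast` module as a stand-in for Ruff's parser. That substitution is an assumption made purely for illustration: Ruff uses its own Rust parser, but the idea that a node's range covers only the expression text is the same.

```python
import ast

source = "if (\n    # comment\n    True\n):\n    ...\n"

# Parse the snippet and grab the `if` condition, i.e. the `True` expression.
condition = ast.parse(source).body[0].test

# The node's range covers only the text `True`, which sits on the third line.
segment = ast.get_source_segment(source, condition)

# The comment sits on the second line, inside the parentheses but outside the node.
comment_line = source.splitlines().index("    # comment") + 1

print(segment, condition.lineno, comment_line)  # prints "True 3 2"
```

Because no node covers the comment, the formatter has to decide which neighbouring node it belongs to.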

There are at least two problems associated with parenthesized comments:
(1) associating the comment with the correct (i.e., enclosed) node; and
(2) formatting the comment correctly, once it has been associated with
the enclosed node.

The solution proposed here for (1) is to search for parentheses between
the preceding and following nodes, and to use the open and close
parentheses to break ties, rather than always assigning the comment to
the preceding node.
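
The tie-breaking rule can be sketched in Python roughly as follows. This is a simplified illustration, not the Rust implementation: the function names and offset-based signature are hypothetical, and plain string search stands in for the `SimpleTokenizer` used in the diff (so the early stop at `as`, `def`, and `class` tokens is omitted). The two parenthesized inputs mirror the examples in the doc comment of `handle_parenthesized_comment`.

```python
def strip_comments(gap: str) -> str:
    """Drop comment text so that parentheses inside comments are ignored.

    The gap between two nodes holds no string literals, so `#` always starts a comment.
    """
    return "\n".join(line.split("#", 1)[0] for line in gap.split("\n"))


def place_comment(source, preceding_end, comment_start, comment_end,
                  following_start, end_of_line):
    """Return "leading", "trailing", or "default" for a comment between two nodes."""
    if end_of_line:
        # End-of-line comment: an open parenthesis between the preceding node and
        # the comment means the comment leads the following node.
        if "(" in strip_comments(source[preceding_end:comment_start]):
            return "leading"
    elif ")" in strip_comments(source[comment_end:following_start]):
        # Own-line comment: a close parenthesis between the comment and the
        # following node means the comment trails the preceding node.
        return "trailing"
    return "default"


def classify(source, preceding, comment, following, end_of_line):
    """Hypothetical helper: locate the three pieces by text and classify the comment."""
    comment_start = source.index(comment)
    return place_comment(
        source,
        source.index(preceding) + len(preceding),
        comment_start,
        comment_start + len(comment),
        source.index(following),
        end_of_line,
    )


assert_source = "assert (\n    x\n), (  # comment\n    y\n)\n"
if_source = "if (\n    x\n    # comment\n):\n    y\n"
list_source = "x = [\n    a,  # comment\n    b,\n]\n"

leading = classify(assert_source, "x", "# comment", "y", end_of_line=True)
trailing = classify(if_source, "x", "# comment", "y", end_of_line=False)
default = classify(list_source, "a", "# comment", "b", end_of_line=True)
```

In this sketch, `leading` is `"leading"`, `trailing` is `"trailing"`, and `default` is `"default"`, since the list example has no parenthesis between the two elements and falls through to the existing placement logic.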

For (2), we handle these special parenthesized comments in `FormatExpr`.
The biggest risk with this approach is that we forget some codepath that
force-disables parenthesization (by passing in `Parentheses::Never`).
I've audited all usages of that enum and added additional handling +
test coverage for such cases.
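
As a rough Python sketch of the two formatting decisions visible in the diff: the open-parenthesis comment is the first leading comment, and only if it is an end-of-line comment; and a call site that would otherwise pass `Parentheses::Never` switches to `Parentheses::Always` when the expression has leading comments. The Python class and function names below are hypothetical stand-ins for the Rust items, not Ruff's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


@dataclass(frozen=True)
class SourceComment:
    text: str
    end_of_line: bool  # True for `( # comment`, False for a comment on its own line


class Parentheses(Enum):
    ALWAYS = "always"
    NEVER = "never"


def open_parenthesis_comment(leading: Sequence[SourceComment]) -> Optional[SourceComment]:
    """Return the first leading comment if it sits at the end of the `(` line."""
    if leading and leading[0].end_of_line:
        return leading[0]
    return None


def parentheses_for(leading: Sequence[SourceComment]) -> Parentheses:
    """Keep parentheses whenever leading comments exist, so they have a place to go."""
    return Parentheses.ALWAYS if leading else Parentheses.NEVER


own_line = [SourceComment("# comment", end_of_line=False)]
on_paren = [SourceComment("# comment", end_of_line=True)]

print(open_parenthesis_comment(on_paren) is on_paren[0])  # True
print(open_parenthesis_comment(own_line))                 # None
print(parentheses_for(own_line), parentheses_for([]))
```

The second function corresponds to the pattern the diff applies to tuple return annotations and to the first assignment target.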

Closes https://github.com/astral-sh/ruff/issues/6390.

## Test Plan

`cargo test` with new cases.

Before:

| project      | similarity index |
|--------------|------------------|
| build        | 0.75623          |
| cpython      | 0.75472          |
| django       | 0.99804          |
| transformers | 0.99618          |
| typeshed     | 0.74233          |
| warehouse    | 0.99601          |
| zulip        | 0.99727          |

After (only `typeshed` changes, from 0.74233 to 0.74237):

| project      | similarity index |
|--------------|------------------|
| build        | 0.75623          |
| cpython      | 0.75472          |
| django       | 0.99804          |
| transformers | 0.99618          |
| typeshed     | 0.74237          |
| warehouse    | 0.99601          |
| zulip        | 0.99727          |
Charlie Marsh 2023-08-15 14:59:18 -04:00 committed by GitHub
parent 29c0b9f91c
commit a3d4f08f29
30 changed files with 806 additions and 236 deletions


@@ -414,6 +414,24 @@ impl<'a> Comments<'a> {
.leading_dangling_trailing(&NodeRefEqualityKey::from_ref(node.into()))
}
/// Returns any comments on the open parenthesis of a `node`.
///
/// For example, `# comment` in:
/// ```python
/// ( # comment
/// foo.bar
/// )
/// ```
#[inline]
pub(crate) fn open_parenthesis_comment<T>(&self, node: T) -> Option<&SourceComment>
where
T: Into<AnyNodeRef<'a>>,
{
self.leading_comments(node)
.first()
.filter(|comment| comment.line_position.is_end_of_line())
}
#[inline(always)]
#[cfg(not(debug_assertions))]
pub(crate) fn assert_formatted_all_comments(&self, _source_code: SourceCode) {}


@@ -18,16 +18,149 @@ pub(super) fn place_comment<'a>(
comment: DecoratedComment<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
// Handle comments before and after bodies such as the different branches of an if statement.
let comment = if comment.line_position().is_own_line() {
handle_own_line_comment_around_body(comment, locator)
} else {
handle_end_of_line_comment_around_body(comment, locator)
handle_parenthesized_comment(comment, locator)
.or_else(|comment| handle_end_of_line_comment_around_body(comment, locator))
.or_else(|comment| handle_own_line_comment_around_body(comment, locator))
.or_else(|comment| handle_enclosed_comment(comment, locator))
}
/// Handle parenthesized comments. A parenthesized comment is a comment that appears within a
/// parenthesis, but not within the range of the expression enclosed by the parenthesis.
/// For example, the comment here is a parenthesized comment:
/// ```python
/// if (
/// # comment
/// True
/// ):
/// ...
/// ```
/// The parentheses enclose `True`, but the range of `True` doesn't include the `# comment`.
///
/// Default handling can get parenthesized comments wrong in a number of ways. For example, the
/// comment here is marked (by default) as a trailing comment of `x`, when it should be a leading
/// comment of `y`:
/// ```python
/// assert (
/// x
/// ), ( # comment
/// y
/// )
/// ```
///
/// Similarly, this is marked as a leading comment of `y`, when it should be a trailing comment of
/// `x`:
/// ```python
/// if (
/// x
/// # comment
/// ):
/// y
/// ```
///
/// As a generalized solution, if a comment has a preceding node and a following node, we search for
/// opening and closing parentheses between the two nodes. If we find a closing parenthesis between
/// the preceding node and the comment, then the comment is a trailing comment of the preceding
/// node. If we find an opening parenthesis between the comment and the following node, then the
/// comment is a leading comment of the following node.
fn handle_parenthesized_comment<'a>(
comment: DecoratedComment<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
let Some(preceding) = comment.preceding_node() else {
return CommentPlacement::Default(comment);
};
// Change comment placement depending on the node type. These can be seen as node-specific
// fixups.
comment.or_else(|comment| match comment.enclosing_node() {
let Some(following) = comment.following_node() else {
return CommentPlacement::Default(comment);
};
// TODO(charlie): Assert that there are no bogus tokens in these ranges. There are a few cases
// where we _can_ hit bogus tokens, but the parentheses need to come before them. For example:
// ```python
// try:
// some_call()
// except (
// UnformattedError
// # trailing comment
// ) as err:
// handle_exception()
// ```
// Here, we lex from the end of `UnformattedError` to the start of `handle_exception()`, which
// means we hit an "other" token at `err`. We know the parentheses must precede the `err`, but
// this could be fixed by including `as err` in the node range.
//
// Another example:
// ```python
// @deco
// # comment
// def decorated():
// pass
// ```
// Here, we lex from the end of `deco` to the start of the arguments of `decorated`. We hit an
// "other" token at `decorated`, but any parentheses must precede that.
//
// For now, we _can_ assert, but to do so, we stop lexing when we hit a token that precedes an
// identifier.
if comment.line_position().is_end_of_line() {
let tokenizer = SimpleTokenizer::new(
locator.contents(),
TextRange::new(preceding.end(), comment.start()),
);
if tokenizer
.skip_trivia()
.take_while(|token| {
!matches!(
token.kind,
SimpleTokenKind::As | SimpleTokenKind::Def | SimpleTokenKind::Class
)
})
.any(|token| {
debug_assert!(
!matches!(token.kind, SimpleTokenKind::Bogus),
"Unexpected token between nodes: `{:?}`",
locator.slice(TextRange::new(preceding.end(), comment.start()),)
);
token.kind() == SimpleTokenKind::LParen
})
{
return CommentPlacement::leading(following, comment);
}
} else {
let tokenizer = SimpleTokenizer::new(
locator.contents(),
TextRange::new(comment.end(), following.start()),
);
if tokenizer
.skip_trivia()
.take_while(|token| {
!matches!(
token.kind,
SimpleTokenKind::As | SimpleTokenKind::Def | SimpleTokenKind::Class
)
})
.any(|token| {
debug_assert!(
!matches!(token.kind, SimpleTokenKind::Bogus),
"Unexpected token between nodes: `{:?}`",
locator.slice(TextRange::new(comment.end(), following.start()))
);
token.kind() == SimpleTokenKind::RParen
})
{
return CommentPlacement::trailing(preceding, comment);
}
}
CommentPlacement::Default(comment)
}
/// Handle a comment that is enclosed by a node.
fn handle_enclosed_comment<'a>(
comment: DecoratedComment<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
match comment.enclosing_node() {
AnyNodeRef::Parameters(arguments) => {
handle_parameters_separator_comment(comment, arguments, locator)
.or_else(|comment| handle_bracketed_end_of_line_comment(comment, locator))
@@ -65,10 +198,7 @@ pub(super) fn place_comment<'a>(
handle_module_level_own_line_comment_before_class_or_function_comment(comment, locator)
}
AnyNodeRef::WithItem(_) => handle_with_item_comment(comment, locator),
AnyNodeRef::StmtFunctionDef(function_def) => {
handle_leading_function_with_decorators_comment(comment)
.or_else(|comment| handle_leading_returns_comment(comment, function_def))
}
AnyNodeRef::StmtFunctionDef(_) => handle_leading_function_with_decorators_comment(comment),
AnyNodeRef::StmtClassDef(class_def) => {
handle_leading_class_with_decorators_comment(comment, class_def)
}
@@ -90,13 +220,17 @@ pub(super) fn place_comment<'a>(
| AnyNodeRef::ExprDictComp(_)
| AnyNodeRef::ExprTuple(_) => handle_bracketed_end_of_line_comment(comment, locator),
_ => CommentPlacement::Default(comment),
})
}
}
fn handle_end_of_line_comment_around_body<'a>(
comment: DecoratedComment<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
if comment.line_position().is_own_line() {
return CommentPlacement::Default(comment);
}
// Handle comments before the first statement in a body
// ```python
// for x in range(10): # in the main body ...
@@ -245,7 +379,9 @@ fn handle_own_line_comment_around_body<'a>(
comment: DecoratedComment<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
debug_assert!(comment.line_position().is_own_line());
if comment.line_position().is_end_of_line() {
return CommentPlacement::Default(comment);
}
// If the following is the first child in an alternative body, this must be the last child in
// the previous one
@@ -274,18 +410,11 @@ fn handle_own_line_comment_around_body<'a>(
}
// Check if we're between bodies and should attach to the following body.
handle_own_line_comment_between_branches(comment, preceding, locator)
.or_else(|comment| {
// Otherwise, there's no following branch or the indentation is too deep, so attach to the
// recursively last statement in the preceding body with the matching indentation.
handle_own_line_comment_after_branch(comment, preceding, locator)
})
.or_else(|comment| {
// If the following node is the first in its body, and there's a non-trivia token between the
// comment and the following node (like a parenthesis), then it means the comment is trailing
// the preceding node, not leading the following one.
handle_own_line_comment_in_clause(comment, preceding, locator)
})
handle_own_line_comment_between_branches(comment, preceding, locator).or_else(|comment| {
// Otherwise, there's no following branch or the indentation is too deep, so attach to the
// recursively last statement in the preceding body with the matching indentation.
handle_own_line_comment_after_branch(comment, preceding, locator)
})
}
/// Handles own line comments between two branches of a node.
@@ -385,36 +514,6 @@ fn handle_own_line_comment_between_branches<'a>(
}
}
/// Handles own-line comments at the end of a clause, immediately preceding a body:
/// ```python
/// if (
/// True
/// # This should be a trailing comment of `True` and not a leading comment of `pass`
/// ):
/// pass
/// ```
fn handle_own_line_comment_in_clause<'a>(
comment: DecoratedComment<'a>,
preceding: AnyNodeRef<'a>,
locator: &Locator,
) -> CommentPlacement<'a> {
if let Some(following) = comment.following_node() {
if is_first_statement_in_body(following, comment.enclosing_node())
&& SimpleTokenizer::new(
locator.contents(),
TextRange::new(comment.end(), following.start()),
)
.skip_trivia()
.next()
.is_some()
{
return CommentPlacement::trailing(preceding, comment);
}
}
CommentPlacement::Default(comment)
}
/// Determine where to attach an own line comment after a branch depending on its indentation
fn handle_own_line_comment_after_branch<'a>(
comment: DecoratedComment<'a>,
@@ -787,40 +886,6 @@ fn handle_leading_function_with_decorators_comment(comment: DecoratedComment) ->
}
}
/// Handles end-of-line comments between function parameters and the return type annotation,
/// attaching them as dangling comments to the function instead of making them trailing
/// parameter comments.
///
/// ```python
/// def double(a: int) -> ( # Hello
/// int
/// ):
/// return 2*a
/// ```
fn handle_leading_returns_comment<'a>(
comment: DecoratedComment<'a>,
function_def: &'a ast::StmtFunctionDef,
) -> CommentPlacement<'a> {
let parameters = function_def.parameters.as_ref();
let Some(returns) = function_def.returns.as_deref() else {
return CommentPlacement::Default(comment);
};
let is_preceding_parameters = comment
.preceding_node()
.is_some_and(|node| node == parameters.into());
let is_following_returns = comment
.following_node()
.is_some_and(|node| node == returns.into());
if comment.line_position().is_end_of_line() && is_preceding_parameters && is_following_returns {
CommentPlacement::dangling(comment.enclosing_node(), comment)
} else {
CommentPlacement::Default(comment)
}
}
/// Handle comments between decorators and the decorated node.
///
/// For example, given:
@@ -1043,14 +1108,6 @@ fn handle_trailing_expression_starred_star_end_of_line_comment<'a>(
comment: DecoratedComment<'a>,
starred: &'a ast::ExprStarred,
) -> CommentPlacement<'a> {
if comment.line_position().is_own_line() {
return CommentPlacement::Default(comment);
}
if comment.following_node().is_none() {
return CommentPlacement::Default(comment);
}
CommentPlacement::leading(starred, comment)
}


@@ -4,19 +4,19 @@ expression: comments.debug(test_case.source_code)
---
{
Node {
kind: ExprName,
range: 1..2,
source: `a`,
kind: ExprBinOp,
range: 30..57,
source: `10 + # More comments⏎`,
}: {
"leading": [],
"dangling": [],
"trailing": [
"leading": [
SourceComment {
text: "# Trailing comment",
position: EndOfLine,
formatted: false,
},
],
"dangling": [],
"trailing": [],
},
Node {
kind: ExprConstant,


@@ -2,7 +2,7 @@ use ruff_formatter::{format_args, write, Buffer, FormatResult, FormatRuleWithOpt
use ruff_python_ast::node::AnyNodeRef;
use ruff_python_ast::ExprGeneratorExp;
use crate::comments::{leading_comments, SourceComment};
use crate::comments::SourceComment;
use crate::context::PyFormatContext;
use crate::expression::parentheses::{parenthesized, NeedsParentheses, OptionalParentheses};
use crate::prelude::*;
@@ -14,10 +14,11 @@ pub enum GeneratorExpParentheses {
#[default]
Default,
// skip parens if the generator exp is the only argument to a function, e.g.
// ```python
// all(x for y in z)`
// ```
/// Skip parens if the generator is the only argument to a function and doesn't contain any
/// dangling comments. For example:
/// ```python
/// all(x for y in z)
/// ```
StripIfOnlyFunctionArg,
}
@@ -52,15 +53,12 @@ impl FormatNodeRule<ExprGeneratorExp> for FormatExprGeneratorExp {
let comments = f.context().comments().clone();
let dangling = comments.dangling_comments(item);
if self.parentheses == GeneratorExpParentheses::StripIfOnlyFunctionArg {
if self.parentheses == GeneratorExpParentheses::StripIfOnlyFunctionArg
&& dangling.is_empty()
{
write!(
f,
[
leading_comments(dangling),
group(&elt.format()),
soft_line_break_or_space(),
&joined
]
[group(&elt.format()), soft_line_break_or_space(), &joined]
)
} else {
write!(


@@ -107,7 +107,15 @@ impl FormatRule<Expr, PyFormatContext<'_>> for FormatExpr {
};
if parenthesize {
parenthesized("(", &format_expr, ")").fmt(f)
let comments = f.context().comments().clone();
let open_parenthesis_comment = comments.open_parenthesis_comment(expression);
parenthesized("(", &format_expr, ")")
.with_dangling_comments(
open_parenthesis_comment
.map(std::slice::from_ref)
.unwrap_or_default(),
)
.fmt(f)
} else {
let level = match f.context().node_level() {
NodeLevel::TopLevel | NodeLevel::CompoundStatement => NodeLevel::Expression(None),
@@ -162,6 +170,9 @@ impl Format<PyFormatContext<'_>> for MaybeParenthesizeExpression<'_> {
let has_comments = comments.has_leading_comments(*expression)
|| comments.has_trailing_own_line_comments(*expression);
// If the expression has comments, we always want to preserve the parentheses. This also
// ensures that we correctly handle parenthesized comments, and don't need to worry about
// them in the implementation below.
if preserve_parentheses || has_comments {
return expression.format().with_options(Parentheses::Always).fmt(f);
}


@@ -45,6 +45,9 @@ impl FormatNodeRule<Arguments> for FormatArguments {
if is_single_argument_parenthesized(arg, item.end(), source) {
Parentheses::Always
} else {
// Note: no need to handle opening-parenthesis comments, since
// an opening-parenthesis comment implies that the argument is
// parenthesized.
Parentheses::Never
};
joiner.entry(other, &other.format().with_options(parentheses))


@@ -44,6 +44,7 @@ impl FormatNodeRule<StmtAssign> for FormatStmtAssign {
}
}
#[derive(Debug)]
struct FormatTargets<'a> {
targets: &'a [Expr],
}
@@ -51,9 +52,17 @@ struct FormatTargets<'a> {
impl Format<PyFormatContext<'_>> for FormatTargets<'_> {
fn fmt(&self, f: &mut PyFormatter) -> FormatResult<()> {
if let Some((first, rest)) = self.targets.split_first() {
let can_omit_parentheses = has_own_parentheses(first, f.context()).is_some();
let comments = f.context().comments();
let group_id = if can_omit_parentheses {
let parenthesize = if comments.has_leading_comments(first) {
ParenthesizeTarget::Always
} else if has_own_parentheses(first, f.context()).is_some() {
ParenthesizeTarget::Never
} else {
ParenthesizeTarget::IfBreaks
};
let group_id = if parenthesize == ParenthesizeTarget::Never {
Some(f.group_id("assignment_parentheses"))
} else {
None
@@ -61,17 +70,23 @@ impl Format<PyFormatContext<'_>> for FormatTargets<'_> {
let format_first = format_with(|f: &mut PyFormatter| {
let mut f = WithNodeLevel::new(NodeLevel::Expression(group_id), f);
if can_omit_parentheses {
write!(f, [first.format().with_options(Parentheses::Never)])
} else {
write!(
f,
[
if_group_breaks(&text("(")),
soft_block_indent(&first.format().with_options(Parentheses::Never)),
if_group_breaks(&text(")"))
]
)
match parenthesize {
ParenthesizeTarget::Always => {
write!(f, [first.format().with_options(Parentheses::Always)])
}
ParenthesizeTarget::Never => {
write!(f, [first.format().with_options(Parentheses::Never)])
}
ParenthesizeTarget::IfBreaks => {
write!(
f,
[
if_group_breaks(&text("(")),
soft_block_indent(&first.format().with_options(Parentheses::Never)),
if_group_breaks(&text(")"))
]
)
}
}
});
@@ -91,3 +106,10 @@ impl Format<PyFormatContext<'_>> for FormatTargets<'_> {
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParenthesizeTarget {
Always,
Never,
IfBreaks,
}


@@ -66,10 +66,12 @@ impl FormatNodeRule<StmtFunctionDef> for FormatStmtFunctionDef {
write!(f, [space(), text("->"), space()])?;
if return_annotation.is_tuple_expr() {
write!(
f,
[return_annotation.format().with_options(Parentheses::Never)]
)?;
let parentheses = if comments.has_leading_comments(return_annotation.as_ref()) {
Parentheses::Always
} else {
Parentheses::Never
};
write!(f, [return_annotation.format().with_options(parentheses)])?;
} else if comments.has_trailing_comments(return_annotation.as_ref()) {
// Intentionally parenthesize any return annotations with trailing comments.
// This avoids an instability in cases like: