Introduce PythonWhitespace to confine trim operations to Python whitespace (#4994)

## Summary

We use `.trim()` and friends in a bunch of places to strip whitespace
from source code. However, not all Unicode whitespace characters are
considered "whitespace" in Python, which only treats the standard
space, tab, and form-feed characters as whitespace.

This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`,
and `char::is_whitespace`, and replaces them as appropriate with new
`.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.
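
For illustration, here is a minimal sketch of what such a trait could look like. It is not copied from the PR: the helper `is_python_whitespace`, the method names other than `.trim_whitespace()`, and the `is_blank_line` function are assumptions made for the example, and the actual definitions in `ruff_python_whitespace` may differ.

```rust
/// Returns `true` for the characters this PR describes as Python whitespace:
/// space, tab, and form feed. (Helper name is an assumption for this sketch.)
pub const fn is_python_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\x0C')
}

/// Sketch of an extension trait offering Python-aware analogues of
/// `str::trim`, `str::trim_start`, and `str::trim_end`.
pub trait PythonWhitespace {
    /// Like `str::trim`, but only strips Python whitespace.
    fn trim_whitespace(&self) -> &Self;
    /// Like `str::trim_start`, but only strips Python whitespace.
    fn trim_whitespace_start(&self) -> &Self;
    /// Like `str::trim_end`, but only strips Python whitespace.
    fn trim_whitespace_end(&self) -> &Self;
}

impl PythonWhitespace for str {
    fn trim_whitespace(&self) -> &Self {
        self.trim_matches(is_python_whitespace)
    }

    fn trim_whitespace_start(&self) -> &Self {
        self.trim_start_matches(is_python_whitespace)
    }

    fn trim_whitespace_end(&self) -> &Self {
        self.trim_end_matches(is_python_whitespace)
    }
}

/// Hypothetical caller: a line counts as blank only if it contains nothing
/// but Python whitespace.
pub fn is_blank_line(line: &str) -> bool {
    line.trim_whitespace().is_empty()
}
```

With this sketch, a line consisting of a no-break space (U+00A0) is blank according to `.trim()`, because Rust's `char::is_whitespace` follows the Unicode `White_Space` property, but not according to `.trim_whitespace()`.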

In general, the only place that should continue to use `.trim()` is
content within docstrings, which don't need to adhere to Python's
semantic definitions of whitespace.

Closes #4991.
Charlie Marsh 2023-06-09 21:44:50 -04:00 committed by GitHub
parent c1ac50093c
commit f401050878
14 changed files with 64 additions and 32 deletions

@@ -4,7 +4,7 @@ use crate::trivia::{SimpleTokenizer, TokenKind};
 use ruff_python_ast::node::AnyNodeRef;
 use ruff_python_ast::source_code::Locator;
 use ruff_python_ast::whitespace;
-use ruff_python_whitespace::UniversalNewlines;
+use ruff_python_whitespace::{PythonWhitespace, UniversalNewlines};
 use ruff_text_size::{TextRange, TextSize};
 use rustpython_parser::ast::Ranged;
 use std::cmp::Ordering;
@@ -986,7 +986,7 @@ fn max_empty_lines(contents: &str) -> usize {
     let mut max_empty_lines = 0;
     for line in contents.universal_newlines().skip(1) {
-        if line.trim().is_empty() {
+        if line.trim_whitespace().is_empty() {
             empty_lines += 1;
         } else {
             max_empty_lines = max_empty_lines.max(empty_lines);