Introduce PythonWhitespace to confine trim operations to Python whitespace (#4994)

## Summary

We use `.trim()` and friends in a bunch of places to strip whitespace
from source code. However, not all Unicode whitespace characters are
considered "whitespace" in Python, which only supports the standard
space, tab, and form-feed characters.

This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`,
and `char::is_whitespace`, and replaces them as appropriate with new
`.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.

In general, the only place that should continue to use `.trim()` is
content within docstrings, which don't need to adhere to Python's
semantic definitions of whitespace.
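
To illustrate the idea, here is a minimal sketch of what such a trait could look like. It is not necessarily the code in this PR: the helper `is_python_whitespace` and the method names `trim_whitespace_start` and `trim_whitespace_end` are assumptions made for the example; only `PythonWhitespace` and `.trim_whitespace()` are named in the summary above.

```rust
/// Returns `true` for the characters Python treats as whitespace between
/// tokens: space, tab, and form feed. (Hypothetical helper for this sketch.)
const fn is_python_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\x0C')
}

/// Sketch of a trait offering Python-aware analogues of `str::trim` and
/// friends. Method names other than `trim_whitespace` are assumptions.
trait PythonWhitespace {
    /// Strip Python whitespace from both ends.
    fn trim_whitespace(&self) -> &Self;
    /// Strip Python whitespace from the start only.
    fn trim_whitespace_start(&self) -> &Self;
    /// Strip Python whitespace from the end only.
    fn trim_whitespace_end(&self) -> &Self;
}

impl PythonWhitespace for str {
    fn trim_whitespace(&self) -> &Self {
        self.trim_matches(is_python_whitespace)
    }

    fn trim_whitespace_start(&self) -> &Self {
        self.trim_start_matches(is_python_whitespace)
    }

    fn trim_whitespace_end(&self) -> &Self {
        self.trim_end_matches(is_python_whitespace)
    }
}
```

Under this sketch, a string padded with a no-break space (U+00A0) is left untouched by `.trim_whitespace()`, whereas `.trim()` would strip it, because `char::is_whitespace` follows the Unicode `White_Space` property.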

Closes #4991.
Charlie Marsh 2023-06-09 21:44:50 -04:00 committed by GitHub
parent c1ac50093c
commit f401050878
14 changed files with 64 additions and 32 deletions

@@ -4,7 +4,7 @@ use std::path::Path;
 use itertools::Itertools;
 use log::error;
 use num_traits::Zero;
-use ruff_python_whitespace::UniversalNewlineIterator;
+use ruff_python_whitespace::{PythonWhitespace, UniversalNewlineIterator};
 use ruff_text_size::{TextRange, TextSize};
 use rustc_hash::{FxHashMap, FxHashSet};
 use rustpython_parser::ast::{
@@ -1031,7 +1031,7 @@ pub fn trailing_lines_end(stmt: &Stmt, locator: &Locator) -> TextSize {
     let rest = &locator.contents()[usize::from(line_end)..];
     UniversalNewlineIterator::with_offset(rest, line_end)
-        .take_while(|line| line.trim().is_empty())
+        .take_while(|line| line.trim_whitespace().is_empty())
         .last()
         .map_or(line_end, |l| l.full_end())
 }