Introduce PythonWhitespace to confine trim operations to Python whitespace (#4994)

## Summary

We use `.trim()` and friends in a bunch of places, to strip whitespace
from source code. However, not all Unicode whitespace characters are
considered "whitespace" in Python, which only supports the standard
space, tab, and form-feed characters.

This PR audits our usages of `.trim()`, `.trim_start()`, `.trim_end()`,
and `char::is_whitespace`, and replaces them as appropriate with a new
`.trim_whitespace()` analogues, powered by a `PythonWhitespace` trait.

In general, the only place that should continue to use `.trim()` is
content within docstrings, which don't need to adhere to Python's
semantic definitions of whitespace.

Closes #4991.
This commit is contained in:
Charlie Marsh 2023-06-09 21:44:50 -04:00 committed by GitHub
parent c1ac50093c
commit f401050878
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
14 changed files with 64 additions and 32 deletions

View file

@ -13,3 +13,31 @@ pub fn leading_indentation(line: &str) -> &str {
line.find(|char: char| !is_python_whitespace(char))
.map_or(line, |index| &line[..index])
}
pub trait PythonWhitespace {
/// Like `str::trim()`, but only removes whitespace characters that Python considers
/// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
fn trim_whitespace(&self) -> &Self;
/// Like `str::trim_start()`, but only removes whitespace characters that Python considers
/// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
fn trim_whitespace_start(&self) -> &Self;
/// Like `str::trim_end()`, but only removes whitespace characters that Python considers
/// to be [whitespace](https://docs.python.org/3/reference/lexical_analysis.html#whitespace-between-tokens).
fn trim_whitespace_end(&self) -> &Self;
}
impl PythonWhitespace for str {
fn trim_whitespace(&self) -> &Self {
self.trim_matches(is_python_whitespace)
}
fn trim_whitespace_start(&self) -> &Self {
self.trim_start_matches(is_python_whitespace)
}
fn trim_whitespace_end(&self) -> &Self {
self.trim_end_matches(is_python_whitespace)
}
}