Implement an iterator for universal newlines (#3454)

# Summary

We need to support CR line endings (as opposed to LF and CRLF line endings, which are already supported). They're rare, but they do appear in Python code, and we tend to panic on any file that uses them.

Our `Locator` abstraction now supports CR line endings. However, Rust's `str::lines` implementation does _not_.
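
For reference, the gap is easy to reproduce with the standard library alone. The sketch below wraps `str::lines` in a helper (the name `std_lines` is hypothetical and used only for illustration). `str::lines` splits on `\n` and strips one trailing `\r`, but treats a bare `\r` as ordinary content:

```rust
/// Collect the lines that the standard library's `str::lines` yields.
/// `std_lines` is a hypothetical helper, not part of this PR.
fn std_lines(source: &str) -> Vec<&str> {
    // Splits on `\n`; a `\r` directly before the `\n` is stripped.
    // A `\r` on its own is NOT treated as a line break.
    source.lines().collect()
}
```

With this helper, `std_lines("a\nb")` and `std_lines("a\r\nb")` both yield two lines, while `std_lines("a\rb")` yields the single line `a\rb`.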

This PR adds a `UniversalNewlineIterator` implementation that respects all of CR, LF, and CRLF line endings, and plugs it into most of the `.lines()` call sites.
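
To make the idea concrete, here is a minimal sketch of an iterator that treats CR, LF, and CRLF as line terminators. It is not the `UniversalNewlineIterator` added in this PR; the type name `UniversalLines` and its API are assumptions made for illustration, and the real implementation may differ (for example, in how it handles trailing newlines or reverse iteration).

```rust
/// A minimal sketch of an iterator over lines terminated by `\n`, `\r\n`, or `\r`.
/// `UniversalLines` is a hypothetical name; it is not the type added in this PR.
struct UniversalLines<'a> {
    /// The not-yet-consumed tail of the input.
    rest: &'a str,
}

impl<'a> UniversalLines<'a> {
    fn new(text: &'a str) -> Self {
        Self { rest: text }
    }
}

impl<'a> Iterator for UniversalLines<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // Copy the shared reference out so the returned slice lives for `'a`.
        let rest = self.rest;
        if rest.is_empty() {
            return None;
        }
        let bytes = rest.as_bytes();
        match bytes.iter().position(|&b| b == b'\n' || b == b'\r') {
            Some(end) => {
                // `\r\n` is a single two-byte terminator; `\n` and a bare `\r` are one byte.
                let is_crlf = bytes[end] == b'\r' && bytes.get(end + 1) == Some(&b'\n');
                let width = if is_crlf { 2 } else { 1 };
                // Both terminators are ASCII, so these byte offsets are char boundaries.
                self.rest = &rest[end + width..];
                Some(&rest[..end])
            }
            None => {
                // Final line with no terminator.
                self.rest = "";
                Some(rest)
            }
        }
    }
}
```

Like `str::lines`, this sketch does not yield an empty final line when the input ends with a terminator, so `UniversalLines::new(source)` can stand in for `source.lines()` at a call site without changing the line count for LF-only input.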

As an alternative design, it could be nice if we could leverage `Locator` for this. We've already computed all of the line endings, so we could probably iterate much more efficiently?

# Test Plan

Largely relying on automated testing, but I also ran this over some known failure cases, like #3404.
Charlie Marsh 2023-03-13 00:01:29 -04:00 committed by GitHub
parent 2a4d6ab3b2
commit c2750a59ab
35 changed files with 325 additions and 126 deletions

@@ -40,19 +40,18 @@ pub fn raw_contents(contents: &str) -> &str {
 /// Return the leading quote for a string or byte literal (e.g., `"""`).
 pub fn leading_quote(content: &str) -> Option<&str> {
-    if let Some(first_line) = content.lines().next() {
-        for pattern in TRIPLE_QUOTE_STR_PREFIXES
-            .iter()
-            .chain(TRIPLE_QUOTE_BYTE_PREFIXES)
-            .chain(SINGLE_QUOTE_STR_PREFIXES)
-            .chain(SINGLE_QUOTE_BYTE_PREFIXES)
-        {
-            if first_line.starts_with(pattern) {
-                return Some(pattern);
+    TRIPLE_QUOTE_STR_PREFIXES
+        .iter()
+        .chain(TRIPLE_QUOTE_BYTE_PREFIXES)
+        .chain(SINGLE_QUOTE_STR_PREFIXES)
+        .chain(SINGLE_QUOTE_BYTE_PREFIXES)
+        .find_map(|pattern| {
+            if content.starts_with(pattern) {
+                Some(*pattern)
+            } else {
+                None
             }
-        }
-    }
-    None
+        })
 }
 /// Return the trailing quote string for a string or byte literal (e.g., `"""`).