Avoid consuming newline for unterminated string (#12067)

## Summary

This PR fixes the lexer logic to **not** consume the newline character
for an unterminated string literal.

Previously, the lexer consumed the newline as part of the string itself,
which is bad for error recovery because the lexer then never emits the
newline token. With this change, the newline is left in the input and is
lexed as a separate token.
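
The idea can be illustrated with a minimal sketch. This is not ruff's lexer: the `Token` enum and the `lex` function are hypothetical, only `"` is treated as a quote, and escapes are ignored. It only shows that peeking at the next character, instead of advancing past it, leaves the newline available to be emitted as its own token.

```rust
// Hypothetical, simplified token kinds; these are not ruff's real token types.
#[derive(Debug, PartialEq)]
enum Token {
    Str(String),
    Unterminated(String),
    Newline,
    Other(char),
}

/// Lexes `src`, treating `"` as the only quote character and ignoring escapes.
fn lex(src: &str) -> Vec<Token> {
    let mut chars = src.chars().peekable();
    let mut tokens = Vec::new();

    while let Some(c) = chars.next() {
        match c {
            '"' => {
                let mut value = String::new();
                loop {
                    // Peek instead of advancing, so a newline stays in the input.
                    match chars.peek().copied() {
                        Some('"') => {
                            chars.next(); // consume the closing quote
                            tokens.push(Token::Str(value));
                            break;
                        }
                        Some('\n') | None => {
                            // Unterminated: record the error, leave '\n' unconsumed.
                            tokens.push(Token::Unterminated(value));
                            break;
                        }
                        Some(ch) => {
                            value.push(ch);
                            chars.next();
                        }
                    }
                }
            }
            '\n' => tokens.push(Token::Newline),
            other => tokens.push(Token::Other(other)),
        }
    }
    tokens
}

fn main() {
    // The newline after the unterminated string is still emitted as a token.
    println!("{:?}", lex("\"abc\nx"));
}
```

Because the newline is never consumed inside the string branch, the outer loop sees it next and a parser consuming these tokens can still find the end of the logical line.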

This was discovered during https://github.com/astral-sh/ruff/pull/12060.

## Test Plan

Update the snapshots and validate them.
Dhruv Manilawala 2024-06-27 17:02:48 +05:30 committed by GitHub
parent 55f4812051
commit e137c824c3
GPG key ID: B5690EEEBB952194
3 changed files with 22 additions and 8 deletions

@@ -962,25 +962,30 @@ impl<'src> Lexer<'src> {
             // Skip up to the current character.
             self.cursor.skip_bytes(index);
-            let ch = self.cursor.bump();
+            // Lookahead because we want to bump only if it's a quote or being escaped.
+            let quote_or_newline = self.cursor.first();
             // If the character is escaped, continue scanning.
             if num_backslashes % 2 == 1 {
-                if ch == Some('\r') {
+                self.cursor.bump();
+                if quote_or_newline == '\r' {
                     self.cursor.eat_char('\n');
                 }
                 continue;
             }
-            match ch {
-                Some(newline @ ('\r' | '\n')) => {
+            match quote_or_newline {
+                '\r' | '\n' => {
                     return self.push_error(LexicalError::new(
                         LexicalErrorType::UnclosedStringError,
-                        self.token_range().sub_end(newline.text_len()),
+                        self.token_range(),
                     ));
                 }
-                Some(ch) if ch == quote => {
-                    break self.offset() - TextSize::new(1);
+                ch if ch == quote => {
+                    let value_end = self.offset();
+                    self.cursor.bump();
+                    break value_end;
                 }
                 _ => unreachable!("memchr2 returned an index that is not a quote or a newline"),
             }