Consider line continuation character for re-lexing (#12008)

## Summary

This PR fixes a bug where the re-lexing logic didn't account for a line continuation character (`\`) immediately before the newline character. As a result, the lexer was moved back to a newline that is escaped by the `\` and therefore ignored.

Consider the following code:

```py
f'middle {'string':\
        'format spec'}
```

The old token stream is:

```
...
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
Newline 20..21
Indent 21..29
String 29..42
Rbrace 42..43
...
```

Notice how the range of the `FStringMiddle` token overlaps with the ranges of the tokens emitted after moving the lexer backwards.

After this fix, the lexer is no longer moved backwards in this scenario, and the new token stream is:

```
FStringStart 0..2 (flags = F_STRING)
FStringMiddle 2..9 (flags = F_STRING)
Lbrace 9..10
String 10..18
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
FStringEnd 29..30 (flags = F_STRING)
Name 30..36
Name 37..41
Unknown 41..44
Newline 44..45
```

fixes: #12004

## Test Plan

Add test cases and update the snapshots.
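
## Illustrative sketch

The escape check at the heart of this fix can be sketched in isolation. This is a minimal, standalone approximation and not the lexer code itself: `is_newline_escaped` is a hypothetical helper name, and it works on plain byte offsets instead of the lexer's `TextSize` positions. It mirrors the parity rule used in the diff: a newline is escaped only when it is preceded by an odd number of backslashes.

```rust
/// Hypothetical helper (not part of the lexer): returns `true` if the newline
/// at byte offset `newline_offset` in `source` is preceded by an odd number of
/// backslashes, i.e. the last backslash acts as a line continuation character.
fn is_newline_escaped(source: &str, newline_offset: usize) -> bool {
    // Walk backwards from the newline and count the consecutive backslashes.
    let backslash_count = source[..newline_offset]
        .chars()
        .rev()
        .take_while(|&ch| ch == '\\')
        .count();

    // An even count (including zero) means the backslashes pair up and the
    // newline is left unescaped; an odd count leaves one backslash over.
    backslash_count % 2 == 1
}
```

For the example above, the newline at offset 20 is preceded by a single backslash, so the helper reports it as escaped and the re-lexing logic should not treat it as a position to move back to.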
2025-09-22 18:12:41 +00:00 · 2024-06-25 07:43:54 +05:30 · 2024-06-25 07:43:54 +05:30 · 68a8978454
commit 68a8978454
parent cd2af3be73
5 changed files with 567 additions and 3 deletions
--- a/crates/ruff_python_parser/src/lexer.rs
+++ b/crates/ruff_python_parser/src/lexer.rs
@@ -1373,15 +1373,33 @@ impl<'src> Lexer<'src> {
        }

        let mut current_position = self.current_range().start();
-        let reverse_chars = self.source[..current_position.to_usize()].chars().rev();
+        let mut reverse_chars = self.source[..current_position.to_usize()]
+            .chars()
+            .rev()
+            .peekable();
        let mut newline_position = None;

-        for ch in reverse_chars {
+        while let Some(ch) = reverse_chars.next() {
            if is_python_whitespace(ch) {
                current_position -= ch.text_len();
            } else if matches!(ch, '\n' | '\r') {
                current_position -= ch.text_len();
-                newline_position = Some(current_position);
+                // Count the number of backslashes before the newline character.
+                let mut backslash_count = 0;
+                while reverse_chars.next_if_eq(&'\\').is_some() {
+                    backslash_count += 1;
+                }
+                if backslash_count == 0 {
+                    // No escapes: `\n`
+                    newline_position = Some(current_position);
+                } else {
+                    if backslash_count % 2 == 0 {
+                        // Even number of backslashes i.e., all backslashes cancel each other out
+                        // which means the newline character is not being escaped.
+                        newline_position = Some(current_position);
+                    }
+                    current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count);
+                }
            } else {
                break;
            }