- Cache line object to avoid creating a Unicode object
for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the
smallest buffer possible to measure the difference.
(cherry picked from commit d87b015106)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
gh-113602: Bail out when the parser tries to override existing errors (GH-113607)
(cherry picked from commit 9ed36d533a)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
(cherry picked from commit 48c49739f5)
Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
gh-112388: Fix an error that was causing the parser to try to overwrite tokenizer errors (GH-112410)
(cherry picked from commit 2c8b191742)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-111380: Show SyntaxWarnings only once when parsing if invalid syntax is encouintered (GH-111381)
(cherry picked from commit 3d2f1f0b83)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
[3.12] gh-110938: Fix error messages for indented blocks with functions and classes with generic type parameters (GH-110973)
(cherry picked from commit 24e4ec7766)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-110259: Fix f-strings with multiline expressions and format specs (GH-110271)
(cherry picked from commit cc389ef627)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-88943: Improve syntax error for non-ASCII character that follows a numerical literal (GH-109081)
It now points on the invalid non-ASCII character, not on the valid numerical literal.
(cherry picked from commit b2729e93e9)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-107263: Increase C stack limit for most functions, except `_PyEval_EvalFrameDefault()` (GH-107535)
* Set C recursion limit to 1500, set cost of eval loop to 2 frames, and compiler mutliply to 2.
(cherry picked from commit fa45958450)
Co-authored-by: Mark Shannon <mark@hotpy.org>
gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (GH-105070)
(cherry picked from commit 9216e69a87)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
gh-105017: Include CRLF lines in strings and column numbers (GH-105030)
(cherry picked from commit 96fff35325)
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
gh-104866: Tokenize should emit NEWLINE after exiting block with comment (GH-104870)
(cherry picked from commit c90a862cdc)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>