Commit graph

151 commits

Author SHA1 Message Date
Eric V. Smith
579686d9fb
gh-134752: Improve speed of test_tokenize.StringPrefixTest.test_prefixes. (#134766) 2025-05-27 04:49:28 -04:00
Eric V. Smith
08c78e02fa
gh-134675: Add t-string prefixes to tokenizer module, lexical analysis doc, and add a test to make sure we catch this error in the future. (#134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
2025-05-26 13:49:39 -04:00
Loïc Simon
52509cc94b
gh-134582: Fix t-strings untokenize() roundtrip removing space between braces (#134603) 2025-05-25 17:23:38 +01:00
Pablo Galindo Salgado
2f8b08da47
gh-129958: Properly disallow newlines in format specs in single-quoted f-strings (GH-130063) 2025-04-18 14:30:04 +02:00
Thomas Grainger
8a00c9a4d2
gh-128770: raise warnings as errors in test suite - except for test_socket which still logs warnings, and internal test warnings that are now logged (#128973)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-03-27 21:06:52 +02:00
Semyon Moroz
9b83670f0f
gh-131178: Add tests for tokenize command-line interface (#131274) 2025-03-27 18:04:16 +02:00
Mark Shannon
014223649c
GH-130396: Use computed stack limits on linux (GH-130398)
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-25 09:24:48 +00:00
Petr Viktorin
ef29104f7d
GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH130413)
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now

Unfortunatlely, the change broke some buildbots.

This reverts commit 2498c22fa0.
2025-02-24 11:16:08 +01:00
Mark Shannon
2498c22fa0
GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)
* Implement C recursion protection with limit pointers

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-19 11:44:57 +00:00
Sam Gross
e5f10a7414
gh-127933: Add option to run regression tests in parallel (gh-128003)
This adds a new command line argument, `--parallel-threads` to the
regression test runner to allow it to run individual tests in multiple
threads in parallel in order to find multithreading bugs.

Some tests pass when run with `--parallel-threads`, but there's still
more work before the entire suite passes.
2025-02-04 17:44:59 -05:00
Tomas R.
7ad793e5db
gh-125553: Fix backslash continuation in untokenize (#126010) 2025-01-21 19:58:44 +00:00
Tomas R.
db23b8bb13
gh-125008: Fix tokenize.untokenize roundtrip for \n{{ (#125013) 2024-10-06 15:16:41 +02:00
Serhiy Storchaka
1a0c7b9ba4
gh-121905: Consistently use "floating-point" instead of "floating point" (GH-121907) 2024-07-19 08:06:02 +00:00
Lysandros Nikolaou
4b5d3e0e72
gh-120343: Fix column offsets of multiline tokens in tokenize (#120391) 2024-06-12 20:52:55 +02:00
Lysandros Nikolaou
1b62bcee94
gh-120343: Do not reset byte_col_offset_diff after multiline tokens (#120352)
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
2024-06-11 17:00:53 +00:00
Pablo Galindo Salgado
ecf16ee50e
gh-115154: Fix untokenize handling of unicode named literals (#115171) 2024-02-19 14:54:10 +00:00
Pablo Galindo Salgado
a135a6d2c6
gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module (#112949) 2023-12-11 11:44:22 +00:00
Nikita Sobolev
e9b5399bee
gh-111031: Check more files in test_tokenize (#111032) 2023-10-19 09:29:45 +01:00
Lysandros Nikolaou
17d65547df
gh-104169: Fix test_peg_generator after tokenizer refactoring (#110727)
* Fix test_peg_generator after tokenizer refactoring
* Remove references to tokenizer.c in comments etc.
2023-10-12 09:34:35 +02:00
Nikita Sobolev
732532b0af
gh-108303: Move all inspect test files to test_inspect/ (#109607) 2023-10-10 22:15:11 +02:00
Pablo Galindo Salgado
cc389ef627
gh-110259: Fix f-strings with multiline expressions and format specs (#110271)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
2023-10-05 14:26:44 +01:00
Nikita Sobolev
1110c5bc82
gh-108303: Move tokenize-related data to Lib/test/tokenizedata (GH-109265) 2023-09-12 09:37:42 +03:00
Anthony Shaw
2c4c26c4ce
gh-108469: Update ast.unparse for unescaped quote support from PEP701 [3.12] (#108553)
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
2023-09-05 21:01:23 +01:00
Pablo Galindo Salgado
da8f87b7ea
gh-107015: Remove async_hacks from the tokenizer (#107018) 2023-07-26 16:34:15 +01:00
Nikita Sobolev
d830c4a944
gh-106200: Remove unused imports (#106201) 2023-06-28 11:55:41 +00:00
Lysandros Nikolaou
ab3823a97b
gh-71299: Fix __all__ in tokenize (#105907)
Co-authored-by: Unit03
2023-06-19 13:31:57 +02:00
Lysandros Nikolaou
d382ad4915
gh-105820: Fix tok_mode expression buffer in file & readline tokenizer (#105828) 2023-06-15 16:21:24 +00:00
Lysandros Nikolaou
abfbab6415
gh-105718: Fix buffer allocation in tokenizer with readline (#105728) 2023-06-13 16:18:11 +01:00
Pablo Galindo Salgado
b047fa5e56
gh-105549: Tokenize separately NUMBER and NAME tokens and allow 0-prefixed literals (#105555) 2023-06-09 21:39:01 +01:00
Pablo Galindo Salgado
d7f46bcd98
gh-105564: Don't include artificial newlines in the line attribute of tokens (#105565) 2023-06-09 17:01:26 +01:00
Pablo Galindo Salgado
7279fb6408
gh-105435: Fix spurious NEWLINE token if file ends with comment without a newline (#105442) 2023-06-07 13:31:48 +01:00
Pablo Galindo Salgado
ffd2654550
gh-105390: Correctly raise TokenError instead of SyntaxError for tokenize errors (#105399) 2023-06-07 12:04:40 +01:00
Pablo Galindo Salgado
c0a6ed3934
gh-105259: Ensure we don't show newline characters for trailing NEWLINE tokens (#105364) 2023-06-06 12:52:16 +01:00
Lysandros Nikolaou
70f315c2d6
gh-105042: Disable unmatched parens syntax error in python tokenize (#105061) 2023-05-30 22:52:52 +01:00
Pablo Galindo Salgado
9216e69a87
gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (#105070) 2023-05-30 22:43:34 +01:00
Marta Gómez Macías
96fff35325
gh-105017: Include CRLF lines in strings and column numbers (#105030)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-28 15:15:53 +01:00
Marta Gómez Macías
86d8f48935
gh-105017: Fix including additional NL token when using CRLF (#105022)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2023-05-27 16:50:43 +00:00
Pablo Galindo Salgado
46b52e6e2b
gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (#104980)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-26 22:02:26 +01:00
Pablo Galindo Salgado
3fdb55c482
gh-104972: Ensure that line attributes in tokens in the tokenize module are correct (#104975) 2023-05-26 15:46:22 +01:00
Lysandros Nikolaou
c90a862cdc
gh-104866: Tokenize should emit NEWLINE after exiting block with comment (#104870) 2023-05-24 17:18:17 +01:00
Pablo Galindo Salgado
c8cf9b42eb
gh-104825: Remove implicit newline in the line attribute in tokens emitted in the tokenize module (#104846) 2023-05-24 09:59:18 +00:00
Marta Gómez Macías
729b252241
gh-104741: Add line number attribute to indentation error exception (#104743) 2023-05-22 11:30:18 +00:00
Marta Gómez Macías
6715f91edc
gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-21 01:03:02 +01:00
chgnrdv
d5a97074d2
gh-103824: fix use-after-free error in Parser/tokenizer.c (#103993) 2023-05-01 15:26:43 +00:00
Pablo Galindo Salgado
1ef61cf71a
gh-102856: Initial implementation of PEP 701 (#102855)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
2023-04-19 11:18:16 -05:00
Pablo Galindo Salgado
e13d1d9dda
gh-99581: Fix a buffer overflow in the tokenizer when copying lines that fill the available buffer (#99605) 2022-11-20 20:20:03 +00:00
Michael Droettboom
23e83a8465
gh-94808: Coverage: Test that maximum indentation level is handled (#95926)
* gh-94808: Coverage: Test that maximum indentation level is handled

* Use "compile" rather than "exec"
2022-10-06 10:39:17 -07:00
Serhiy Storchaka
6927632492
Remove trailing spaces (GH-31695) 2022-03-05 17:47:00 +02:00
Pablo Galindo Salgado
a0efc0c196
bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters (GH-30130) 2022-01-25 22:12:14 +00:00
Serhiy Storchaka
a5a56154f1
Remove trailing spaces. (GH-28706) 2021-10-03 16:58:14 +03:00