Commit graph

161 commits

Author SHA1 Message Date
Eric V. Smith
08c78e02fa
gh-134675: Add t-string prefixes to the tokenize module and the lexical analysis docs, and add a test to make sure we catch this error in the future. (#134734)
* Add t-string prefixes to _all_string_prefixes, and add a test to make sure we catch this error in the future.

* Update lexical analysis docs for t-string prefixes.
2025-05-26 13:49:39 -04:00
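A quick way to exercise the new prefixes (a minimal sketch, assuming Python 3.14+ with PEP 750 t-strings; the exact token names emitted may vary by version):

    import io
    import tokenize

    # With the prefix registered, the leading 't' is consumed as part of the
    # string token(s) instead of tokenizing as a NAME followed by a string.
    src = 't"hello {name}"\n'
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        print(tokenize.tok_name[tok.exact_type], repr(tok.string))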
Loïc Simon
52509cc94b
gh-134582: Fix t-strings untokenize() roundtrip removing space between braces (#134603) 2025-05-25 17:23:38 +01:00
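The class of regression fixed here is easy to express as a round-trip check (a minimal sketch; untokenize() documents an exact source round-trip when given full five-tuples):

    import io
    import tokenize

    def roundtrips(src):
        # Fixes like this one restore the exact round-trip for tricky inputs
        # such as braces inside t-strings, \N{...} named literals, and
        # backslash continuations.
        toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
        return tokenize.untokenize(toks) == src

    assert roundtrips("x = 1\n")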
Hugo van Kemenade
4ac916ae33
gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
Serhiy Storchaka
84a08f8629
gh-133306: Use \z instead of \Z in regular expressions in the stdlib (GH-133337) 2025-05-03 17:58:49 +03:00
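For context (a hedged sketch, assuming Python 3.14+ where re accepts \z as an end-of-string anchor equivalent to \Z, the spelling used by most other regex engines):

    import re

    assert re.search(r"c\Z", "abc")   # long-standing Python spelling
    assert re.search(r"c\z", "abc")   # 3.14+ spelling now used in the stdlib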
Lysandros Nikolaou
60202609a2
gh-132661: Implement PEP 750 (#132662)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Wingy <git@wingysam.xyz>
Co-authored-by: Koudai Aono <koxudaxi@gmail.com>
Co-authored-by: Dave Peck <davepeck@gmail.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Paul Everitt <pauleveritt@me.com>
Co-authored-by: sobolevn <mail@sobolevn.me>
2025-04-30 11:46:41 +02:00
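At the language level, PEP 750 makes a t-string literal evaluate to a template object rather than a str (a minimal sketch, assuming Python 3.14+; attribute names as specified in the PEP):

    from string.templatelib import Template  # PEP 750

    name = "world"
    t = t"Hello {name}!"
    assert isinstance(t, Template)                   # not a str
    assert t.strings == ("Hello ", "!")              # static parts
    assert t.interpolations[0].value == "world"      # evaluated {name}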
Semyon Moroz
9b83670f0f
gh-131178: Add tests for tokenize command-line interface (#131274) 2025-03-27 18:04:16 +02:00
Tomas R.
7ad793e5db
gh-125553: Fix backslash continuation in untokenize (#126010) 2025-01-21 19:58:44 +00:00
Tomas R.
aef52ca8b3
gh-128519: Align the docstring of untokenize() to match the docs (#128521) 2025-01-06 08:42:26 +00:00
Serhiy Storchaka
7d2c39752f
gh-91818: Use default program name in the CLI of many modules (GH-124867)
argparse now detects by default when the code is run as a module, so the
usage message shows the actual executable name in the "usage: python -m ..."
form instead of a hardcoded "python".
2024-10-10 00:20:53 +03:00
Tomas R.
db23b8bb13
gh-125008: Fix tokenize.untokenize roundtrip for \n{{ (#125013) 2024-10-06 15:16:41 +02:00
Pablo Galindo Salgado
ecf16ee50e
gh-115154: Fix untokenize handling of unicode named literals (#115171) 2024-02-19 14:54:10 +00:00
Lysandros Nikolaou
17d65547df
gh-104169: Fix test_peg_generator after tokenizer refactoring (#110727)
* Fix test_peg_generator after tokenizer refactoring
* Remove references to tokenizer.c in comments etc.
2023-10-12 09:34:35 +02:00
Lysandros Nikolaou
ab3823a97b
gh-71299: Fix __all__ in tokenize (#105907)
Co-authored-by: Unit03
2023-06-19 13:31:57 +02:00
Pablo Galindo Salgado
ffd2654550
gh-105390: Correctly raise TokenError instead of SyntaxError for tokenize errors (#105399) 2023-06-07 12:04:40 +01:00
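After this fix the failure mode is consistent (a minimal sketch):

    import io
    import tokenize

    try:
        # An unclosed bracket at end of input should raise TokenError,
        # not SyntaxError, when consumed through the tokenize module.
        list(tokenize.generate_tokens(io.StringIO("(1,\n").readline))
    except tokenize.TokenError as exc:
        print("caught:", exc)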
Pablo Galindo Salgado
f04c16875b
gh-105324: Fix tokenize module main function for stdin (#105325) 2023-06-05 18:36:40 +02:00
Pablo Galindo Salgado
9216e69a87
gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (#105070) 2023-05-30 22:43:34 +01:00
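The readline-style entry point consumes input one line at a time, so tokenization stays lazy (a minimal sketch):

    import io
    import tokenize

    src = "x = 1\nif x:\n    y = 2\n"
    readline = io.StringIO(src).readline   # any no-argument line producer works
    for tok in tokenize.generate_tokens(readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))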
Pablo Galindo Salgado
46b52e6e2b
gh-104976: Ensure trailing dedent tokens are emitted as in the previous tokenizer (#104980)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-26 22:02:26 +01:00
Marta Gómez Macías
8817886ae5
gh-102856: Tokenize performance improvement (#104731) 2023-05-22 00:29:04 +00:00
Marta Gómez Macías
ffe47cb623
gh-104719: Restore Tokenize module constants (#104722) 2023-05-21 17:07:28 +01:00
Marta Gómez Macías
6715f91edc
gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer that is currently only used via the extension module exposing it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and attach the metadata needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-21 01:03:02 +01:00
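The extra tokens in question are easy to observe (a minimal sketch):

    import io
    import tokenize

    src = "# a comment\nx = 1\n"
    kinds = [tokenize.tok_name[t.type]
             for t in tokenize.generate_tokens(io.StringIO(src).readline)]
    # COMMENT and NL are exactly the kinds of non-semantic tokens the special
    # C-tokenizer mode must emit for backward compatibility.
    assert "COMMENT" in kinds and "NL" in kinds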
Nikita Sobolev
0cbdd21311
bpo-46565: del loop vars that are leaking into module namespaces (GH-30993) 2022-02-03 11:20:08 +02:00
Pablo Galindo Salgado
a24676bedc
Add tests for the C tokenizer and expose it as a private module (GH-27924) 2021-08-24 17:50:05 +01:00
Pablo Galindo Salgado
b6bde9fc42
bpo-44667: Correctly treat lines ending with comments and no newline in the Python tokenizer (GH-27499) 2021-07-31 02:17:09 +01:00
Anthony Sottile
15bd9efd01
bpo-43014: Improve performance of tokenize.tokenize by 20-30% 2021-01-24 12:23:17 +03:00
Anthony Sottile
2a58b0636d bpo-5028: Fix up the rest of the tokenize documentation for the line attribute (GH-13686)
https://bugs.python.org/issue5028
2019-05-30 15:06:32 -07:00
Andrew Carr
1e36f75d63 bpo-5028: Fix doc bug for tokenize (GH-11683)
https://bugs.python.org/issue5028
2019-05-30 12:31:51 -07:00
penguindustin
9646630895 bpo-36766: Typos in docs and code comments (GH-13116) 2019-05-06 14:57:17 -04:00
Serhiy Storchaka
8ac658114d
bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)
"Include/token.h", "Lib/token.py" (containing now some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on the read-only sources tree.

"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of been executable itself.

Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".

The documentation now contains strings for operators and punctuation tokens.
2018-12-22 11:18:40 +02:00
Ammar Askar
c4ef4896ea bpo-33899: Make the tokenize module mirror the end-of-file-is-end-of-line behavior (GH-7891)
Most of the change involves fixing up the test suite, which previously
assumed that there wouldn't be a newline if the input didn't end in one.

Contributed by Ammar Askar.
2018-07-06 10:19:08 +03:00
Thomas Kluyver
c56b17bd8c bpo-12486: Document tokenize.generate_tokens() as public API (#6957)
* Document tokenize.generate_tokens()

* Add news file

* Add test for generate_tokens

* Document behaviour around ENCODING token

* Add generate_tokens to __all__
2018-06-05 10:26:39 -07:00
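The ENCODING behaviour mentioned above is the main visible difference between the bytes and str entry points (a minimal sketch):

    import io
    import tokenize

    first = next(tokenize.tokenize(io.BytesIO(b"x = 1\n").readline))
    assert first.type == tokenize.ENCODING    # bytes API emits ENCODING first
    assert first.string == "utf-8"

    first = next(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))
    assert first.type != tokenize.ENCODING    # str API does not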
Łukasz Langa
c2d384dbd7
bpo-33338: [tokenize] Minor code cleanup (#6573)
This change contains minor cleanups that make diffing between Lib/tokenize.py and
Lib/lib2to3/pgen2/tokenize.py easier.
2018-04-23 01:07:11 -07:00
Serhiy Storchaka
d08972fdb9
bpo-33260: Regenerate token.py after removing ASYNC and AWAIT. (GH-6447) 2018-04-11 19:15:51 +03:00
Jelle Zijlstra
ac317700ce bpo-30406: Make async and await proper keywords (#1669)
Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.
2017-10-05 23:24:46 -04:00
Albert-Jan Nijburg
fc354f0785 bpo-25324: copy tok_name before changing it (#1608)
* add a test to check if we're modifying token

* copy the list so importing tokenize doesn't have side effects on token

* shorten line

* add tokenize tokens to token.h to get them to show up in token

* move ERRORTOKEN back to its previous location, and fix nitpick

* copy comments from token.h automatically

* fix whitespace and make more pythonic

* changes to address comments from @haypo

* update token.rst and Misc/NEWS

* change wording

* some more wording changes
2017-05-31 16:00:21 +02:00
Albert-Jan Nijburg
c471ca448c bpo-30377: Simplify handling of COMMENT and NL in tokenize.py (#1607) 2017-05-24 14:31:57 +03:00
Jon Dufresne
3972628de3 bpo-30296: Remove unnecessary tuples, lists, sets, and dicts (#1489)
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
  expression>) when supported (e.g. any(), all(), tuple(), min(), and
  max())
2017-05-18 07:35:54 -07:00
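Illustrative examples of the pattern (a sketch, not the exact diff):

    tokens = ["NAME", "OP", "NUMBER"]
    quotes = {"'", '"'}                        # set literal, not set([...])
    lowered = [t.lower() for t in tokens]      # comprehension, not list(genexp)
    assert any(t == "OP" for t in tokens)      # genexp, not any([...])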
Jim Fasarakis-Hilliard
d4914e9041 Add ELLIPSIS and RARROW. Add tests (#666) 2017-03-14 21:16:15 +01:00
Brett Cannon
a721abac29 Issue #26331: Implement the parsing part of PEP 515.
Thanks to Georg Brandl for the patch.
2016-09-09 14:57:09 -07:00
Serhiy Storchaka
a051bf3afb Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:47:48 +02:00
Serhiy Storchaka
e431d3c9aa Issue #26581: Use the first coding cookie on a line, not the last one. 2016-03-20 23:36:29 +02:00
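The cookie handling lives in tokenize.detect_encoding(); with two cookies on one line, the first now wins (a minimal sketch):

    import io
    import tokenize

    src = b"# -*- coding: iso-8859-1 -*- # -*- coding: utf-8 -*-\npass\n"
    enc, consumed = tokenize.detect_encoding(io.BytesIO(src).readline)
    assert enc == "iso-8859-1"    # the first coding cookie on the line wins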
Berker Peksag
a7161e7fac Issue #25977: Fix typos in Lib/tokenize.py
Patch by John Walker.
2015-12-30 01:42:43 +02:00
Berker Peksag
ff8d0873aa Issue #25977: Fix typos in Lib/tokenize.py
Patch by John Walker.
2015-12-30 01:41:58 +02:00
Eric V. Smith
1c8222c80a Issue #25311: Add support for f-strings to tokenize.py. Also added some comments to explain what's happening, since it's not so obvious. 2015-10-26 04:37:55 -04:00
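Before PEP 701 (implemented above), tokenize emitted an f-string as a single STRING token; afterwards it is split into FSTRING_START/FSTRING_MIDDLE/FSTRING_END parts (a minimal sketch; output depends on the running Python version):

    import io
    import tokenize

    toks = tokenize.generate_tokens(io.StringIO('f"{x}"\n').readline)
    print([tokenize.tok_name[t.exact_type] for t in toks])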
Yury Selivanov
96ec934e75 Issue #24619: Simplify async/await tokenization.
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py.  The previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember the position of the outermost async-def block.

This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
2015-07-23 15:01:58 +03:00
Yury Selivanov
8fb307cd65 Issue #24619: New approach for tokenizing async/await.
This commit fixes how one-line async-defs and defs are tracked
by the tokenizer.  It allows correctly parsing invalid code such
as:

>>> async def f():
...     def g(): pass
...     async = 10

and valid code such as:

>>> async def f():
...     async def g(): pass
...     await z

As a consequence, it is now possible to have one-line
'async def foo(): await ..' functions:

>>> async def foo(): return await bar()
2015-07-22 13:33:45 +03:00
Jason R. Coombs
a95a476b3a Issue #20387: Merge test and patch from 3.4.4 2015-06-28 11:13:30 -04:00
Dingyuan Wang
e411b6629f Issue #20387: Restore retention of indentation during untokenize. 2015-06-22 10:01:12 +08:00
Victor Stinner
24d262af0b (Merge 3.5) Issue #23840: tokenize.open() now closes the temporary binary file
on error to fix a resource warning.
2015-05-26 00:46:44 +02:00
Victor Stinner
387729e183 Issue #23840: tokenize.open() now closes the temporary binary file on error to
fix a resource warning.
2015-05-26 00:43:58 +02:00
Yury Selivanov
7544508f02 PEP 0492 -- Coroutines with async and await syntax. Issue #24017. 2015-05-11 22:57:16 -04:00