Commit graph

99 commits

Author SHA1 Message Date
Pablo Galindo Salgado
da8f87b7ea
gh-107015: Remove async_hacks from the tokenizer (#107018) 2023-07-26 16:34:15 +01:00
Victor Stinner
8c5f74fc89
gh-106023: Update code using _PyObject_FastCall() (#106257)
Replace _PyObject_FastCall() calls with PyObject_Vectorcall().
2023-06-30 01:05:01 +00:00
Marta Gómez Macías
96fff35325
gh-105017: Include CRLF lines in strings and column numbers (#105030)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-28 15:15:53 +01:00
Marta Gómez Macías
6715f91edc
gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-21 01:03:02 +01:00
Lysandros Nikolaou
9169a56fad
gh-103656: Transfer f-string buffers to parser to avoid use-after-free (GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-04-27 01:33:31 +00:00
Pablo Galindo Salgado
1ef61cf71a
gh-102856: Initial implementation of PEP 701 (#102855)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
2023-04-19 11:18:16 -05:00
Chenxi Mao
7703def37e
GH-102711: Fix warnings found by clang (#102712)
There are some warnings if build python via clang:

Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
                              ^
                               void

Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
                            ^
                             void

Fix it to make clang happy.

Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
2023-03-28 10:52:22 +02:00
Mark Shannon
feec49c407
GH-101578: Normalize the current exception (GH-101607)
* Make sure that the current exception is always normalized.

* Remove redundant type and traceback fields for the current exception.

* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException

* Add new API functions: PyException_GetArgs, PyException_SetArgs
2023-02-08 09:31:12 +00:00
Eric Snow
91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Victor Stinner
4ce2a202c7
gh-99300: Use Py_NewRef() in Parser/ directory (#99330)
Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
2022-11-10 15:30:05 +01:00
Lysandros Nikolaou
cbf0afd8a1
gh-97973: Return all necessary information from the tokenizer (GH-97984)
Right now, the tokenizer only returns type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and set all of the necessary information,
so that the parser does not have to this.
2022-10-06 16:07:17 -07:00
Gregory P. Smith
511ca94520
gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499)
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
2022-09-02 09:35:08 -07:00
Honglin Zhu
b946f529ef
gh-95355: Check tokens[0] after allocating memory (GH-95356)
#95355

Automerge-Triggered-By: GH:pablogsal
2022-07-28 03:00:34 -07:00
Serhiy Storchaka
6fd4c8ec77
gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742)
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.

Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
2022-06-14 07:15:26 +03:00
Victor Stinner
5115a16831
gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106)
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
  parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
2022-05-24 22:35:08 +02:00
Oleg Iarygin
a52f82baf2
bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) 2022-03-14 17:03:21 +01:00
Pablo Galindo Salgado
e19059ecd8
Don't print rejected tokens when using the debug flags in the parser (GH-31258) 2022-02-10 14:38:27 +00:00
Pablo Galindo Salgado
390459de6d
Allow the parser to avoid nested processing of invalid rules (GH-31252) 2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado
69e10976b2
bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) 2022-02-08 11:54:37 +00:00
Pablo Galindo Salgado
6fa8b2ceee
bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) 2022-01-08 00:23:40 +00:00
Pablo Galindo Salgado
dd6c35761a
bpo-46110: Restore commit e9898bf153
This restores commit e9898bf153 .
2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado
9d35dedc5e
Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153 temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
2022-01-03 18:29:18 +00:00
Pablo Galindo Salgado
e9898bf153
bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
2021-12-20 15:43:26 +00:00
Kumar Aditya
41026c3155
bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Weipeng Hong
28179aac79
bpo-42918: Improve build-in function compile() in mode 'single' (GH-29934)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2021-12-11 00:44:26 +01:00
Pablo Galindo Salgado
24c10d2943
bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses (GH-29757) 2021-11-24 22:21:23 +00:00
Pablo Galindo Salgado
c9c4444d9f
Refactor parser compilation units into specific components (GH-29676) 2021-11-21 01:08:50 +00:00
Pablo Galindo Salgado
79ff0d1687
bpo-45494: Fix error location in EOF tokenizer errors (GH-29108) 2021-11-20 17:40:59 +00:00
Pablo Galindo Salgado
fdcc46d955
bpo-45848: Allow the parser to get error lines from encoded files (GH-29646) 2021-11-20 15:36:07 +01:00
Pablo Galindo Salgado
546cefcda7
bpo-45727: Make the syntax error for missing comma more consistent (GH-29427) 2021-11-19 23:11:57 +00:00
Pablo Galindo Salgado
da20d7401d
bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) 2021-11-16 12:30:47 -08:00
Pablo Galindo Salgado
df4ae55e66
bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) 2021-11-16 19:51:52 +00:00
Pablo Galindo Salgado
25835c518a
bpo-45738: Fix computation of error location for invalid continuation (GH-29550)
characters in the parser
2021-11-14 01:06:41 +00:00
Pablo Galindo Salgado
a106343f63
bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993)
There are two errors that this commit fixes:

* The parser was not correctly computing the offset and the string
  source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
  exception happened in rules involving optionals ('?', [...]) as we
  always make them return valid results by using the comma operator. We
  need to check first if we don't have an error before continuing.
2021-10-19 21:24:12 +02:00
Victor Stinner
713bb19356
bpo-45434: Mark the PyTokenizer C API as private (GH-28924)
Rename PyTokenize functions to mark them as private:

* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()

Remove the unused PyTokenizer_FindEncoding() function.

import.c: remove unused #include "errcode.h".
2021-10-13 17:22:14 +02:00
Pablo Galindo Salgado
0219017df7
bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812) 2021-10-07 22:33:05 +01:00
Pablo Galindo Salgado
e5f13ce5b4
bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator expression in function calls (GH-28576) 2021-09-27 14:37:43 +01:00
Pablo Galindo Salgado
953d27261e
Update pegen to use the latest upstream developments (GH-27586) 2021-08-12 17:37:30 +01:00
Pablo Galindo Salgado
6948964ecf
bpo-34013: Generalize the invalid legacy statement error message (GH-27389) 2021-07-27 17:19:22 +01:00
Batuhan Taskaya
fbc349ff79
bpo-43950: Distinguish errors happening on character offset decoding (GH-27217) 2021-07-20 16:42:12 +01:00
Ammar Askar
5644c7b3ff
bpo-43950: Print columns in tracebacks (PEP 657) (GH-26958)
The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
2021-07-05 00:14:33 +01:00
Pablo Galindo
0acc258fe6
bpo-44456: Improve the syntax error when mixing keyword and positional patterns (GH-26793) 2021-06-24 16:09:57 +01:00
Pablo Galindo
507ed6fa1d
bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712) 2021-06-14 17:46:11 +01:00
Serhiy Storchaka
be8b631b7a
Add more const modifiers. (GH-26691) 2021-06-12 16:11:59 +03:00
Pablo Galindo
457ce60fc7
bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft keywords (GH-26630) 2021-06-09 22:20:01 +01:00
Pablo Galindo
9fd21f649d
bpo-44349: Fix edge case when displaying text from files with encoding in syntax errors (GH-26611) 2021-06-09 00:54:29 +01:00
Pablo Galindo
bafe0aade5
bpo-44335: Ensure the tokenizer doesn't go into Python with the error set (GH-26608) 2021-06-08 20:02:03 +01:00
Pablo Galindo
d334c73b56
bpo-44335: Fix a regression when identifying invalid characters in syntax errors (GH-26589) 2021-06-08 12:25:22 +01:00
Serhiy Storchaka
39dd141a4b
bpo-44273: Improve syntax error message for assigning to "..." (GH-26477)
Use "ellipsis" instead of "Ellipsis" in syntax error messages to eliminate confusion with built-in variable Ellipsis.
2021-06-01 12:07:05 +01:00
Pablo Galindo
bd7476dae3
bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298)
When the parser does a second pass to check for errors, these rules can
have some small side-effects as they may advance the parser more than
the point reached in the first pass. This can cause the tokenizer to ask
for extra tokens in interactive mode causing the tokenizer to show the
prompt instead of failing instantly.

To avoid this, add a new mode to the tokenizer that is activated in the
second pass and deactivates asking for new tokens when the interactive
line is finished. As the parsing should have reached the last line in
the first pass, the second pass should not need to ask for more tokens.
2021-05-22 23:05:00 +01:00