Commit graph

104 commits

Author SHA1 Message Date
Pablo Galindo Salgado
b69f3118a9
[3.12] gh-130077: Properly match full soft keywords in the parser (GH-135317) (#135400)
Some checks failed
Tests / Change detection (push) Has been cancelled
Lint / lint (push) Has been cancelled
Tests / Docs (push) Has been cancelled
Tests / Check if the ABI has changed (push) Has been cancelled
Tests / (push) Has been cancelled
Tests / Check if Autoconf files are up to date (push) Has been cancelled
Tests / Check if generated files are up to date (push) Has been cancelled
Tests / Windows MSI (push) Has been cancelled
Tests / Ubuntu SSL tests with OpenSSL (push) Has been cancelled
Tests / Hypothesis tests on Ubuntu (push) Has been cancelled
Tests / Address sanitizer (push) Has been cancelled
Tests / All required checks pass (push) Has been cancelled
(cherry picked from commit ff2b5f40c2)
2025-07-09 00:40:55 +01:00
Miss Islington (bot)
66042e180b
[3.12] gh-131762: Fixed dereferencing the pointer 'parser_token->metadata' with a NULL value (GH-131764) (#131775)
gh-131762: Fixed dereferencing the pointer 'parser_token->metadata' with a NULL value (GH-131764)
(cherry picked from commit 2c686a9ac2)

Co-authored-by: rialbat <47256826+rialbat@users.noreply.github.com>
2025-03-26 19:01:36 +00:00
Petr Viktorin
49f6beb56a
[3.12] gh-113993: Make interned strings mortal (GH-120520, GH-121364, GH-121903, GH-122303) (#123065)
This backports several PRs for gh-113993, making interned strings mortal so they can be garbage-collected when no longer needed.

* Allow interned strings to be mortal, and fix related issues (GH-120520)

  * Add an InternalDocs file describing how interning should work and how to use it.

  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`

  * Switch uses of `PyUnicode_InternInPlace` to those.

  * Disallow using `_Py_SetImmortal` on strings directly.
    You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization, otherwise you're possibly
      interning a immortalizing copy.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
      `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.

  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

   Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.

  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

  * Intern `_Py_STR` singletons at startup.

  * Beef up the tests. Cover internal details (marked with `@cpython_only`).

  * Add lots of assertions

* Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364)

  * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs

  * Document immortality in some functions that take `const char *`

  This is PyUnicode_InternFromString;
  PyDict_SetItemString, PyObject_SetAttrString;
  PyObject_DelAttrString; PyUnicode_InternFromString;
  and the PyModule_Add convenience functions.

  Always point out a non-immortalizing alternative.

  * Don't immortalize user-provided attr names in _ctypes

* Immortalize names in code objects to avoid crash (GH-121903)

* Intern latin-1 one-byte strings at startup (GH-122303)

There are some 3.12-specific changes, mainly to allow statically allocated strings in deepfreeze. (In 3.13, deepfreeze switched to the general `_Py_ID`/`_Py_STR`.)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-09-27 13:28:48 -07:00
Miss Islington (bot)
02c19f0338
[3.12] gh-122270: Fix typos in the Py_DEBUG macro name (GH-122271) (GH-122276)
(cherry picked from commit 6c09b8de5c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-07-25 11:22:42 +00:00
Miss Islington (bot)
4a0af0cfdc
[3.12] gh-119118: Fix performance regression in tokenize module (GH-119615) (#119683)
- Cache line object to avoid creating a Unicode object
  for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the
  smallest buffer possible to measure the difference.

(cherry picked from commit d87b015106)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2024-05-28 22:49:02 +02:00
Pablo Galindo Salgado
e4d2fb242a
[3.12] gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module (GH-112949) (#112957)
(cherry picked from commit a135a6d2c6)
2023-12-11 12:48:19 +00:00
Pablo Galindo Salgado
c81ebf5b3d
[3.12] bpo-43950: handle wide unicode characters in tracebacks (GH-28150) (#111346) 2023-10-27 06:43:38 +09:00
Miss Islington (bot)
3f8d5d9ed6
[3.12] gh-105017: Include CRLF lines in strings and column numbers (GH-105030) (#105041)
gh-105017: Include CRLF lines in strings and column numbers (GH-105030)
(cherry picked from commit 96fff35325)

Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-28 15:16:43 +01:00
Marta Gómez Macías
6715f91edc
gh-102856: Python tokenizer implementation for PEP 701 (#104323)
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-05-21 01:03:02 +01:00
Lysandros Nikolaou
9169a56fad
gh-103656: Transfer f-string buffers to parser to avoid use-after-free (GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2023-04-27 01:33:31 +00:00
Pablo Galindo Salgado
1ef61cf71a
gh-102856: Initial implementation of PEP 701 (#102855)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
2023-04-19 11:18:16 -05:00
Chenxi Mao
7703def37e
GH-102711: Fix warnings found by clang (#102712)
There are some warnings if build python via clang:

Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
                              ^
                               void

Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
                            ^
                             void

Fix it to make clang happy.

Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
2023-03-28 10:52:22 +02:00
Mark Shannon
feec49c407
GH-101578: Normalize the current exception (GH-101607)
* Make sure that the current exception is always normalized.

* Remove redundant type and traceback fields for the current exception.

* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException

* Add new API functions: PyException_GetArgs, PyException_SetArgs
2023-02-08 09:31:12 +00:00
Eric Snow
91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Victor Stinner
4ce2a202c7
gh-99300: Use Py_NewRef() in Parser/ directory (#99330)
Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
2022-11-10 15:30:05 +01:00
Lysandros Nikolaou
cbf0afd8a1
gh-97973: Return all necessary information from the tokenizer (GH-97984)
Right now, the tokenizer only returns type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and set all of the necessary information,
so that the parser does not have to this.
2022-10-06 16:07:17 -07:00
Gregory P. Smith
511ca94520
gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499)
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
2022-09-02 09:35:08 -07:00
Honglin Zhu
b946f529ef
gh-95355: Check tokens[0] after allocating memory (GH-95356)
#95355

Automerge-Triggered-By: GH:pablogsal
2022-07-28 03:00:34 -07:00
Serhiy Storchaka
6fd4c8ec77
gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742)
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.

Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
2022-06-14 07:15:26 +03:00
Victor Stinner
5115a16831
gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106)
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
  parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
2022-05-24 22:35:08 +02:00
Oleg Iarygin
a52f82baf2
bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) 2022-03-14 17:03:21 +01:00
Pablo Galindo Salgado
e19059ecd8
Don't print rejected tokens when using the debug flags in the parser (GH-31258) 2022-02-10 14:38:27 +00:00
Pablo Galindo Salgado
390459de6d
Allow the parser to avoid nested processing of invalid rules (GH-31252) 2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado
69e10976b2
bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) 2022-02-08 11:54:37 +00:00
Pablo Galindo Salgado
6fa8b2ceee
bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) 2022-01-08 00:23:40 +00:00
Pablo Galindo Salgado
dd6c35761a
bpo-46110: Restore commit e9898bf153
This restores commit e9898bf153 .
2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado
9d35dedc5e
Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153 temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
2022-01-03 18:29:18 +00:00
Pablo Galindo Salgado
e9898bf153
bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
2021-12-20 15:43:26 +00:00
Kumar Aditya
41026c3155
bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Weipeng Hong
28179aac79
bpo-42918: Improve build-in function compile() in mode 'single' (GH-29934)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2021-12-11 00:44:26 +01:00
Pablo Galindo Salgado
24c10d2943
bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses (GH-29757) 2021-11-24 22:21:23 +00:00
Pablo Galindo Salgado
c9c4444d9f
Refactor parser compilation units into specific components (GH-29676) 2021-11-21 01:08:50 +00:00
Pablo Galindo Salgado
79ff0d1687
bpo-45494: Fix error location in EOF tokenizer errors (GH-29108) 2021-11-20 17:40:59 +00:00
Pablo Galindo Salgado
fdcc46d955
bpo-45848: Allow the parser to get error lines from encoded files (GH-29646) 2021-11-20 15:36:07 +01:00
Pablo Galindo Salgado
546cefcda7
bpo-45727: Make the syntax error for missing comma more consistent (GH-29427) 2021-11-19 23:11:57 +00:00
Pablo Galindo Salgado
da20d7401d
bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) 2021-11-16 12:30:47 -08:00
Pablo Galindo Salgado
df4ae55e66
bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) 2021-11-16 19:51:52 +00:00
Pablo Galindo Salgado
25835c518a
bpo-45738: Fix computation of error location for invalid continuation (GH-29550)
characters in the parser
2021-11-14 01:06:41 +00:00
Pablo Galindo Salgado
a106343f63
bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993)
There are two errors that this commit fixes:

* The parser was not correctly computing the offset and the string
  source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
  exception happened in rules involving optionals ('?', [...]) as we
  always make them return valid results by using the comma operator. We
  need to check first if we don't have an error before continuing.
2021-10-19 21:24:12 +02:00
Victor Stinner
713bb19356
bpo-45434: Mark the PyTokenizer C API as private (GH-28924)
Rename PyTokenize functions to mark them as private:

* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()

Remove the unused PyTokenizer_FindEncoding() function.

import.c: remove unused #include "errcode.h".
2021-10-13 17:22:14 +02:00
Pablo Galindo Salgado
0219017df7
bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812) 2021-10-07 22:33:05 +01:00
Pablo Galindo Salgado
e5f13ce5b4
bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator expression in function calls (GH-28576) 2021-09-27 14:37:43 +01:00
Pablo Galindo Salgado
953d27261e
Update pegen to use the latest upstream developments (GH-27586) 2021-08-12 17:37:30 +01:00
Pablo Galindo Salgado
6948964ecf
bpo-34013: Generalize the invalid legacy statement error message (GH-27389) 2021-07-27 17:19:22 +01:00
Batuhan Taskaya
fbc349ff79
bpo-43950: Distinguish errors happening on character offset decoding (GH-27217) 2021-07-20 16:42:12 +01:00
Ammar Askar
5644c7b3ff
bpo-43950: Print columns in tracebacks (PEP 657) (GH-26958)
The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
2021-07-05 00:14:33 +01:00
Pablo Galindo
0acc258fe6
bpo-44456: Improve the syntax error when mixing keyword and positional patterns (GH-26793) 2021-06-24 16:09:57 +01:00
Pablo Galindo
507ed6fa1d
bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712) 2021-06-14 17:46:11 +01:00
Serhiy Storchaka
be8b631b7a
Add more const modifiers. (GH-26691) 2021-06-12 16:11:59 +03:00
Pablo Galindo
457ce60fc7
bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft keywords (GH-26630) 2021-06-09 22:20:01 +01:00