Commit graph

1644 commits

Author SHA1 Message Date
Serhiy Storchaka
4398b788ff
[3.12] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) (#134337)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
(cherry picked from commit 6279eb8c07)
2025-05-25 20:33:22 -07:00
Stan Ulbrych
8a598fb623
[3.12] gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130125)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database.  That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Co-authored-by: Greg Price <gnprice@gmail.com>
(cherry picked from commit 3402e133ef)
2025-02-17 14:07:59 +01:00
Miss Islington (bot)
e44045924d
[3.12] gh-127903: Fix a crash on debug builds when calling Objects/unicodeobject::_copy_characters (GH-127876) (#128459)
gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters`` (GH-127876)
(cherry picked from commit 46cb6340d7)

Co-authored-by: Alexander Shadchin <shadchin@yandex-team.com>
2025-01-03 21:21:08 +02:00
Miss Islington (bot)
49da170709
[3.12] gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-125205)
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785 (i.e. backporting gh-125709 too).

(cherry picked from commit f2cb399470, AKA gh-124865)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-12-03 10:26:25 -07:00
Bénédikt Tran
9f0d6b71d6
[3.12] Fix Unicode encode_wstr_utf8() (#127420) (#127504)
Fix Unicode encode_wstr_utf8() (#127420)

Raise RuntimeError instead of RuntimeWarning.

Co-authored-by: Victor Stinner <vstinner@python.org>
2024-12-02 13:24:26 +01:00
Victor Stinner
e7ab97862d
[3.12] gh-127208: Reject null character in _imp.create_dynamic() (#127400) (#127419)
gh-127208: Reject null character in _imp.create_dynamic() (#127400)

_imp.create_dynamic() now rejects embedded null characters in the
path and in the module name.

Backport also the _PyUnicode_AsUTF8NoNUL() function.

(cherry picked from commit b14fdadc6c)
2024-11-29 17:03:24 +01:00
Petr Viktorin
b3e2c02915
[3.12] gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358) (GH-124938)
gh-113993: For string interning, do not rely on (or assert) _Py_IsImmortal (GH-121358)

Older stable ABI extensions are allowed to make immortal objects mortal.
Instead, use `_PyUnicode_STATE` (`interned` and `statically_allocated`).
(cherry picked from commit 956270d08d)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
2024-10-04 16:50:34 +02:00
Neil Schemenauer
ffc6a10149
[3.12] gh-124785: Revert "[3.12] gh-116510: Fix a crash due to shared immortal interned strings. (gh-124541)" (#124814)
Revert "[3.12] gh-116510: Fix a crash due to shared immortal interned strings. (gh-124541)"

This reverts commit 5dd07ebc0c.
2024-09-30 18:54:41 -07:00
Petr Viktorin
49f6beb56a
[3.12] gh-113993: Make interned strings mortal (GH-120520, GH-121364, GH-121903, GH-122303) (#123065)
This backports several PRs for gh-113993, making interned strings mortal so they can be garbage-collected when no longer needed.

* Allow interned strings to be mortal, and fix related issues (GH-120520)

  * Add an InternalDocs file describing how interning should work and how to use it.

  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`

  * Switch uses of `PyUnicode_InternInPlace` to those.

  * Disallow using `_Py_SetImmortal` on strings directly.
    You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization, otherwise you're possibly
      interning a immortalizing copy.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
      `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.

  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

   Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.

  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

  * Intern `_Py_STR` singletons at startup.

  * Beef up the tests. Cover internal details (marked with `@cpython_only`).

  * Add lots of assertions

* Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364)

  * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs

  * Document immortality in some functions that take `const char *`

  This is PyUnicode_InternFromString;
  PyDict_SetItemString, PyObject_SetAttrString;
  PyObject_DelAttrString; PyUnicode_InternFromString;
  and the PyModule_Add convenience functions.

  Always point out a non-immortalizing alternative.

  * Don't immortalize user-provided attr names in _ctypes

* Immortalize names in code objects to avoid crash (GH-121903)

* Intern latin-1 one-byte strings at startup (GH-122303)

There are some 3.12-specific changes, mainly to allow statically allocated strings in deepfreeze. (In 3.13, deepfreeze switched to the general `_Py_ID`/`_Py_STR`.)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2024-09-27 13:28:48 -07:00
Neil Schemenauer
5dd07ebc0c
[3.12] gh-116510: Fix a crash due to shared immortal interned strings. (gh-124541)
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init.  In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it.  For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.
2024-09-26 18:04:03 -07:00
Sam Gross
92564331de
[3.12] gh-113964: Don't prevent new threads until all non-daemon threads exit (GH-116677) (#117029)
Starting in Python 3.12, we prevented calling fork() and starting new threads
during interpreter finalization (shutdown). This has led to a number of
regressions and flaky tests. We should not prevent starting new threads
(or `fork()`) until all non-daemon threads exit and finalization starts in
earnest.

This changes the checks to use `_PyInterpreterState_GetFinalizing(interp)`,
which is set immediately before terminating non-daemon threads.

(cherry picked from commit 60e105c1c1)
2024-03-19 15:22:42 -04:00
Hugo van Kemenade
2ddee2e245
[3.12] gh-110383: Improve accuracy of str.split() and str.rsplit() docstrings (GH-113355) (#113379)
Co-authored-by: Erlend E. Aasland <erlend@python.org>
2023-12-22 00:43:28 -07:00
Eric Snow
4f71f1680d
[3.12] gh-106931: Intern Statically Allocated Strings Globally (gh-107272) (gh-110713)
We tried this before with a dict and for all interned strings.  That ran into problems due to interpreter isolation.  However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning.  Here we circle back to using a global cache, but only for statically allocated strings.  We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.

Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter.  Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.

(cherry-picked from commit b72947a8d2)
2023-11-27 23:51:12 +00:00
Shantanu
2979cee1af
[3.12] gh-108915: Removes extra backslashes in str.split docstring (GH-109044). (#109061)
* [3.12] gh-108915: Removes extra backslashes in str.split docstring (GH-109044).
(cherry picked from commit e7d5433f94)

Co-authored-by: Daniel Weiss <134341009+justdan6@users.noreply.github.com>

* re-clinic

---------

Co-authored-by: Daniel Weiss <134341009+justdan6@users.noreply.github.com>
2023-09-08 15:19:38 +02:00
Miss Islington (bot)
3e20303717
[3.12] gh-107913: Fix possible losses of OSError error codes (GH-107930) (#108523)
gh-107913: Fix possible losses of OSError error codes (GH-107930)

Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be
called immediately after using the C API which sets errno or the Windows
error code.
(cherry picked from commit 2b15536fa9)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-08-27 01:24:40 +02:00
Miss Islington (bot)
d0176ed911
[3.12] gh-105699: Fix an Interned Strings Crasher (gh-106930) (#106963)
gh-105699: Fix an Interned Strings Crasher (gh-106930)

A static (process-global) str object must only have its "interned" state cleared when no longer interned in any interpreters.  They are the only ones that can be shared by interpreters so we don't have to worry about any other str objects.

We trigger clearing the state with the main interpreter, since no other interpreters may exist at that point and _PyUnicode_ClearInterned() is only called during interpreter finalization.

We do not address here the fact that a string will only be interned in the first interpreter that interns it.  In any subsequent interpreters str.state.interned is already set so _PyUnicode_InternInPlace() will skip it.  That needs to be addressed separately from fixing the crasher.
(cherry picked from commit 87e7cb09e4)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2023-07-21 22:28:22 +02:00
Miss Islington (bot)
957f14d0de
[3.12] gh-105699: Fix a Crasher Related to a Deprecated Global Variable (gh-106923) (#106964)
gh-105699: Fix a Crasher Related to a Deprecated Global Variable (gh-106923)

There was a slight race in _Py_ClearFileSystemEncoding() (when called from _Py_SetFileSystemEncoding()), between freeing the value and setting the variable to NULL, which occasionally caused crashes when multiple isolated interpreters were used.  (Notably, I saw at least 10 different, seemingly unrelated spooky-action-at-a-distance, ways this crashed. Yay, free threading!)  We avoid the problem by only setting the global variables with the main interpreter (i.e. runtime init).
(cherry picked from commit 0ba07b2108)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
2023-07-21 22:27:52 +02:00
Miss Islington (bot)
ed038953fc
[3.12] gh-105375: Improve error handling in PyUnicode_BuildEncodingMap() (GH-105491) (#105661)
Bail on first error to prevent exceptions from possibly being overwritten.
(cherry picked from commit 555be81026)

Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
2023-06-11 20:01:18 +00:00
Miss Islington (bot)
5dc6b18cb0
Fix compiler warning in unicodeobject.c (GH-105050)
Fix compiler warning in unicodeobject.c (GH-105050)
(cherry picked from commit e92ac0a741)

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2023-05-29 18:05:42 +09:00
Serhiy Storchaka
f3466bc040
gh-98836: Extend PyUnicode_FromFormat() (GH-98838)
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
2023-05-22 00:32:39 +03:00
John Belmonte
69621d1b09
gh-104018: remove unused format "z" handling in string formatfloat() (#104107)
This is a cleanup overlooked in PR #104033.
2023-05-07 10:11:42 +05:30
Eric Snow
a9c6e0618f
gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
2023-05-05 21:11:27 +00:00
Eric Snow
fdd878650d
gh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (gh-104072)
Until now, we haven't been initializing nor finalizing the per-interpreter state properly.
2023-05-01 19:36:00 -06:00
Eric Snow
d2e2e53f73
gh-94673: Ensure Builtin Static Types are Readied Properly (gh-103940)
There were cases where we do unnecessary work for builtin static types. This also simplifies some work necessary for a per-interpreter GIL.
2023-04-27 16:19:43 -06:00
Eddie Elizondo
ea2c001650
gh-84436: Implement Immortal Objects (gh-19474)
This is the implementation of PEP683

Motivation:

The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.

Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
2023-04-22 13:39:37 -06:00
Eric Snow
ba65a065cf
gh-100227: Move the Dict of Interned Strings to PyInterpreterState (gh-102339)
We can revisit the options for keeping it global later, if desired.  For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime.

https://github.com/python/cpython/issues/100227
2023-03-28 12:52:28 -06:00
Eric Snow
89e67ada69
gh-100227: Revert gh-102925 "gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters" (gh-103063)
This reverts commit 87be8d9.

This approach to keeping the interned strings safe is turning out to be too complex for my taste (due to obmalloc isolation). For now I'm going with the simpler solution, making the dict per-interpreter. We can revisit that later if we want a sharing solution.
2023-03-27 16:53:05 -06:00
Eric Snow
87be8d9522
gh-100227: Make the Global Interned Dict Safe for Isolated Interpreters (gh-102925)
This is effectively two changes.  The first (the bulk of the change) is where we add _Py_AddToGlobalDict() (and _PyRuntime.cached_objects.main_tstate, etc.).  The second (much smaller) change is where we update PyUnicode_InternInPlace() to use _Py_AddToGlobalDict() instead of calling PyDict_SetDefault() directly.

Basically, _Py_AddToGlobalDict() is a wrapper around PyDict_SetDefault() that should be used whenever we need to add a value to a runtime-global dict object (in the few cases where we are leaving the container global rather than moving it to PyInterpreterState, e.g. the interned strings dict).  _Py_AddToGlobalDict() does all the necessary work to make sure the target global dict is shared safely between isolated interpreters.  This is especially important as we move the obmalloc state to each interpreter (gh-101660), as well as, potentially, the GIL (PEP 684).

https://github.com/python/cpython/issues/100227
2023-03-22 18:30:04 -06:00
Kumar Aditya
3d872a74c8
GH-100227: cleanup initialization of global interned dict (#102682) 2023-03-14 14:22:21 +05:30
Eric Snow
cbb0aa71d0
gh-102304: Consolidate Direct Usage of _Py_RefTotal (gh-102514)
This simplifies further changes to _Py_RefTotal (e.g. make it atomic or move it to PyInterpreterState).

https://github.com/python/cpython/issues/102304
2023-03-08 12:03:50 -07:00
Jelle Zijlstra
8d0f09b1be
gh-101765: unicodeobject: use Py_XDECREF correctly (#102283) 2023-02-26 14:45:37 -08:00
Jelle Zijlstra
d71edbd1b7
gh-101765: Fix refcount issues in list and unicode pickling (#102265)
Followup from #101769.
2023-02-25 16:01:58 -08:00
Ionite
54dfa14c5a
gh-101765: Fix SystemError / segmentation fault in iter __reduce__ when internal access of builtins.__dict__ exhausts the iterator (#101769) 2023-02-24 15:02:04 -08:00
Eric Snow
aa8591e9ca
gh-90111: Minor Cleanup for Runtime-Global Objects (gh-100254)
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects

(This also relates to gh-96075.)

https://github.com/python/cpython/issues/90111
2022-12-14 11:53:57 -07:00
Eric Snow
91a8e002c2
gh-81057: Move More Globals to _PyRuntimeState (gh-100092)
https://github.com/python/cpython/issues/81057
2022-12-07 15:56:31 -07:00
Serhiy Storchaka
a87c46eab3
bpo-15999: Accept arbitrary values for boolean parameters. (#15609)
builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
2022-12-03 11:52:21 -08:00
Serhiy Storchaka
f08e52ccb0
gh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (GH-99613)
Previously *consumed was not set in this case.
2022-12-01 14:54:51 +02:00
Victor Stinner
135ec7cefb
gh-99537: Use Py_SETREF() function in C code (#99657)
Fix potential race condition in code patterns:

* Replace "Py_DECREF(var); var = new;" with "Py_SETREF(var, new);"
* Replace "Py_XDECREF(var); var = new;" with "Py_XSETREF(var, new);"
* Replace "Py_CLEAR(var); var = new;" with "Py_XSETREF(var, new);"

Other changes:

* Replace "old = var; var = new; Py_DECREF(var)"
  with "Py_SETREF(var, new);"
* Replace "old = var; var = new; Py_XDECREF(var)"
  with "Py_XSETREF(var, new);"
* And remove the "old" variable.
2022-11-22 13:39:11 +01:00
Eric Snow
5f55067e23
gh-81057: Move More Globals in Core Code to _PyRuntimeState (gh-99516)
https://github.com/python/cpython/issues/81057
2022-11-16 09:37:14 -07:00
Victor Stinner
1960eb005e
gh-99300: Use Py_NewRef() in Objects/ directory (#99351)
Replace Py_INCREF() and Py_XINCREF() with Py_NewRef() and
Py_XNewRef() in C files of the Objects/ directory.
2022-11-10 23:40:31 +01:00
Eric Snow
52f91c642b
gh-90868: Adjust the Generated Objects (gh-99223)
We do the following:

* move the generated _PyUnicode_InitStaticStrings() to its own file
* move the generated _PyStaticObjects_CheckRefcnt() to its own file
* include pycore_global_objects.h in extension modules instead of pycore_runtime_init.h

These changes help us avoid including things that aren't needed.

https://github.com/python/cpython/issues/90868
2022-11-08 10:03:03 -07:00
Nikita Sobolev
76f989dc3e
gh-98783: Fix crashes when str subclasses are used in _PyUnicode_Equal (#98806) 2022-10-30 02:23:20 -04:00
Victor Stinner
db03c8066a
gh-98393: os module reject bytes-like, only accept bytes (#98394)
The os module and the PyUnicode_FSDecoder() function no longer accept
bytes-like paths, like bytearray and memoryview types: only the exact
bytes type is accepted for bytes strings.
2022-10-18 17:52:31 +02:00
Nikita Sobolev
ccab67ba79
gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)
Add unicode_count_impl() to factorize PyUnicode_Count()
and unicode_count() code.
2022-10-12 18:27:53 +02:00
Victor Stinner
df3a6d9beb
gh-97982: Remove asciilib_count() (#98164)
asciilib_count() is the same than ucs1lib_count(): the code is not
specialized for ASCII strings, so it's not worth it to have a
separated function. Remove asciilib_count() function.
2022-10-11 17:59:58 +02:00
Kumar Aditya
6dab8c95bd
GH-96458: Statically initialize utf8 representation of static strings (#96481) 2022-09-02 23:43:08 -07:00
Kumar Aditya
129998bd7b
GH-96075: move interned dict under runtime state (GH-96077) 2022-08-22 12:05:21 -07:00
Petr Viktorin
71c3d649b5
gh-95504: Fix negative numbers in PyUnicode_FromFormat (GH-95848)
Co-authored-by: philg314 <110174000+philg314@users.noreply.github.com>
2022-08-10 13:12:40 +02:00
Serhiy Storchaka
62f06508e7
gh-95781: More strict format string checking in PyUnicode_FromFormatV() (GH-95784)
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
2022-08-08 19:21:07 +03:00
Dong-hee Na
fb75d015f4
gh-91146: More reduce allocation size of list from str.split/rsplit (gh-95493)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
2022-08-01 22:15:07 +09:00