Previously, we assumed that instrumentation would happen for all copies of
the bytecode if the instrumentation version on the code object didn't match
the per-interpreter instrumentation version. That assumption was incorrect:
instrumentation will exit early if there are no new "events," even if there
is an instrumentation version mismatch.
To fix this, include the instrumented opcodes when creating new copies of
the bytecode, rather than replacing them with their uninstrumented variants.
I don't think we have to worry about races between instrumentation and creating
new copies of the bytecode: instrumentation and new bytecode creation cannot happen
concurrently. Instrumentation requires that either the world is stopped or the
code object's per-object lock is held and new bytecode creation requires holding
the code object's per-object lock.
(cherry picked from commit d995922198)
Co-authored-by: mpage <mpage@meta.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
gh-127971: fix off-by-one read beyond the end of a string during search (GH-132574)
(cherry picked from commit 85ec3b3b50)
Co-authored-by: Duane Griffin <duaneg@dghda.com>
gh-91153: prevent a crash in `bytearray.__setitem__(ind, ...)` when `ind.__index__` has side-effects (GH-132379)
(cherry picked from commit 5e1e21dee3)
Co-authored-by: Bast <52266665+bast0006@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
The free threading build uses QSBR to delay the freeing of dictionary
keys and list arrays when the objects are accessed by multiple threads
in order to allow concurrent reads to proceed with holding the object
lock. The requests are processed in batches to reduce execution
overhead, but for large memory blocks this can lead to excess memory
usage.
Take into account the size of the memory block when deciding when to
process QSBR requests.
Also track the amount of memory being held by QSBR for mimalloc pages. Advance the write sequence if this memory exceeds a limit. Advancing the sequence will allow it to be freed more quickly.
Process the held QSBR items from the "eval breaker", rather than from `_PyMem_FreeDelayed()`. This gives a higher chance that the global read sequence has advanced enough so that items can be freed.
(cherry picked from commit 113de8545f)
Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
Co-authored-by: Sam Gross <colesbury@gmail.com>
gh-129824: fix data races in subinterpreters under TSAN (GH-135794)
This fixes the data races in typeobject.c in subinterpreters under free-threading. The type flags and slots are only modified in the main interpreter as all static types are first initialised in main interpreter.
(cherry picked from commit b582d751b4)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
gh-135607: remove null checking of weakref list in dealloc of extension modules and objects (#135614)
(cherry picked from commit b1056c2a44)
Co-authored-by: Xuanteng Huang <44627253+xuantengh@users.noreply.github.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Previous error message suggested to use cls.__new__(), which
obviously does not work. Now the error message is the same as for
cls(...).
(cherry picked from commit c45f4f3ebe)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
The assertion reflected a misunderstanding of situations where "hidden" variables might exist,
namely generator expressions and comprehensions.
(cherry picked from commit 15f2bac02c, AKA gh-135466)
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
For several builtin functions, we now fall back to __main__.__dict__ for the globals
when there is no current frame and _PyInterpreterState_IsRunningMain() returns
true. This allows those functions to be run with Interpreter.call().
The affected builtins:
* exec()
* eval()
* globals()
* locals()
* vars()
* dir()
We take a similar approach with "stateless" functions, which don't use any
global variables.
(cherry picked from commit a450a0ddec, AKA gh-135491)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
Use `ma_used` instead of `ma_keys->dk_nentries` for modification check
so that we only check if the dictionary is modified, not if new keys are
added to a different dictionary that shared the same keys object.
(cherry picked from commit d8994b0a77)
Co-authored-by: Sam Gross <colesbury@gmail.com>
gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().
(cherry picked from commit f49a07b531)
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
gh-133489: Remove size restrictions on getrandbits() and randbytes() (GH-133658)
random.getrandbits() can now generate more that 2**31 bits.
random.randbytes() can now generate more that 256 MiB.
(cherry picked from commit 68784fed78)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gh-132775: Fix _PyFunctIon_VerifyStateless() ()
The problem we're fixing here is that we were using PyDict_Size() on "defaults",
which it is actually a tuple. We're also adding some explicit type checks.
This is a follow-up to gh-133221/gh-133528.
(cherry picked from commit dafd14146f, AKA gh-134900)
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
This is the same underlying bug as gh-130519. The destructor may call
arbitrary code, changing the `tstate->qsbr pointer` and invalidating the
old `struct _qsbr_thread_state`.
(cherry picked from commit a4d37f88b6)
Co-authored-by: Sam Gross <colesbury@gmail.com>
gh-91048: Refactor and optimize remote debugging module (#134652)
Completely refactor Modules/_remote_debugging_module.c with improved
code organization, replacing scattered reference counting and error
handling with centralized goto error paths. This cleanup improves
maintainability and reduces code duplication throughout the module while
preserving the same external API.
Implement memory page caching optimization in Python/remote_debug.h to
avoid repeated reads of the same memory regions during debugging
operations. The cache stores previously read memory pages and reuses
them for subsequent reads, significantly reducing system calls and
improving performance.
Add code object caching mechanism with a new code_object_generation
field in the interpreter state that tracks when code object caches need
invalidation. This allows efficient reuse of parsed code object metadata
and eliminates redundant processing of the same code objects across
debugging sessions.
Optimize memory operations by replacing multiple individual structure
copies with single bulk reads for the same data structures. This reduces
the number of memory operations and system calls required to gather
debugging information from the target process.
Update Makefile.pre.in to include Python/remote_debug.h in the headers
list, ensuring that changes to the remote debugging header force proper
recompilation of dependent modules and maintain build consistency across
the codebase.
Also, make the module compatible with the free threading build as an extra :)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 42b25ad4d3)
gh-133980: use atomic store in `PyObject_GenericSetDict` (GH-133988)
(cherry picked from commit ec39fd2c20)
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
gh-132641: fix race in `lru_cache` under free-threading (GH-133787)
Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict.
(cherry picked from commit 9ad0c7b0f1)
Co-authored-by: Peter Hawkins <phawkins@google.com>
gh-133968: Add fast path to PyUnicodeWriter_WriteStr() (GH-133969)
Don't call PyObject_Str() if the input type is str.
(cherry picked from commit fe9f6e829a)
Co-authored-by: Victor Stinner <vstinner@python.org>
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
The function `dict_set_fromkeys()` adds elements of a set to an existing
dictionary. The size of the expanded dictionary was estimated with
`PySet_GET_SIZE(iterable)`, which did not take into account the size of the
existing dictionary.
(cherry picked from commit 421ba589d0)
Co-authored-by: Angela Liss <59097311+angela-tarantula@users.noreply.github.com>
This reverts commit 3c73cf5 (gh-133497), which itself reverted
the original commit d270bb5 (gh-133221).
We reverted the original change due to failing android tests.
The checks in _PyCode_CheckNoInternalState() were too strict,
so we've relaxed them.
"Stateless" code is a function or code object which does not rely on external state or internal state.
It may rely on arguments and builtins, but not globals or a closure. I've left a comment in
pycore_code.h that provides more detail.
We also add _PyFunction_VerifyStateless(). The new functions will be used in several later changes
that facilitate "sharing" functions and code objects between interpreters.
This reverts commit 811edcf (gh-133232), which itself reverted the original commit 811edcf (gh-133128).
We reverted the original change due to failing s390 builds (a big-endian architecture).
It ended up that I had not accommodated op caches.