Don't use PyObject_Free() as tp_dealloc to avoid an undefined
behavior. Instead, use the default deallocator which just calls
tp_free which is PyObject_Free().
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
The free-threading build interns and immortalizes most constants
generated by the bytecode compiler. However, users can construct their
own code objects with arbitrary constants. We should not intern or
immortalize these objects if they are not of a type that we know how to
handle.
This change fixes a reference leak failure in the recently added
`test_code.test_unusual_constants` test. It also addresses a potential
crash that could occur when attempting to destroy an immortalized
object during interpreter shutdown.
I chose to not raise an exception here because I think it would be
confusing for module attribute access to start raising something other
than AttributeError if e.g. the cwd goes away
Without the change in moduleobject.c
```
./python.exe -m unittest test.test_import.ImportTests.test_script_shadowing_stdlib_cwd_failure
...
Assertion failed: (PyErr_Occurred()), function _PyObject_SetAttributeErrorContext, file object.c, line 1253.
```
* Fix use after free in list objects
Set the items pointer in the list object to NULL after the items array
is freed during list deallocation. Otherwise, we can end up with a list
object added to the free list that contains a pointer to an already-freed
items array.
* Mark `_PyList_FromStackRefStealOnSuccess` as escaping
I think technically it's not escaping, because the only object that
can be decrefed if allocation fails is an exact list, which cannot
execute arbitrary code when it is destroyed. However, this seems less
intrusive than trying to special cases objects in the assert in `_Py_Dealloc`
that checks for non-null stackpointers and shouldn't matter for performance.
The bytecode compiler only generates a few different types of constants,
like str, int, tuple, slices, etc. Users can construct code objects with
various unusual constants, including ones that are not hashable or not
even constant.
The free threaded build previously crashed with a fatal error when
confronted with these constants. Instead, treat distinct objects of
otherwise unhandled types as not equal for the purposes of deduplication.
* Add location information when accessing already closed stackref
* Add #def option to track closed stackrefs to provide precise information for use after free and double frees.
Avoid a data race in free-threaded builds due to mutating global arrays at
runtime. Instead, compute the constants with an external Python script and
then define them as static global constant arrays. These constants are
used by `long_from_non_binary_base()`.
Remove inclusions prior to Python.h.
<stdbool.h> will cause <features.h> to be included before Python.h can
define some macros to enable some additional features, causing multiple
types not to be defined down the line.
The `free_work_item()` function in QSBR may call arbitrary code via
Python object destructors, which may reenter the QSBR code. Reorder
the processing of work items to be robust to reentrancy.
Also fix the TODO for the out of memory situation.
The `PyType_HasFeature()` function reads the flags with a relaxed atomic
load and without holding the type lock. To avoid data races, use atomic
stores if `PyType_Ready()` has already been called.
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
This fixes a fairly subtle bug involving finalizers and resurrection in
debug free threaded builds: if `_PyObject_ResurrectEnd` returns `1`
(i.e., the object was resurrected by a finalizer), it's not safe to
access the object because it might still be deallocated. For example:
* The finalizer may have exposed the object to another thread. That
thread may hold the last reference and concurrently deallocate it any
time after `_PyObject_ResurrectEnd()` returns `1`.
* `_PyObject_ResurrectEnd()` may call `_Py_brc_queue_object()`, which
may internally deallocate the object immediately if the owning thread
is dead.
Therefore, it's important not to access the object after it's
resurrected. We only violate this in two cases, and only in debug
builds:
* We assert that the object is tracked appropriately. This is now moved
up betewen the finalizer and the `_PyObject_ResurrectEnd()` call.
* The `--with-trace-refs` builds may need to remember the object if
it's resurrected. This is now handled by `_PyObject_ResurrectStart()`
and `_PyObject_ResurrectEnd()`.
Note that `--with-trace-refs` is currently disabled in `--disable-gil`
builds because the refchain hash table isn't thread-safe, but this
refactoring avoids an additional thread-safety issue.
Fix UBSan failures for `typealiasobject`, `paramspecobject`, `typevarobject`, `typevartupleobject`, `paramspecattrobject`
Use _PyCFunction_CAST macros
Use macro for `constevaluatorobject` casts
Fix UBSan failures for `PyTypeObject`.
Introduce a macro cast for `superobject` and remove redundant casts.
Rename the unused parameter in getter/setter methods to `closure`
for semantic purposes.
* Implement C recursion protection with limit pointers for Linux, MacOS and Windows
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now
Unfortunatlely, the change broke some buildbots.
This reverts commit 2498c22fa0.
CPython current temporarily changes `PYMEM_DOMAIN_RAW` to the default
allocator during initialization and shutdown. The motivation is to
ensure that core runtime structures are allocated and freed using the
same allocator. However, modifying the current allocator changes global
state and is not thread-safe even with the GIL. Other threads may be
allocating or freeing objects use PYMEM_DOMAIN_RAW; they are not
required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree.
This adds new internal-only functions like `_PyMem_DefaultRawMalloc`
that aren't affected by calls to `PyMem_SetAllocator()`, so they're
appropriate for Python runtime initialization and finalization. Use
these calls in places where we previously swapped to the default raw
allocator.
* Implement C recursion protection with limit pointers
* Remove calls to PyOS_CheckStack
* Add stack protection to parser
* Make tests more robust to low stacks
* Improve error messages for stack overflow
Make tuple iteration more thread-safe, and actually test concurrent iteration of tuple, range and list. (This is prep work for enabling specialization of FOR_ITER in free-threaded builds.) The basic premise is:
Iterating over a shared iterable (list, tuple or range) should be safe, not involve data races, and behave like iteration normally does.
Using a shared iterator should not crash or involve data races, and should only produce items regular iteration would produce. It is not guaranteed to produce all items, or produce each item only once. (This is not the case for range iteration even after this PR.)
Providing stronger guarantees is possible for some of these iterators, but it's not always straight-forward and can significantly hamper the common case. Since iterators in general aren't shared between threads, and it's simply impossible to concurrently use many iterators (like generators), better to make sharing iterators without explicit synchronization clearly wrong.
Specific issues fixed in order to make the tests pass:
- List iteration could occasionally fail an assertion when a shared list was shrunk and an item past the new end was retrieved concurrently. There's still some unsafety when deleting/inserting multiple items through for example slice assignment, which uses memmove/memcpy.
- Tuple iteration could occasionally crash when the iterator's reference to the tuple was cleared on exhaustion. Like with list iteration, in free-threaded builds we can't safely and efficiently clear the iterator's reference to the iterable (doing it safely would mean extra, slow refcount operations), so just keep the iterable reference around.
* gh-129701: Fix a data race in `intern_common` in the free threaded build
* Use a mutex to avoid potentially returning a non-immortalized string,
because immortalization happens after the insertion into the interned
dict.
* Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)`
for thread-safety. This code path isn't performance sensistive, so
just use `Py_DECREF()` unconditionally for simplicity.