* Remove all 'if (0)' and 'if (1)' conditional stack effects
* Use array instead of conditional for BUILD_SLICE args
* Refactor LOAD_GLOBAL to use a common conditional uop
* Remove conditional stack effects from LOAD_ATTR specializations
* Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.
* Remove conditional stack effects from CALL_FUNCTION_EX
* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
use in place of `_PyDictKeys_StringLookup`.
* Change `_PyObject_TryGetInstanceAttribute` to use that function
in the case of split keys.
* Add `unicodekeys_lookup_split` helper which allows code sharing
between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.
* Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create
`_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
`tp_version_tag` cannot change.
* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
the version we store on the cache is the correct one (in case of
it changing during the specalize analysis).
* Split `analyze_descriptor` into `analyze_descriptor_load` and
`analyze_descriptor_store` since those don't share much logic.
Add `descriptor_is_class` helper function.
* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
in case we race and dict was materialized before the lock.
* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.
* Use `specialize()` and `unspecialize()` helpers.
* Add unit tests to ensure specializing happens as expected in FT builds.
* Add unit tests to attempt to trigger data races (useful for running under TSAN).
* Add `has_split_table` function to `_testinternalcapi`.
We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:
_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.
_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
Add free-threaded specialization for `UNPACK_SEQUENCE` opcode.
`UNPACK_SEQUENCE_TUPLE/UNPACK_SEQUENCE_TWO_TUPLE` are already thread safe since tuples are immutable.
`UNPACK_SEQUENCE_LIST` is not thread safe because of nature of lists (there is nothing preventing another thread from adding items to or removing them the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, especially around the parts where we check the size of the list and push items onto the stack.
---------
Co-authored-by: Matt Page <mpage@meta.com>
Co-authored-by: mpage <mpage@cs.stanford.edu>
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:
1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).
This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).
This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
* Spill the evaluation around escaping calls in the generated interpreter and JIT.
* The code generator tracks live, cached values so they can be saved to memory when needed.
* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
Support non-dict globals in LOAD_FROM_DICT_OR_GLOBALS
The implementation basically copies LOAD_GLOBAL. Possibly it could be deduplicated,
but that seems like it may get hairy since the two operations have different operands.
This is important to fix in 3.14 for PEP 649, but it's a bug in earlier versions too,
and we should backport to 3.13 and 3.12 if possible.