gh-115999: Enable specialization of CALL instructions in free-threaded builds (#127123)

The CALL family of instructions was mostly thread-safe already and only required a small number of changes, which are documented below.

A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:

- Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.
- Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.
- Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.
- Fix a bug in _CREATE_INIT_FRAME where the frame is pushed to the stack on failure.

A few other miscellaneous changes were also needed:

- Use {LOCK,UNLOCK}_OBJECT in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.
- Add missing co_tlbc for _Py_InitCleanup.
- Stop/start the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
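
The version check described for _PyType_CacheInitForSpecialization can be pictured with a minimal, single-threaded sketch. This is not CPython's implementation: `toy_type`, `toy_lookup_and_version`, `toy_set_init`, and `toy_cache_init` are hypothetical names invented for illustration, and strings stand in for `__init__` functions. A real free-threaded implementation would also have to make the check and the store atomic with respect to type modification (for example by holding a lock), which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a type object. */
typedef struct {
    unsigned int version;     /* bumped whenever the type is modified */
    const char *init;         /* stand-in for the current __init__ */
    const char *cached_init;  /* stand-in for the specialization cache */
} toy_type;

/* Look up init and report the type version it was read under. */
const char *
toy_lookup_and_version(const toy_type *tp, unsigned int *version)
{
    assert(tp != NULL && version != NULL);
    *version = tp->version;
    return tp->init;
}

/* Model an update to __init__: new value, new version, cache dropped. */
void
toy_set_init(toy_type *tp, const char *new_init)
{
    assert(tp != NULL);
    tp->init = new_init;
    tp->version++;
    tp->cached_init = NULL;
}

/* Populate the cache only if the type still has the version that the
 * caller observed during lookup. Returns 1 on success, 0 if the value
 * is stale and was therefore not cached. */
int
toy_cache_init(toy_type *tp, const char *init, unsigned int version)
{
    assert(tp != NULL);
    if (tp->version != version) {
        return 0;  /* raced with an update; refuse to cache */
    }
    tp->cached_init = init;
    return 1;
}
```

If `toy_set_init` runs between the lookup and the call to `toy_cache_init`, the versions no longer match and the stale value is rejected; repeating the lookup yields a fresh (init, version) pair that can be cached.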
2025-10-17 04:08:28 +00:00 · 2024-12-03 11:20:20 -08:00 · dabcecfd6d
commit dabcecfd6d
parent fc5a0dc224
11 changed files with 220 additions and 92 deletions
--- a/Python/perf_trampoline.c
+++ b/Python/perf_trampoline.c
@@ -484,11 +484,11 @@ _PyPerfTrampoline_Init(int activate)
        return -1;
    }
    if (!activate) {
-        tstate->interp->eval_frame = NULL;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, NULL);
        perf_status = PERF_STATUS_NO_INIT;
    }
    else {
-        tstate->interp->eval_frame = py_trampoline_evaluator;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, py_trampoline_evaluator);
        if (new_code_arena() < 0) {
            return -1;
        }
@@ -514,7 +514,7 @@ _PyPerfTrampoline_Fini(void)
    }
    PyThreadState *tstate = _PyThreadState_GET();
    if (tstate->interp->eval_frame == py_trampoline_evaluator) {
-        tstate->interp->eval_frame = NULL;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, NULL);
    }
    if (perf_status == PERF_STATUS_OK) {
        trampoline_api.free_state(trampoline_api.state);
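
The hunks above replace direct writes to `interp->eval_frame` with calls to `_PyInterpreterState_SetEvalFrameFunc`, so that every write goes through one place that can pause other threads. The following self-contained sketch models that idea only; it does not use CPython headers, and `toy_interp`, `toy_stop_the_world`, `toy_start_the_world`, `toy_set_eval_frame`, and `toy_trampoline` are hypothetical names. The pause is modeled as a flag, and skipping the pause when the hook is unchanged is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_eval_func)(int);

/* Hypothetical stand-in for the interpreter state. */
typedef struct {
    toy_eval_func eval_frame;  /* NULL means "use the default evaluator" */
    int world_stopped;         /* 1 while other threads are notionally paused */
    int stop_count;            /* number of pauses performed so far */
} toy_interp;

/* In a real runtime this would block until all other threads are parked. */
void
toy_stop_the_world(toy_interp *interp)
{
    assert(!interp->world_stopped);
    interp->world_stopped = 1;
    interp->stop_count++;
}

void
toy_start_the_world(toy_interp *interp)
{
    assert(interp->world_stopped);
    interp->world_stopped = 0;
}

/* Single entry point for changing the hook. Because the write happens
 * only while the world is stopped, readers can use a plain load. */
void
toy_set_eval_frame(toy_interp *interp, toy_eval_func func)
{
    if (func == interp->eval_frame) {
        return;  /* unchanged: no need to pause anything (sketch assumption) */
    }
    toy_stop_the_world(interp);
    interp->eval_frame = func;
    toy_start_the_world(interp);
}

/* Example custom evaluator used to exercise the setter. */
int
toy_trampoline(int x)
{
    return x + 1;
}
```

Routing every write through one setter is what lets the reading side stay cheap: no reader can observe the field changing underneath it, because no reader is running while the write happens.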