gh-115999: Enable specialization of CALL instructions in free-threaded builds (#127123)

The CALL family of instructions was already mostly thread-safe and required only a small number of changes, which are documented below.

A few changes were needed to make CALL_ALLOC_AND_ENTER_INIT thread-safe:

Added _PyType_LookupRefAndVersion, which returns the type version corresponding to the returned ref.

Added _PyType_CacheInitForSpecialization, which takes an init method and the corresponding type version and only populates the specialization cache if the current type version matches the supplied version. This prevents potentially caching a stale value in free-threaded builds if we race with an update to __init__.

Only cache __init__ functions that are deferred in free-threaded builds. This ensures that the reference to __init__ that is stored in the specialization cache is valid if the type version guard in _CHECK_AND_ALLOCATE_OBJECT passes.

Fix a bug in _CREATE_INIT_FRAME where the frame is pushed to the stack on failure.
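
The version check described for _PyType_CacheInitForSpecialization can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not CPython's implementation: toy_type, its fields, and every toy_* function are hypothetical stand-ins, and a small spin lock replaces whatever locking the real type machinery uses. The point is only that the cache is written when the version observed at lookup time is still current, so a racing update to __init__ cannot leave a stale entry behind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a type object; not CPython's PyTypeObject. */
typedef struct {
    atomic_flag lock;         /* toy spin lock guarding the fields below */
    unsigned int version;     /* bumped whenever the type is modified */
    const void *init;         /* current __init__ (opaque in this sketch) */
    const void *cached_init;  /* specialization cache slot */
} toy_type;

static void toy_type_lock(toy_type *tp)
{
    while (atomic_flag_test_and_set_explicit(&tp->lock, memory_order_acquire)) {
        /* spin until the lock is released */
    }
}

static void toy_type_unlock(toy_type *tp)
{
    atomic_flag_clear_explicit(&tp->lock, memory_order_release);
}

static void toy_type_setup(toy_type *tp, const void *init)
{
    assert(tp != NULL);
    atomic_flag_clear(&tp->lock);
    tp->version = 1;
    tp->init = init;
    tp->cached_init = NULL;
}

/* Return the current init and the version it corresponds to. */
static const void *toy_lookup_and_version(toy_type *tp, unsigned int *version)
{
    toy_type_lock(tp);
    const void *init = tp->init;
    *version = tp->version;
    toy_type_unlock(tp);
    return init;
}

/* Models another thread assigning a new __init__: bump the version
 * and drop whatever was cached for the old one. */
static void toy_set_init(toy_type *tp, const void *new_init)
{
    toy_type_lock(tp);
    tp->init = new_init;
    tp->version++;
    tp->cached_init = NULL;
    toy_type_unlock(tp);
}

/* Populate the cache only if `version` is still current.
 * Returns 1 if the value was cached, 0 if it was stale. */
static int toy_cache_init(toy_type *tp, const void *init, unsigned int version)
{
    int cached = 0;
    toy_type_lock(tp);
    if (tp->version == version) {
        tp->cached_init = init;
        cached = 1;
    }
    toy_type_unlock(tp);
    return cached;
}
```

A caller would pair toy_lookup_and_version with toy_cache_init and treat a return value of 0 as "do not specialize this time".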

A few other miscellaneous changes were also needed:

Use {LOCK,UNLOCK}_OBJECT in LIST_APPEND. This ensures that the list's per-object lock is held while we are appending to it.

Add missing co_tlbc for _Py_InitCleanup.

Stop/start the world around setting the eval frame hook. This allows us to read interp->eval_frame non-atomically and preserves the behavior of _CHECK_PEP_523 documented below.
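
As an illustration of the LIST_APPEND change above, here is a minimal sketch of appending to a container while holding its per-object lock. It is not the interpreter's code: toy_list and the toy_* helpers are hypothetical, the element type is int rather than an object pointer, and a spin lock stands in for the real per-object lock behind {LOCK,UNLOCK}_OBJECT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical list with its own lock; not CPython's PyListObject. */
typedef struct {
    atomic_flag lock;  /* toy per-object lock */
    int *items;
    size_t len;
    size_t cap;
} toy_list;

static void toy_list_setup(toy_list *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->lock);
    l->items = NULL;
    l->len = 0;
    l->cap = 0;
}

static void toy_list_lock(toy_list *l)
{
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire)) {
        /* spin until the lock is released */
    }
}

static void toy_list_unlock(toy_list *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Append `value` while holding the list's lock, so a concurrent append
 * cannot observe or clobber a half-updated len/cap/items.
 * Returns 0 on success, -1 if growing the buffer fails. */
static int toy_list_append(toy_list *l, int value)
{
    int rc = 0;
    toy_list_lock(l);
    if (l->len == l->cap) {
        size_t new_cap = l->cap ? l->cap * 2 : 4;
        int *grown = realloc(l->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            rc = -1;
            goto done;
        }
        l->items = grown;
        l->cap = new_cap;
    }
    l->items[l->len++] = value;
done:
    toy_list_unlock(l);  /* released on both the success and failure paths */
    return rc;
}
```

The single exit through `done` is deliberate: every path that takes the lock also releases it.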
mpage 2024-12-03 11:20:20 -08:00 committed by GitHub
parent fc5a0dc224
commit dabcecfd6d
11 changed files with 220 additions and 92 deletions

@@ -484,11 +484,11 @@ _PyPerfTrampoline_Init(int activate)
         return -1;
     }
     if (!activate) {
-        tstate->interp->eval_frame = NULL;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, NULL);
         perf_status = PERF_STATUS_NO_INIT;
     }
     else {
-        tstate->interp->eval_frame = py_trampoline_evaluator;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, py_trampoline_evaluator);
         if (new_code_arena() < 0) {
             return -1;
         }
@@ -514,7 +514,7 @@ _PyPerfTrampoline_Fini(void)
     }
     PyThreadState *tstate = _PyThreadState_GET();
     if (tstate->interp->eval_frame == py_trampoline_evaluator) {
-        tstate->interp->eval_frame = NULL;
+        _PyInterpreterState_SetEvalFrameFunc(tstate->interp, NULL);
     }
     if (perf_status == PERF_STATUS_OK) {
         trampoline_api.free_state(trampoline_api.state);