gh-103323: Get the "Current" Thread State from a Thread-Local Variable (gh-103324)

We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation is to support a per-interpreter GIL, but the change also opens up some opportunities for performance improvements.
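To illustrate what "smoothing out the compiler differences" can look like, here is a minimal sketch of a portability macro wrapped around a thread-local pointer. It is not the actual pyport.h definition: DEMO_THREAD_LOCAL, demo_tstate, demo_current and the demo_*() helpers are hypothetical names chosen for this example, and the set of compilers checked is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (not the real _Py_thread_local): pick whichever
 * thread-local storage specifier the compiler understands. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define DEMO_THREAD_LOCAL _Thread_local        /* C11 keyword */
#elif defined(_MSC_VER)
#  define DEMO_THREAD_LOCAL __declspec(thread)   /* MSVC */
#elif defined(__GNUC__)
#  define DEMO_THREAD_LOCAL __thread             /* GCC, clang */
#else
#  error "no supported thread-local storage specifier"
#endif

/* Stand-in for PyThreadState, for illustration only. */
typedef struct demo_tstate { int id; } demo_tstate;

/* One pointer per OS thread; every thread starts out with NULL. */
static DEMO_THREAD_LOCAL demo_tstate *demo_current = NULL;

static demo_tstate *
demo_get(void)
{
    return demo_current;
}

static void
demo_set(demo_tstate *ts)
{
    assert(ts != NULL);
    demo_current = ts;
}

static void
demo_clear(void)
{
    demo_current = NULL;
}
```

Because each thread sees its own copy of demo_current, the accessors are plain loads and stores with no atomics and no shared runtime field, which is where the performance opportunity mentioned above would come from.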

Note that we do not provide a fallback for the thread-local variable, whether to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic, we can circle back. I consider it unlikely, but will run the buildbots to double-check.
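For reference, a fallback of the second kind could look roughly like the sketch below. It uses POSIX thread-specific storage (pthread_key_create() and friends) as a stand-in for the PyThread_tss_*() API so that it compiles without Python.h. This is an assumption-laden illustration, not part of this change, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for PyThreadState, for illustration only. */
typedef struct demo_tstate { int id; } demo_tstate;

static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, no matter which thread gets here first. */
static void
demo_key_init(void)
{
    int err = pthread_key_create(&demo_key, NULL);
    assert(err == 0);
    (void)err;
}

static demo_tstate *
demo_fallback_get(void)
{
    pthread_once(&demo_key_once, demo_key_init);
    /* Returns NULL if this thread never stored a value. */
    return (demo_tstate *)pthread_getspecific(demo_key);
}

static void
demo_fallback_set(demo_tstate *ts)
{
    assert(ts != NULL);
    pthread_once(&demo_key_once, demo_key_init);
    int err = pthread_setspecific(demo_key, ts);
    assert(err == 0);
    (void)err;
}

static void
demo_fallback_clear(void)
{
    pthread_once(&demo_key_once, demo_key_init);
    (void)pthread_setspecific(demo_key, NULL);
}
```

Compared with a thread-local variable, every access here goes through a library call and a key lookup, which is one reason to prefer the compiler-supported storage class where it is available.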

Also note that this does not change any of the code related to the GILState API, which uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (introduced here). However, that can be addressed separately and is neither urgent nor critical.

(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
Eric Snow 2023-04-24 11:17:02 -06:00 committed by GitHub
parent 7ef614c1ad
commit f8abfa3314
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 73 additions and 18 deletions


@@ -60,23 +60,43 @@ extern "C" {
    For each of these functions, the GIL must be held by the current thread.
  */
+#ifdef HAVE_THREAD_LOCAL
+_Py_thread_local PyThreadState *_Py_tss_tstate = NULL;
+#endif
 static inline PyThreadState *
-current_fast_get(_PyRuntimeState *runtime)
+current_fast_get(_PyRuntimeState *Py_UNUSED(runtime))
 {
-    return (PyThreadState*)_Py_atomic_load_relaxed(&runtime->tstate_current);
+#ifdef HAVE_THREAD_LOCAL
+    return _Py_tss_tstate;
+#else
+    // XXX Fall back to the PyThread_tss_*() API.
+#  error "no supported thread-local variable storage classifier"
+#endif
 }
 static inline void
-current_fast_set(_PyRuntimeState *runtime, PyThreadState *tstate)
+current_fast_set(_PyRuntimeState *Py_UNUSED(runtime), PyThreadState *tstate)
 {
     assert(tstate != NULL);
-    _Py_atomic_store_relaxed(&runtime->tstate_current, (uintptr_t)tstate);
+#ifdef HAVE_THREAD_LOCAL
+    _Py_tss_tstate = tstate;
+#else
+    // XXX Fall back to the PyThread_tss_*() API.
+#  error "no supported thread-local variable storage classifier"
+#endif
 }
 static inline void
-current_fast_clear(_PyRuntimeState *runtime)
+current_fast_clear(_PyRuntimeState *Py_UNUSED(runtime))
 {
-    _Py_atomic_store_relaxed(&runtime->tstate_current, (uintptr_t)NULL);
+#ifdef HAVE_THREAD_LOCAL
+    _Py_tss_tstate = NULL;
+#else
+    // XXX Fall back to the PyThread_tss_*() API.
+#  error "no supported thread-local variable storage classifier"
+#endif
 }
 #define tstate_verify_not_active(tstate) \
@@ -84,6 +104,12 @@ current_fast_clear(_PyRuntimeState *runtime)
         _Py_FatalErrorFormat(__func__, "tstate %p is still current", tstate); \
     }
+PyThreadState *
+_PyThreadState_GetCurrent(void)
+{
+    return current_fast_get(&_PyRuntime);
+}
 //------------------------------------------------
 // the thread state bound to the current OS thread