There is a WIP proposal to enable webassembly stack switching which have been
implemented in v8:
https://github.com/WebAssembly/js-promise-integration
It is not possible to switch stacks that contain JS frames so the Emscripten JS
trampolines that allow calling functions with the wrong number of arguments
don't work in this case. However, the js-promise-integration proposal requires
the [type reflection for Wasm/JS API](https://github.com/WebAssembly/js-types)
proposal, which allows us to actually count the number of arguments a function
expects.
For better compatibility with stack switching, this PR checks if type reflection
is available, and if so we use a switch block to decide the appropriate
signature. If type reflection is unavailable, we should use the current EMJS
trampoline.
We cache the function argument counts since when I didn't cache them performance
was negatively affected.
Co-authored-by: T. Wouters <thomas@python.org>
Co-authored-by: Brett Cannon <brett@python.org>
No longer export _PyUnicode_FromId() internal C API function.
Change comment style to "// comment" and add comment explaining why
other functions have to be exported.
Update Tools/build/generate_token.py to update Include/internal/pycore_token.h
comments.
* Add missing includes.
* Remove unused includes.
* Update old include/symbol names to newer names.
* Mention at least one included symbol.
* Sort includes.
* Update Tools/cases_generator/generate_cases.py used to generated
pycore_opcode_metadata.h.
* Update Parser/asdl_c.py used to generate pycore_ast.h.
* Cleanup also includes in _testcapimodule.c and _testinternalcapi.c.
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.
Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
This is the culmination of PEP 684 (and of my 8-year long multi-core Python project)!
Each subinterpreter may now be created with its own GIL (via Py_NewInterpreterFromConfig()). If not so configured then the interpreter will share with the main interpreter--the status quo since subinterpreters were added decades ago. The main interpreter always has its own GIL and subinterpreters from Py_NewInterpreter() will always share with the main interpreter.
This is strictly about moving the "obmalloc" runtime state from
`_PyRuntimeState` to `PyInterpreterState`. Doing so improves isolation
between interpreters, specifically most of the memory (incl. objects)
allocated for each interpreter's use. This is important for a
per-interpreter GIL, but such isolation is valuable even without it.
FWIW, a per-interpreter obmalloc is the proverbial
canary-in-the-coalmine when it comes to the isolation of objects between
interpreters. Any object that leaks (unintentionally) to another
interpreter is highly likely to cause a crash (on debug builds at
least). That's a useful thing to know, relative to interpreter
isolation.
Core static types will continue to use the global value. All other types
will use the per-interpreter value. They all share the same range, where
the global types use values < 2^16 and each interpreter uses values
higher than that.
We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.
Note that we do not provide a fallback to the thread-local, either falling back to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.
Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).
(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)
The function is like Py_AtExit() but for a single interpreter. This is a companion to the atexit module's register() function, taking a C callback instead of a Python one.
We also update the _xxinterpchannels module to use _Py_AtExit(), which is the motivating case. (This is inspired by pain points felt while working on gh-101660.)
We can revisit the options for keeping it global later, if desired. For now the approach seems quite complex, so we've gone with the simpler isolation solution in the meantime.
https://github.com/python/cpython/issues/100227
The essentially eliminates the global variable, with the associated benefits. This is also a precursor to isolating this bit of state to PyInterpreterState.
Folks that currently read _Py_RefTotal directly would have to start using _Py_GetGlobalRefTotal() instead.
https://github.com/python/cpython/issues/102304
We've factored out a struct from the two PyThreadState fields. This accomplishes two things:
* make it clear that the trashcan-related code doesn't need any other parts of PyThreadState
* allows us to use the trashcan mechanism even when there isn't a "current" thread state
We still expect the caller to hold the GIL.
https://github.com/python/cpython/issues/59956
The objective of this change is to help make the GILState-related code easier to understand. This mostly involves moving code around and some semantically equivalent refactors. However, there are a also a small number of slight changes in structure and behavior:
* tstate_current is moved out of _PyRuntimeState.gilstate
* autoTSSkey is moved out of _PyRuntimeState.gilstate
* autoTSSkey is initialized earlier
* autoTSSkey is re-initialized (after fork) earlier
https://github.com/python/cpython/issues/59956
* move _PyRuntime.global_objects.interned to _PyRuntime.cached_objects.interned_strings (and use _Py_CACHED_OBJECT())
* rename _PyRuntime.global_objects to _PyRuntime.static_objects
(This also relates to gh-96075.)
https://github.com/python/cpython/issues/90111
The global allocators were stored in 3 static global variables: _PyMem_Raw, _PyMem, and _PyObject. State for the "small block" allocator was stored in another 13. That makes a total of 16 global variables. We are moving all 16 to the _PyRuntimeState struct as part of the work for gh-81057. (If PEP 684 is accepted then we will follow up by moving them all to PyInterpreterState.)
https://github.com/python/cpython/issues/81057
This was added for bpo-40514 (gh-84694) to test out a per-interpreter GIL. However, it has since proven unnecessary to keep the experiment in the repo. (It can be done as a branch in a fork like normal.) So here we are removing:
* the configure option
* the macro
* the code enabled by the macro
Previously, the main interpreter was allocated on the heap during runtime initialization. Here we instead embed it into _PyRuntimeState, which means it is statically allocated as part of the _PyRuntime global. The same goes for the initial thread state (of each interpreter, including the main one). Consequently there are fewer allocations during runtime/interpreter init, fewer possible failures, and better memory locality.
FYI, this also helps efforts to consolidate globals, which in turns helps work on subinterpreter isolation.
https://bugs.python.org/issue45953
The array of small PyLong objects has been statically declared. Here I also statically initialize them. Consequently they are no longer initialized dynamically during runtime init.
I've also moved them under a new sub-struct in _PyRuntimeState, in preparation for static allocation and initialization of other global objects.
https://bugs.python.org/issue45953
This change is strictly renames and moving code around. It helps in the following ways:
* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it
https://bugs.python.org/issue46008
Previously, basic initialization of PyInterprterState happened in PyInterpreterState_New() (along with allocation and adding the new interpreter to the runtime state). This prevented us from initializing interpreter states that were allocated separately (e.g. statically or in a free list). We've addressed that here by factoring out a separate function just for initialization. We've done the same for PyThreadState. _PyRuntimeState was sorted out when we added it since _PyRuntime is statically allocated. However, here we update the existing init code to line up with the functions for PyInterpreterState and PyThreadState.
https://bugs.python.org/issue46008
Fix the _PyUnicode_FromId() function (_Py_IDENTIFIER(var) API) when
Py_Initialize() / Py_Finalize() is called multiple times:
preserve _PyRuntime.unicode_ids.next_index value.
Use _PyRuntimeState_INIT macro instead memset(0) to reset
_PyRuntimeState members to zero.