cpython/Include/internal
Ken Jin 4fa80ce74c
gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: about 1.7% better pyperformance geomean overall versus the current JIT https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg . The most improved benchmark, Richards, is 100% faster than on the current JIT. The worst benchmark slows down by about 10-15% versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed and 30% more traces entered the last time we ran them. The stats also suggest that our trace lengths are too short for a real trace recording JIT, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were handled by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277).

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code, as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force re-specialization when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tail calling and computed goto builds with the JIT disabled. With the JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
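
The dual dispatch idea described above can be illustrated with a toy model. The sketch below is not CPython code: `ToyVM`, its opcodes, `hot_threshold`, and `max_trace_len` are all hypothetical names invented for illustration, and a Python dict lookup stands in for the tail calls or computed gotos the real interpreter uses. It only shows the shape of the mechanism: two handler tables with the same semantics, where the tracing table additionally records each instruction after it runs and hands control back to the normal table once the loop closes or the trace gets too long.

```python
class ToyVM:
    """Toy interpreter with a normal and a tracing dispatch table (illustrative only)."""

    def __init__(self, program, hot_threshold=2, max_trace_len=32):
        self.program = program
        self.pc = 0
        self.acc = 0
        self.counter = 0
        self.hot_threshold = hot_threshold    # backward jumps before tracing starts
        self.max_trace_len = max_trace_len    # stand-in for a "trace too long" abort
        self.backedge_counts = {}
        self.trace = []
        self.trace_start = None
        self.finished_traces = []
        # Table 1: the normal handlers.
        self.normal = {
            "SET_COUNTER": self.op_set_counter,
            "ADD": self.op_add,
            "LOOP": self.op_loop,
            "HALT": self.op_halt,
        }
        # Table 2: same semantics, plus recording of each executed instruction.
        self.tracing = {
            name: self.make_tracing(name, handler)
            for name, handler in self.normal.items()
        }
        self.table = self.normal

    def op_set_counter(self, arg):
        self.counter = arg
        self.pc += 1
        return True

    def op_add(self, arg):
        self.acc += arg
        self.pc += 1
        return True

    def op_loop(self, target):
        # Decrement the loop counter and jump back to `target` while it is positive.
        self.counter -= 1
        if self.counter <= 0:
            self.pc += 1
            return True
        self.pc = target
        seen = self.backedge_counts.get(target, 0) + 1
        self.backedge_counts[target] = seen
        # A hot backward jump switches dispatch over to the tracing table.
        if seen == self.hot_threshold and self.table is self.normal:
            self.trace_start = target
            self.table = self.tracing
        return True

    def op_halt(self, arg):
        return False

    def make_tracing(self, name, handler):
        def traced(arg):
            pc_before = self.pc
            keep_going = handler(arg)                   # run the normal semantics
            self.trace.append((pc_before, name, arg))   # then record what just ran
            closed = self.pc == self.trace_start
            too_long = len(self.trace) >= self.max_trace_len
            if closed or too_long or not keep_going:
                # Stop recording and switch dispatch back to the normal table.
                self.finished_traces.append(self.trace)
                self.trace = []
                self.table = self.normal
            return keep_going
        return traced

    def run(self):
        while True:
            name, arg = self.program[self.pc]
            if not self.table[name](arg):
                break


PROGRAM = [
    ("SET_COUNTER", 5),
    ("ADD", 10),
    ("ADD", 1),
    ("LOOP", 1),      # jump back to index 1 while the counter is positive
    ("HALT", None),
]

if __name__ == "__main__":
    vm = ToyVM(PROGRAM)
    vm.run()
    print(vm.acc, vm.finished_traces)
```

In this model the cost of tracing is one extra call and one list append per instruction, which loosely mirrors the "function call and writes to memory" cost mentioned in the design notes. Guards, deopts, and lowering to uops are deliberately left out.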
2025-11-13 18:08:32 +00:00
..
mimalloc Doc: More duplicate word fixes (GH-136299) 2025-07-11 21:18:47 +03:00
pycore_abstract.h gh-135763: AC: Implement `allow_negative for Py_ssize_t` (#138150) 2025-09-01 22:55:22 +01:00
pycore_asdl.h
pycore_ast.h gh-132661: Implement PEP 750 (#132662) 2025-04-30 11:46:41 +02:00
pycore_ast_state.h gh-132661: Implement PEP 750 (#132662) 2025-04-30 11:46:41 +02:00
pycore_atexit.h gh-131238: Remove pycore_lock.h includes (#131483) 2025-03-19 23:46:25 +00:00
pycore_audit.h gh-125604: Move _Py_AuditHookEntry, etc. Out of pycore_runtime.h (gh-125605) 2024-10-18 09:26:08 -06:00
pycore_backoff.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_bitutils.h gh-108216: Cleanup #include in internal header files (#108228) 2023-08-21 18:05:59 +00:00
pycore_blocks_output_buffer.h gh-139877: Use PyBytesWriter in pycore_blocks_output_buffer.h (#139976) 2025-10-14 10:03:55 -07:00
pycore_brc.h gh-131238: Remove pycore_lock.h includes (#131483) 2025-03-19 23:46:25 +00:00
pycore_bytes_methods.h gh-71679: Share the repr implementation between bytes and bytearray (GH-138181) 2025-09-17 11:10:29 +03:00
pycore_bytesobject.h gh-139871: Add bytearray.take_bytes([n]) to efficiently extract bytes (GH-140128) 2025-11-13 13:19:44 +00:00
pycore_c_array.h gh-130080: move _Py_EnsureArrayLargeEnough to a separate header so it can be used outside of the compiler (#130930) 2025-03-13 16:02:58 +00:00
pycore_call.h gh-131776: Expose functions called from the interpreter loop via PyAPI_FUNC (#134242) 2025-09-17 08:04:02 -07:00
pycore_capsule.h gh-108240: Add pycore_capsule.h internal header file (#108596) 2023-08-29 01:20:02 +00:00
pycore_cell.h gh-123358: Use _PyStackRef in LOAD_DEREF (gh-130064) 2025-03-26 12:08:20 -04:00
pycore_ceval.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_ceval_state.h gh-131238: Remove pycore_lock.h includes (#131483) 2025-03-19 23:46:25 +00:00
pycore_code.h gh-140815: Fix faulthandler for invalid/freed frame (#140921) 2025-11-04 11:48:28 +01:00
pycore_codecs.h gh-131238: Remove pycore_lock.h includes (#131483) 2025-03-19 23:46:25 +00:00
pycore_compile.h gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
pycore_complexobject.h gh-128813: hide mixed-mode functions for complex arithmetic from C-API (#131703) 2025-04-22 14:18:18 +02:00
pycore_condvar.h gh-131082: Add missing guards for WIN32_LEAN_AND_MEAN (#131044) 2025-03-11 12:33:01 +01:00
pycore_context.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_critical_section.h gh-133296: Publicly expose critical section API that accepts PyMutex (gh-135899) 2025-07-21 17:25:43 -04:00
pycore_crossinterp.h gh-132775: Clean Up Cross-Interpreter Error Handling (gh-135369) 2025-06-13 16:45:21 -06:00
pycore_crossinterp_data_registry.h gh-132775: Support Fallbacks in _PyObject_GetXIData() (gh-133482) 2025-05-21 07:23:48 -06:00
pycore_debug_offsets.h gh-135755: Make Py_MAX_SCRIPT_PATH_SIZE private (#138350) 2025-09-01 20:01:01 +01:00
pycore_descrobject.h
pycore_dict.h gh-125996: fix thread safety of collections.OrderedDict (#133734) 2025-10-13 22:55:07 +05:30
pycore_dict_state.h gh-124296: Remove private dictionary version tag (PEP 699) (#124472) 2024-10-01 12:39:56 -04:00
pycore_dtoa.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_emscripten_signal.h GH-108614: Unbreak emscripten build (GH-109132) 2023-09-08 17:54:45 +01:00
pycore_emscripten_trampoline.h gh-132097: use a macro for semantically casting function pointers (#132406) 2025-04-18 12:24:34 +02:00
pycore_exceptions.h gh-129668: Fix thread-safety of MemoryError freelist in free threaded build (gh-129704) 2025-02-06 12:38:12 -05:00
pycore_faulthandler.h gh-127604: Add C stack dumps to faulthandler (#128159) 2025-04-21 20:48:02 +01:00
pycore_fileutils.h gh-131238: Use pycore_interp_structs.h header (#131481) 2025-03-19 23:13:25 +00:00
pycore_fileutils_windows.h gh-108220: Internal header files require Py_BUILD_CORE to be defined (#108221) 2023-08-21 19:15:52 +02:00
pycore_floatobject.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_flowgraph.h gh-119744: move a few functions from compile.c to flowgraph.c (#119745) 2024-05-30 21:55:06 +01:00
pycore_format.h
pycore_frame.h gh-130704: Strength reduce LOAD_FAST{_LOAD_FAST} (#130708) 2025-04-01 10:18:42 -07:00
pycore_freelist.h gh-140544: store pointer to interpreter state as a thread local for fast access (#140573) 2025-10-25 19:56:07 +05:30
pycore_freelist_state.h gh-129813, PEP 782: Add PyBytesWriter C API (#138822) 2025-09-12 13:41:59 +02:00
pycore_function.h gh-131776: Expose functions called from the interpreter loop via PyAPI_FUNC (#134242) 2025-09-17 08:04:02 -07:00
pycore_gc.h GH-139951: Fix major GC performance regression (GH-140262) 2025-10-21 15:22:15 +01:00
pycore_genobject.h gh-131776: Expose functions called from the interpreter loop via PyAPI_FUNC (#134242) 2025-09-17 08:04:02 -07:00
pycore_getopt.h
pycore_gil.h gh-116322: Enable the GIL while loading C extension modules (#118560) 2024-05-06 23:07:23 -04:00
pycore_global_objects.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_global_objects_fini_generated.h gh-139817: Attribute __qualname__ is added to TypeAliasType (#139919) 2025-10-15 09:08:17 -07:00
pycore_global_strings.h gh-139817: Attribute __qualname__ is added to TypeAliasType (#139919) 2025-10-15 09:08:17 -07:00
pycore_hamt.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_hashtable.h gh-107211: No longer export internal functions (5) (#108423) 2023-08-24 16:06:53 +00:00
pycore_import.h gh-81313: Add the math.integer module (PEP-791) (GH-133909) 2025-10-31 16:13:43 +02:00
pycore_importdl.h gh-141169: Re-raise exception from findfuncptr (GH-141349) 2025-11-11 13:52:13 +01:00
pycore_index_pool.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_initconfig.h gh-130931: Add pycore_typedefs.h internal header (#131396) 2025-03-19 15:23:32 +01:00
pycore_instruction_sequence.h gh-130907: Treat all module-level annotations as conditional (#131550) 2025-04-28 06:10:28 -07:00
pycore_instruments.h gh-131776: Expose functions called from the interpreter loop via PyAPI_FUNC (#134242) 2025-09-17 08:04:02 -07:00
pycore_interp.h gh-131238: Remove includes from pycore_interp.h (#131495) 2025-03-20 11:35:23 +00:00
pycore_interp_structs.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_interpframe.h gh-140815: Fix faulthandler for invalid/freed frame (#140921) 2025-11-04 11:48:28 +01:00
pycore_interpframe_structs.h gh-131238: Add pycore_interpframe_structs.h header (#131553) 2025-03-21 17:19:47 +00:00
pycore_interpolation.h gh-132661: Implement PEP 750 (#132662) 2025-04-30 11:46:41 +02:00
pycore_intrinsics.h gh-116126: Implement PEP 696 (#116129) 2024-05-03 06:17:32 -07:00
pycore_jit.h GH-131498: Remove conditional stack effects (GH-131499) 2025-03-20 15:39:38 +00:00
pycore_list.h gh-100239: specialize BINARY_OP/SUBSCR for list-slice (#132626) 2025-05-01 10:28:52 +00:00
pycore_llist.h gh-111964: Implement stop-the-world pauses (gh-112471) 2024-01-23 11:08:23 -07:00
pycore_lock.h gh-136759: rename lock.h to pylock.h (#137041) 2025-07-24 16:16:07 +05:30
pycore_long.h gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_FormatEx() (#138839) 2025-09-15 12:23:36 +02:00
pycore_magic_number.h gh-138349: Fix crash when combining module-level annotation and listcomp (#138363) 2025-09-10 15:18:39 +02:00
pycore_memoryobject.h gh-132776: Revert Moving memoryview XIData Code to memoryobject.c (gh-132960) 2025-04-25 16:43:50 +00:00
pycore_mimalloc.h gh-122584: Import mimalloc headers in a C++ context (#122587) 2024-08-15 09:01:01 -04:00
pycore_modsupport.h gh-129594: Remove redundant check on varargs in _PyArg_CheckPositional (#129595) 2025-05-26 10:51:12 +02:00
pycore_moduleobject.h gh-141150: Don't rely on implicit conversion from void * to pointer in _PyModule… (#141147) 2025-11-06 07:16:56 -08:00
pycore_namespace.h
pycore_object.h gh-112075: Remove _PyObject_SetManagedDict() function (#139737) 2025-10-12 19:32:10 +02:00
pycore_object_alloc.h gh-112529: Use GC heaps for GC allocations in free-threaded builds (gh-114157) 2024-01-21 01:14:45 +09:00
pycore_object_deferred.h gh-117139: Convert the evaluation stack to stack refs (#118450) 2024-06-27 03:10:43 +08:00
pycore_object_stack.h gh-125859: Fix crash when gc.get_objects is called during GC (#125882) 2024-10-24 09:33:11 -04:00
pycore_object_state.h gh-125604: Move _Py_AuditHookEntry, etc. Out of pycore_runtime.h (gh-125605) 2024-10-18 09:26:08 -06:00
pycore_obmalloc.h gh-92953: Improve nextpool/prevpool comment. (gh-125545) 2024-10-15 11:47:20 -07:00
pycore_obmalloc_init.h gh-113055: Use pointer for interp->obmalloc state (gh-113412) 2024-01-26 19:38:14 -08:00
pycore_opcode_metadata.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_opcode_utils.h gh-132775: Unrevert "Add _PyCode_VerifyStateless()" (gh-133528) 2025-05-08 00:00:33 +00:00
pycore_optimizer.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_parking_lot.h gh-110850: Cleanup pycore_time.h includes (#115724) 2024-02-20 16:50:43 +00:00
pycore_parser.h gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
pycore_pathconfig.h gh-107211: No longer export internal functions (6) (#108424) 2023-08-24 17:28:35 +02:00
pycore_pyarena.h gh-107211: Fix test_peg_generator (#108435) 2023-08-24 17:47:44 +00:00
pycore_pyatomic_ft_wrappers.h gh-137514: Add a free-threading wrapper for mutexes (GH-137515) 2025-08-07 11:24:50 -04:00
pycore_pybuffer.h gh-76785: Rename _xxsubinterpreters to _interpreters (gh-117791) 2024-04-24 16:18:24 +00:00
pycore_pyerrors.h gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
pycore_pyhash.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_pylifecycle.h gh-136421: Load _datetime static types during interpreter initialization (GH-136583) 2025-07-21 13:47:26 -04:00
pycore_pymath.h gh-141004: soft-deprecate Py_INFINITY macro (#141033) 2025-11-12 13:44:49 +01:00
pycore_pymem.h gh-140815: Fix faulthandler for invalid/freed frame (#140921) 2025-11-04 11:48:28 +01:00
pycore_pymem_init.h gh-115103: Implement delayed free mechanism for free-threaded builds (#115367) 2024-02-20 13:04:37 -05:00
pycore_pystate.h gh-140544: fix build for including pycore_pystate.h when HAVE_THREAD_LOCAL is not defined (#140623) 2025-10-28 01:40:41 +05:30
pycore_pystats.h gh-131253: free-threaded build support for pystats (gh-137189) 2025-11-03 11:36:37 -08:00
pycore_pythonrun.h gh-139653: Add PyUnstable_ThreadState_SetStackProtection() (#139668) 2025-11-13 17:30:50 +01:00
pycore_pythread.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_qsbr.h GH-133136: Revise QSBR to reduce excess memory held (gh-135473) 2025-06-25 00:06:32 -07:00
pycore_range.h
pycore_runtime.h gh-131238: Use pycore_interp_structs.h header (#131481) 2025-03-19 23:13:25 +00:00
pycore_runtime_init.h gh-131185: Use a proper thread-local for cached thread states (gh-132510) 2025-05-21 07:01:25 -06:00
pycore_runtime_init_generated.h gh-139817: Attribute __qualname__ is added to TypeAliasType (#139919) 2025-10-15 09:08:17 -07:00
pycore_runtime_structs.h gh-133059: Increase the small positive integer cache to 1024 (GH-133160) 2025-09-24 17:05:30 -04:00
pycore_semaphore.h gh-137433: Fix deadlock with stop-the-world and daemon threads (gh-137735) 2025-09-16 09:21:58 +01:00
pycore_setobject.h gh-130312: SET_ADD should not lock (#130136) 2025-03-21 15:58:32 -07:00
pycore_signal.h gh-109693: Use pyatomic.h for signal module (gh-110480) 2023-10-10 08:26:29 +09:00
pycore_sliceobject.h GH-115802: JIT "small" code for Windows (GH-115964) 2024-02-29 08:11:28 -08:00
pycore_stackref.h gh-131527: Stackref debug borrow checker (#140599) 2025-11-05 11:12:56 -08:00
pycore_stats.h gh-131253: free-threaded build support for pystats (gh-137189) 2025-11-03 11:36:37 -08:00
pycore_strhex.h gh-107211: No longer export pycore_strhex.h functions (#108229) 2023-08-21 18:12:22 +00:00
pycore_structs.h GH-131498: Cases generator: manage stacks automatically (GH-132074) 2025-04-04 17:59:36 +01:00
pycore_structseq.h
pycore_symtable.h gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
pycore_sysmodule.h gh-108512: Add and use new replacements for PySys_GetObject() (GH-111035) 2025-05-28 20:11:09 +03:00
pycore_template.h gh-132661: Implement PEP 750 (#132662) 2025-04-30 11:46:41 +02:00
pycore_time.h Remove internal _PyTime_AsLong() function (#141053) 2025-11-05 18:37:06 +01:00
pycore_token.h gh-132661: Implement PEP 750 (#132662) 2025-04-30 11:46:41 +02:00
pycore_traceback.h gh-125434: Display thread name in faulthandler on Windows (#140675) 2025-10-27 18:41:18 +01:00
pycore_tracemalloc.h gh-129185: Use PyMutex in tracemalloc (#129246) 2025-01-24 11:25:24 +01:00
pycore_tstate.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_tuple.h gh-111489: Remove _PyTuple_FromArray() alias (#139973) 2025-10-11 22:58:14 +02:00
pycore_typedefs.h gh-130931: Add pycore_typedefs.h internal header (#131396) 2025-03-19 15:23:32 +01:00
pycore_typeobject.h gh-132042: Remove resolve_slotdups() to speedup class creation (#132156) 2025-10-03 11:58:00 +02:00
pycore_typevarobject.h gh-119180: Add evaluate functions for type params and type aliases (#122212) 2024-07-27 17:24:10 +00:00
pycore_ucnhash.h gh-111972: Make Unicode name C APIcapsule initialization thread-safe (#112249) 2023-11-30 11:12:49 +01:00
pycore_unicodectype.h gh-129117: Add unicodedata.isxidstart() function (#140269) 2025-10-30 10:18:12 +00:00
pycore_unicodeobject.h gh-139353: Add Objects/unicode_writer.c file (#139911) 2025-10-30 14:36:15 +01:00
pycore_unicodeobject_generated.h gh-139817: Attribute __qualname__ is added to TypeAliasType (#139919) 2025-10-15 09:08:17 -07:00
pycore_unionobject.h gh-105499: Merge typing.Union and types.UnionType (#105511) 2025-03-04 11:44:19 -08:00
pycore_uniqueid.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_uop.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_uop_ids.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_uop_metadata.h gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) 2025-11-13 18:08:32 +00:00
pycore_warnings.h GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
pycore_weakref.h gh-135607: remove null checking of weakref list in dealloc of extension modules and objects (#135614) 2025-06-30 11:14:31 +00:00