Commit graph

65 commits

Author SHA1 Message Date
T. Wouters
de2f7da77d
gh-115999: Add free-threaded specialization for FOR_ITER (#128798)
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
2025-03-12 16:21:46 +01:00
Mark Shannon
54965f3fb2
GH-130296: Avoid stack transients in four instructions. (GH-130310)
* Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE

* Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS

* Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE

* Remove stack transient in LOAD_ATTR_WITH_HINT
2025-02-28 18:00:38 +00:00
Irit Katriel
a1417b211f
gh-100239: replace BINARY_SUBSCR & family by BINARY_OP with oparg NB_SUBSCR (#129700) 2025-02-07 22:39:54 +00:00
Brandt Bucher
5fa7e1b7fd
GH-129715: Remove _DYNAMIC_EXIT (GH-129716) 2025-02-07 11:41:17 -08:00
Mark Shannon
75b628adeb
GH-128563: Generate opcode = ... in instructions that need opcode (GH-129608)
* Remove support for GO_TO_INSTRUCTION
2025-02-03 15:09:21 +00:00
Mark Shannon
75b4962157
GH-128914: Remove all but one conditional stack effects (GH-129226)
* Remove all 'if (0)' and 'if (1)' conditional stack effects

* Use array instead of conditional for BUILD_SLICE args

* Refactor LOAD_GLOBAL to use a common conditional uop

* Remove conditional stack effects from LOAD_ATTR specializations

* Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.

* Remove conditional stack effects from CALL_FUNCTION_EX
2025-01-27 16:24:48 +00:00
Sam Gross
a10f99375e
Revert "GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918)" (GH-129202)
The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f430.
2025-01-23 09:26:25 +00:00
Mark Shannon
ab61d3f430
GH-128914: Remove conditional stack effects from bytecodes.c and the code generators (GH-128918) 2025-01-20 17:09:23 +00:00
Xuanteng Huang
b44ff6d0df
GH-126599: Remove the "counter" optimizer/executor (GH-126853) 2025-01-16 15:57:04 -08:00
Irit Katriel
3893a92d95
gh-100239: specialize long tail of binary operations (#128722) 2025-01-16 15:22:13 +00:00
Mark Shannon
ddd959987c
GH-128685: Specialize (rather than quicken) LOAD_CONST into LOAD_CONST_[IM]MORTAL (GH-128708) 2025-01-13 10:30:28 +00:00
Mark Shannon
f826beca0c
GH-128375: Better instrument for FOR_ITER (GH-128445) 2025-01-06 17:54:47 +00:00
Neil Schemenauer
1b15c89a17
gh-115999: Specialize STORE_ATTR in free-threaded builds. (gh-127838)
* Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and
  use in place of `_PyDictKeys_StringLookup`.

* Change `_PyObject_TryGetInstanceAttribute` to use that function
  in the case of split keys.

* Add `unicodekeys_lookup_split` helper which allows code sharing
  between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.

* Fix locking for `STORE_ATTR_INSTANCE_VALUE`.  Create
  `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and
  `tp_version_tag` cannot change.

* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring
  the version we store on the cache is the correct one (in case of
  it changing during the specalize analysis).

* Split `analyze_descriptor` into `analyze_descriptor_load` and
  `analyze_descriptor_store` since those don't share much logic.
  Add `descriptor_is_class` helper function.

* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()`
  in case we race and dict was materialized before the lock.

* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.

* Use `specialize()` and `unspecialize()` helpers.

* Add unit tests to ensure specializing happens as expected in FT builds.

* Add unit tests to attempt to trigger data races (useful for running under TSAN).

* Add `has_split_table` function to `_testinternalcapi`.
2024-12-19 10:21:17 -08:00
Mark Shannon
d2f1d917e8
GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564) 2024-12-19 16:59:51 +00:00
mpage
2de048ce79
gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711)
We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

_CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.

_LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
2024-12-13 10:17:16 -08:00
Ken Jin
6293d00e72
gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2024-11-09 11:35:33 +08:00
mpage
2e95c5ba3b
gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP (#123926)
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
2024-11-04 11:13:32 -08:00
Mark Shannon
faa3272fb8
GH-125837: Split LOAD_CONST into three. (GH-125972)
* Add LOAD_CONST_IMMORTAL opcode

* Add LOAD_SMALL_INT opcode

* Remove RETURN_CONST opcode
2024-10-29 11:15:42 +00:00
mpage
f978fb4f8d
gh-115999: Refactor LOAD_GLOBAL specializations to avoid reloading {globals, builtins} keys (gh-124953)
Each of the `LOAD_GLOBAL` specializations is implemented roughly as:

1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).

This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).

This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
2024-10-09 15:18:25 +00:00
Mark Shannon
da071fa3e8
GH-119866: Spill the stack around escaping calls. (GH-124392)
* Spill the evaluation around escaping calls in the generated interpreter and JIT. 

* The code generator tracks live, cached values so they can be saved to memory when needed.

* Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.
2024-10-07 14:56:39 +01:00
Savannah Ostrowski
65f1237098
GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443)
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-27 00:35:42 +00:00
Mark Shannon
54a05a4600
GH-123232: Factor BINARY_SLICE and STORE_SLICE to handle stats properly for tier 2. (GH-123381) 2024-08-27 10:49:39 +01:00
Mark Shannon
bb1d30336e
GH-118093: Make CALL_ALLOC_AND_ENTER_INIT suitable for tier 2. (GH-123140)
* Convert CALL_ALLOC_AND_ENTER_INIT to micro-ops such that tier 2 supports it

* Allow inexact arguments for CALL_ALLOC_AND_ENTER_INIT.
2024-08-20 16:52:58 +01:00
Mark Shannon
c13e7d98fb
GH-118093: Specialize CALL_KW (GH-123006) 2024-08-16 17:11:24 +01:00
Mark Shannon
eec7bdaf01
GH-120024: Remove CHECK_EVAL_BREAKER macro. (GH-122968)
* Factor some instructions into micro-ops to isolate CHECK_EVAL_BREAKER for escape analysis

* Eliminate CHECK_EVAL_BREAKER macro
2024-08-14 12:04:05 +01:00
Mark Shannon
df13a1821a
GH-118095: Add tier two support for BINARY_SUBSCR_GETITEM (GH-120793) 2024-08-01 16:19:05 -07:00
Mark Shannon
95a73917cd
GH-122029: Break INSTRUMENTED_CALL into micro-ops, so that its behavior is consistent with CALL (GH-122177) 2024-07-26 14:35:57 +01:00
Mark Shannon
afb0aa6ed2
GH-121131: Clean up and fix some instrumented instructions. (GH-121132)
* Add support for 'prev_instr' to code generator and refactor some INSTRUMENTED instructions
2024-07-26 12:24:12 +01:00
Brandt Bucher
d9efa45d74
GH-118093: Add tier two support for BINARY_OP_INPLACE_ADD_UNICODE (GH-122253) 2024-07-25 14:45:07 -07:00
Brandt Bucher
5f6001130f
GH-118093: Add tier two support for LOAD_ATTR_PROPERTY (GH-122283) 2024-07-25 10:45:28 -07:00
Mark Shannon
2e14a52cce
GH-122160: Remove BUILD_CONST_KEY_MAP opcode. (GH-122164) 2024-07-25 16:24:29 +01:00
Brandt Bucher
7b36b67b1e
GH-118093: Add tier two support to several instructions (GH-121884) 2024-07-18 14:24:58 -07:00
Brandt Bucher
33903c53db
GH-116017: Get rid of _COLD_EXITs (GH-120960) 2024-07-01 13:17:40 -07:00
Mark Shannon
9cefcc0ee7
GH-120507: Lower the BEFORE_WITH and BEFORE_ASYNC_WITH instructions. (#120640)
* Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions.

* Add LOAD_SPECIAL instruction

* Reimplement `with` and `async with` statements using LOAD_SPECIAL
2024-06-18 12:17:46 +01:00
Mark Shannon
274f844830
GH-120619: Clean up RETURN_VALUE instruction (GH-120624)
* Rename _POP_FRAME to _RETURN_VALUE as it returns a value as well as popping a frame.

* Remove remaining _POP_FRAMEs
2024-06-17 14:40:11 +01:00
Brandt Bucher
5cd3ffd6b7
GH-119258: Handle STORE_ATTR_WITH_HINT in tier two (GH-119481) 2024-05-28 12:47:54 -07:00
Jelle Zijlstra
98e855fcc1
gh-119180: Add LOAD_COMMON_CONSTANT opcode (#119321)
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.

This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
2024-05-22 00:46:39 +00:00
Mark Shannon
1ab6356ebe
GH-118095: Use broader specializations of CALL in tier 1, for better tier 2 support of calls. (GH-118322)
* Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations.

* Remove CALL_PY_WITH_DEFAULTS specialization

* Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize
2024-05-04 12:11:11 +01:00
Mark Shannon
da2cfc4cb6
GH-113464: Remove the extra jump via _SIDE_EXIT in _EXIT_TRACE (GH-118545) 2024-05-04 08:50:24 +01:00
Mark Shannon
67bba9dd0f
GH-117442: Check eval-breaker at start (rather than end) of tier 2 loops (GH-118482) 2024-05-02 13:10:31 +01:00
Mark Shannon
5b05d452cd
GH-118095: Add tier 2 support for YIELD_VALUE (GH-118380) 2024-04-30 11:33:13 +01:00
Mark Shannon
ab6eda0ee5
GH-118095: Allow a variant of RESUME_CHECK in tier 2 (GH-118286) 2024-04-29 07:54:05 +01:00
Mark Shannon
3e06c7f719
GH-118095: Add dynamic exit support and FOR_ITER_GEN support to tier 2 (GH-118279) 2024-04-26 18:08:50 +01:00
Mark Shannon
f180b31e76
GH-118095: Handle RETURN_GENERATOR in tier 2 (GH-118180) 2024-04-25 11:32:47 +01:00
Mark Shannon
a6647d16ab
GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) 2024-04-22 13:34:06 +01:00
Peter Lazorchak
1c43468886
gh-116168: Remove extra _CHECK_STACK_SPACE uops (#117242)
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
2024-04-03 17:14:18 +00:00
Mark Shannon
c32dc47aca
GH-115776: Embed the values array into the object, for "normal" Python objects. (GH-116115) 2024-04-02 11:59:21 +01:00
Mark Shannon
bf82f77957
GH-116422: Tier2 hot/cold splitting (GH-116813)
Splits the "cold" path, deopts and exits, from the "hot" path, reducing the size of most jitted instructions, at the cost of slower exits.
2024-03-26 09:35:11 +00:00
Mark Shannon
61e54bfcee
GH-116422: Factor out eval breaker checks at end of calls into its own micro-op. (GH-116817) 2024-03-14 16:31:47 +00:00
Ken Jin
41457c7fdb
gh-116381: Remove bad specializations, add fail stats (GH-116464)
* Remove bad specializations, add fail stats
2024-03-08 00:21:21 +08:00