* Add _PyObject_HasFastCall()
* partial_call() now avoids temporary tuple to pass positional
arguments if the callable supports the FASTCALL calling convention
for positional arguments.
* Fix also a performance regression in partial_call() if the callable
doesn't support FASTCALL.
PyEval_Call* APIs are not documented and they doesn't respect PY_SSIZE_T_CLEAN.
So add comment block which recommends PyObject_Call* APIs to ceval.h.
This commit also changes PyEval_CallMethod and PyEval_CallFunction
implementation same to PyObject_CallMethod and PyObject_CallFunction
to reduce future maintenance cost. Optimization to avoid temporary
tuple are copied too.
PyEval_CallFunction(callable, "i", (int)i) now calls callable(i) instead of
raising TypeError. But accepting this edge case is backward compatible.
allocated.
On PyMem_Realloc failure, _PyCode_SetExtra should free co_extra if
co_extra->ce_extras could not be allocated.
On PyMem_Realloc success, _PyCode_SetExtra should set all unused slots in
co_extra->ce_extras to NULL.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
* Move all functions to call objects in a new Objects/call.c file.
* Rename fast_function() to _PyFunction_FastCallKeywords().
* Copy null_error() from Objects/abstract.c
* Inline type_error() in call.c to not have to copy it, it was only
called once.
* Export _PyEval_EvalCodeWithName() since it is now called
from call.c.
Issue #29507: Optimize slots calling Python methods. For Python methods, get
the unbound Python function and prepend arguments with self, rather than
calling the descriptor which creates a temporary PyMethodObject.
Add a new _PyObject_FastCall_Prepend() function used to call the unbound Python
method with self. It avoids the creation of a temporary tuple to pass
positional arguments.
Avoiding temporary PyMethodObject and avoiding temporary tuple makes Python
slots up to 1.46x faster. Microbenchmark on a __getitem__() method implemented
in Python:
Median +- std dev: 121 ns +- 5 ns -> 82.8 ns +- 1.0 ns: 1.46x faster (-31%)
Co-Authored-by: INADA Naoki <songofacandy@gmail.com>
Issue #29259, #29465: PyCFunction_Call() doesn't create anymore a redundant
tuple to pass positional arguments for METH_VARARGS.
Add a new cfunction_call() subfunction.
* *PyCFunction_*Call*() functions now call Py_EnterRecursiveCall().
* PyObject_Call() now calls directly _PyFunction_FastCallDict() and
PyCFunction_Call() to avoid calling Py_EnterRecursiveCall() twice per
function call
Decreased density gives better collision statistics (average of 2.5 probes in a
full table versus 3.0 previously) and fewer occurences of starting a second
possibly overlapping sequence of 10 linear probes. Makes resizes a little more
frequent but each with less work (fewer insertions and fewer collisions).
add_methods(), add_members(), and add_getset() used PyDict_SetItemString()
to register descriptor to the type's dict.
So descr_new() and PyDict_SetItemString() creates interned unicode from same
C string.
This patch takes interned unicode from descriptor, and use PyDict_SetItem()
instead of PyDict_SetItemString().
python_startup_no_site:
default: Median +- std dev: 12.7 ms +- 0.1 ms
patched: Median +- std dev: 12.5 ms +- 0.1 ms