gh-95913: Edit Faster CPython section in 3.11 WhatsNew (GH-98429)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
C.A.M. Gerlach 2023-03-06 20:45:52 -06:00 committed by GitHub
parent 8606697f49
commit 80b19a30c0


@@ -1317,14 +1317,17 @@ This section covers specific optimizations independent of the

Faster CPython
==============

CPython 3.11 is an average of
`25% faster <https://github.com/faster-cpython/ideas#published-results>`_
than CPython 3.10 as measured with the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
when compiled with GCC on Ubuntu Linux.
Depending on your workload, the overall speedup could be 10-60%.

This project focuses on two major areas in Python:
:ref:`whatsnew311-faster-startup` and :ref:`whatsnew311-faster-runtime`.
Optimizations not covered by this project are listed separately under
:ref:`whatsnew311-optimizations`.

.. _whatsnew311-faster-startup:
@@ -1337,8 +1340,8 @@ Faster Startup

Frozen imports / Static code objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python caches :term:`bytecode` in the :ref:`__pycache__ <tut-pycache>`
directory to speed up module loading.

Previously in 3.10, Python module execution looked like this:
@@ -1347,8 +1350,9 @@ Previously in 3.10, Python module execution looked like this:

    Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate

In Python 3.11, the core modules essential for Python startup are "frozen".
This means that their :ref:`codeobjects` (and bytecode)
are statically allocated by the interpreter.
This reduces the steps in the module execution process to:

.. code-block:: text
@@ -1357,7 +1361,7 @@ by the interpreter. This reduces the steps in module execution process to this:

Interpreter startup is now 10-15% faster in Python 3.11. This has a big
impact for short-running programs using Python.

(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.)
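One way to see the frozen modules at work is to ask the import machinery for a frozen module's spec; a minimal sketch (``_frozen_importlib`` has been frozen in every CPython 3.x build, so this is version-robust):

```python
import importlib.machinery as machinery

# Frozen modules are loaded by FrozenImporter from statically
# allocated code objects, bypassing the __pycache__ unmarshal step.
spec = machinery.FrozenImporter.find_spec("_frozen_importlib")
print(spec.origin)  # -> frozen
```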
.. _whatsnew311-faster-runtime:
@@ -1370,17 +1374,19 @@ Faster Runtime

Cheaper, lazy Python frames
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python frames, holding execution information,
are created whenever Python calls a Python function.
The following are new frame optimizations:

- Streamlined the frame creation process.
- Avoided memory allocation by generously re-using frame space on the C stack.
- Streamlined the internal frame struct to contain only essential information.
  Frames previously held extra debugging and memory management information.

Old-style :ref:`frame objects <frame-objects>`
are now created only when requested by debuggers
or by Python introspection functions such as :func:`sys._getframe` and
:func:`inspect.currentframe`. For most user code, no frame objects are
created at all. As a result, nearly all Python function calls have sped
up significantly. We measured a 3-7% speedup in pyperformance.
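A small sketch of the on-demand behavior: the full frame object is only materialized when introspection explicitly asks for it.

```python
import sys

def demo():
    # Until this call, 3.11 runs demo() on a lightweight internal
    # frame; sys._getframe() materializes a full frame object.
    frame = sys._getframe()
    return frame.f_code.co_name

print(demo())  # -> demo
```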
@@ -1401,10 +1407,11 @@ In 3.11, when CPython detects Python code calling another Python function,

it sets up a new frame, and "jumps" to the new code inside the new frame. This
avoids calling the C interpreting function altogether.

Most Python function calls now consume no C stack space, speeding them up.
In simple recursive functions like fibonacci or
factorial, we observed a 1.7x speedup. This also means recursive functions
can recurse significantly deeper
(if the user increases the recursion limit with :func:`sys.setrecursionlimit`).
We measured a 1-3% improvement in pyperformance.

(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
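For illustration, the kind of call-heavy recursive code that benefits; a sketch where the ``50_000`` limit and the depth of ``10_000`` are arbitrary example values, not recommendations:

```python
import sys

def fib(n: int) -> int:
    # Each recursive call here is a Python-to-Python call that 3.11
    # inlines, consuming no C stack space.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(20))  # -> 6765

# Raising the Python recursion limit now allows much deeper
# recursion, since pure-Python frames no longer grow the C stack.
sys.setrecursionlimit(50_000)

def depth(n: int) -> int:
    return 0 if n == 0 else 1 + depth(n - 1)

print(depth(10_000))  # -> 10000
```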
@@ -1415,7 +1422,7 @@ We measured a 1-3% improvement in pyperformance.

PEP 659: Specializing Adaptive Interpreter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:pep:`659` is one of the key parts of the Faster CPython project. The general
idea is that while Python is a dynamic language, most code has regions where
objects and types rarely change. This concept is known as *type stability*.
@@ -1424,17 +1431,18 @@ in the executing code. Python will then replace the current operation with a

more specialized one. This specialized operation uses fast paths available only
to those use cases/types, which generally outperform their generic
counterparts. This also brings in another concept called *inline caching*, where
Python caches the results of expensive operations directly in the
:term:`bytecode`.

The specializer will also combine certain common instruction pairs into one
superinstruction, reducing the overhead during execution.

Python will only specialize
when it sees code that is "hot" (executed multiple times). This prevents Python
from wasting time on run-once code. Python can also de-specialize when code is
too dynamic or when the use changes. Specialization is attempted periodically,
and specialization attempts are not too expensive,
allowing specialization to adapt to new circumstances.

(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
See :pep:`659` for more information. Implementation by Mark Shannon and Brandt
@@ -1447,32 +1455,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)

| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
|               |                    |                                                       | (up to)           |                   |
+===============+====================+=======================================================+===================+===================+
| Binary        | ``x + x``          | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
| operations    |                    | such as :class:`int`, :class:`float` and :class:`str` |                   | Dong-hee Na,      |
|               | ``x - x``          | take custom fast paths for their underlying types.    |                   | Brandt Bucher,    |
|               |                    |                                                       |                   | Dennis Sweeney    |
|               | ``x * x``          |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Subscript     | ``a[i]``           | Subscripting container types such as :class:`list`,   | 10-25%            | Irit Katriel,     |
|               |                    | :class:`tuple` and :class:`dict` directly index       |                   | Mark Shannon      |
|               |                    | the underlying data structures.                       |                   |                   |
|               |                    |                                                       |                   |                   |
|               |                    | Subscripting custom :meth:`~object.__getitem__`       |                   |                   |
|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
| subscript     |                    |                                                       |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
|               |                    | as :func:`len` and :class:`str` directly call their   |                   | Ken Jin           |
|               | ``C(arg)``         | underlying C version. This avoids going through the   |                   |                   |
|               |                    | internal calling convention.                          |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``print``          | The object's index in the globals/builtins namespace  | [#load-global]_   | Mark Shannon      |
| global        |                    | is cached. Loading globals and builtins require       |                   |                   |
| variable      | ``len``            | zero namespace lookups.                               |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [#load-attr]_     | Mark Shannon      |
| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
|               |                    | In most cases, attribute loading will require zero    |                   |                   |
|               |                    | namespace lookups.                                    |                   |                   |
@@ -1484,14 +1492,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)

| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
| attribute     |                    |                                                       | in pyperformance  |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Unpack        | ``*seq``           | Specialized for common containers such as             | 8%                | Brandt Bucher     |
| Sequence      |                    | :class:`list` and :class:`tuple`.                     |                   |                   |
|               |                    | Avoids internal calling convention.                   |                   |                   |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+

.. [#load-global] A similar optimization already existed since Python 3.8.
   3.11 specializes for more forms and reduces some overhead.

.. [#load-attr] A similar optimization already existed since Python 3.10.
   3.11 specializes for more forms. Furthermore, all attribute loads should
   be sped up by :issue:`45947`.
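The specialization above can be observed with :mod:`dis`, which in 3.11 grew an *adaptive* parameter. A sketch, guarded so it also runs on older versions; the exact specialized instruction names (e.g. ``BINARY_OP_ADD_INT``) are CPython implementation details:

```python
import dis
import sys

def add(x, y):
    return x + y

# Run the function enough times for the adaptive interpreter
# to consider it "hot" and specialize its BINARY_OP.
for _ in range(64):
    add(2, 3)

if sys.version_info >= (3, 11):
    # May show e.g. BINARY_OP_ADD_INT instead of the generic BINARY_OP.
    dis.dis(add, adaptive=True)
else:
    dis.dis(add)
```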
@@ -1501,49 +1510,72 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)

Misc
----

* Objects now require less memory due to lazily created object namespaces.
  Their namespace dictionaries now also share keys more freely.
  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)

* "Zero-cost" exceptions are implemented, eliminating the cost
  of :keyword:`try` statements when no exception is raised.
  (Contributed by Mark Shannon in :issue:`40222`.)

* A more concise representation of exceptions in the interpreter reduced the
  time required for catching an exception by about 10%.
  (Contributed by Irit Katriel in :issue:`45711`.)

* :mod:`re`'s regular expression matching engine has been partially refactored,
  and now uses computed gotos (or "threaded code") on supported platforms. As a
  result, Python 3.11 executes the `pyperformance regular expression benchmarks
  <https://pyperformance.readthedocs.io/benchmarks.html#regex-dna>`_ up to 10%
  faster than Python 3.10.
  (Contributed by Brandt Bucher in :gh:`91404`.)
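The "zero-cost" exceptions item above can be sanity-checked with a rough :mod:`timeit` comparison; a sketch only, since absolute numbers vary by machine and the two timings should merely be in the same ballpark on 3.11+:

```python
import timeit

# A try block that never raises should cost roughly the same
# as the bare statement on 3.11 and later.
plain = timeit.timeit("x = 1 + 1", number=1_000_000)
guarded = timeit.timeit(
    "try:\n    x = 1 + 1\nexcept ValueError:\n    pass",
    number=1_000_000,
)
print(f"plain:   {plain:.4f}s")
print(f"guarded: {guarded:.4f}s")
```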
.. _whatsnew311-faster-cpython-faq:

FAQ
---

.. _faster-cpython-faq-my-code:

How should I write my code to utilize these speedups?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Write Pythonic code that follows common best practices;
you don't have to change your code.
The Faster CPython project optimizes for common code patterns we observe.

.. _faster-cpython-faq-memory:

Will CPython 3.11 use more memory?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Maybe not; we don't expect memory use to exceed 20% higher than 3.10.
This is offset by memory optimizations for frame objects and object
dictionaries as mentioned above.

.. _faster-cpython-ymmv:

I don't see any speedups in my workload. Why?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Certain code won't have noticeable benefits. If your code spends most of
its time on I/O operations, or already does most of its
computation in a C extension library like NumPy, there won't be significant
speedups. This project currently benefits pure-Python workloads the most.

Furthermore, the pyperformance figures are a geometric mean. Even within the
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
others have sped up by nearly 2x!

.. _faster-cpython-jit:

Is there a JIT compiler?
^^^^^^^^^^^^^^^^^^^^^^^^

No. We're still exploring other optimizations.

.. _whatsnew311-faster-cpython-about: