cpython/Doc/c-api/extension-modules.rst
Rafael Fontenelle 6a16b3c440
Docs: Remove unnecessary trailing backslashes (GH-135781)
This fixes Sphinx's gettext extraction for translations.
2025-06-21 09:01:14 -04:00

247 lines
9 KiB
ReStructuredText

.. highlight:: c
.. _extension-modules:
Defining extension modules
--------------------------
A C extension for CPython is a shared library (for example, a ``.so`` file
on Linux, ``.pyd`` DLL on Windows), which is loadable into the Python process
(for example, it is compiled with compatible compiler settings), and which
exports an :ref:`initialization function <extension-export-hook>`.
To be importable by default (that is, by
:py:class:`importlib.machinery.ExtensionFileLoader`),
the shared library must be available on :py:attr:`sys.path`,
and must be named after the module name plus an extension listed in
:py:attr:`importlib.machinery.EXTENSION_SUFFIXES`.
.. note::
Building, packaging and distributing extension modules is best done with
third-party tools, and is out of scope of this document.
One suitable tool is Setuptools, whose documentation can be found at
https://setuptools.pypa.io/en/latest/setuptools.html.
Normally, the initialization function returns a module definition initialized
using :c:func:`PyModuleDef_Init`.
This allows splitting the creation process into several phases:
- Before any substantial code is executed, Python can determine which
capabilities the module supports, and it can adjust the environment or
refuse loading an incompatible extension.
- By default, Python itself creates the module object -- that is, it does
the equivalent of :py:meth:`object.__new__` for classes.
It also sets initial attributes like :attr:`~module.__package__` and
:attr:`~module.__loader__`.
- Afterwards, the module object is initialized using extension-specific
code -- the equivalent of :py:meth:`~object.__init__` on classes.
This is called *multi-phase initialization* to distinguish it from the legacy
(but still supported) *single-phase initialization* scheme,
where the initialization function returns a fully constructed module.
See the :ref:`single-phase-initialization section below <single-phase-initialization>`
for details.
.. versionchanged:: 3.5
Added support for multi-phase initialization (:pep:`489`).
Multiple module instances
.........................
By default, extension modules are not singletons.
For example, if the :py:attr:`sys.modules` entry is removed and the module
is re-imported, a new module object is created, and typically populated with
fresh method and type objects.
The old module is subject to normal garbage collection.
This mirrors the behavior of pure-Python modules.
Additional module instances may be created in
:ref:`sub-interpreters <sub-interpreter-support>`
or after Python runtime reinitialization
(:c:func:`Py_Finalize` and :c:func:`Py_Initialize`).
In these cases, sharing Python objects between module instances would likely
cause crashes or undefined behavior.
To avoid such issues, each instance of an extension module should
be *isolated*: changes to one instance should not implicitly affect the others,
and all state owned by the module, including references to Python objects,
should be specific to a particular module instance.
See :ref:`isolating-extensions-howto` for more details and a practical guide.
A simpler way to avoid these issues is
:ref:`raising an error on repeated initialization <isolating-extensions-optout>`.
All modules are expected to support
:ref:`sub-interpreters <sub-interpreter-support>`, or otherwise explicitly
signal a lack of support.
This is usually achieved by isolation or blocking repeated initialization,
as above.
A module may also be limited to the main interpreter using
the :c:data:`Py_mod_multiple_interpreters` slot.
.. _extension-export-hook:
Initialization function
.......................
The initialization function defined by an extension module has the
following signature:
.. c:function:: PyObject* PyInit_modulename(void)
Its name should be :samp:`PyInit_{<name>}`, with ``<name>`` replaced by the
name of the module.
For modules with ASCII-only names, the function must instead be named
:samp:`PyInit_{<name>}`, with ``<name>`` replaced by the name of the module.
When using :ref:`multi-phase-initialization`, non-ASCII module names
are allowed. In this case, the initialization function name is
:samp:`PyInitU_{<name>}`, with ``<name>`` encoded using Python's
*punycode* encoding with hyphens replaced by underscores. In Python:
.. code-block:: python
def initfunc_name(name):
try:
suffix = b'_' + name.encode('ascii')
except UnicodeEncodeError:
suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
return b'PyInit' + suffix
It is recommended to define the initialization function using a helper macro:
.. c:macro:: PyMODINIT_FUNC
Declare an extension module initialization function.
This macro:
* specifies the :c:expr:`PyObject*` return type,
* adds any special linkage declarations required by the platform, and
* for C++, declares the function as ``extern "C"``.
For example, a module called ``spam`` would be defined like this::
static struct PyModuleDef spam_module = {
.m_base = PyModuleDef_HEAD_INIT,
.m_name = "spam",
...
};
PyMODINIT_FUNC
PyInit_spam(void)
{
return PyModuleDef_Init(&spam_module);
}
It is possible to export multiple modules from a single shared library by
defining multiple initialization functions. However, importing them requires
using symbolic links or a custom importer, because by default only the
function corresponding to the filename is found.
See the `Multiple modules in one library <https://peps.python.org/pep-0489/#multiple-modules-in-one-library>`__
section in :pep:`489` for details.
The initialization function is typically the only non-\ ``static``
item defined in the module's C source.
.. _multi-phase-initialization:
Multi-phase initialization
..........................
Normally, the :ref:`initialization function <extension-export-hook>`
(``PyInit_modulename``) returns a :c:type:`PyModuleDef` instance with
non-``NULL`` :c:member:`~PyModuleDef.m_slots`.
Before it is returned, the ``PyModuleDef`` instance must be initialized
using the following function:
.. c:function:: PyObject* PyModuleDef_Init(PyModuleDef *def)
Ensure a module definition is a properly initialized Python object that
correctly reports its type and a reference count.
Return *def* cast to ``PyObject*``, or ``NULL`` if an error occurred.
Calling this function is required for :ref:`multi-phase-initialization`.
It should not be used in other contexts.
Note that Python assumes that ``PyModuleDef`` structures are statically
allocated.
This function may return either a new reference or a borrowed one;
this reference must not be released.
.. versionadded:: 3.5
.. _single-phase-initialization:
Legacy single-phase initialization
..................................
.. attention::
Single-phase initialization is a legacy mechanism to initialize extension
modules, with known drawbacks and design flaws. Extension module authors
are encouraged to use multi-phase initialization instead.
In single-phase initialization, the
:ref:`initialization function <extension-export-hook>` (``PyInit_modulename``)
should create, populate and return a module object.
This is typically done using :c:func:`PyModule_Create` and functions like
:c:func:`PyModule_AddObjectRef`.
Single-phase initialization differs from the :ref:`default <multi-phase-initialization>`
in the following ways:
* Single-phase modules are, or rather *contain*, “singletons”.
When the module is first initialized, Python saves the contents of
the module's ``__dict__`` (that is, typically, the module's functions and
types).
For subsequent imports, Python does not call the initialization function
again.
Instead, it creates a new module object with a new ``__dict__``, and copies
the saved contents to it.
For example, given a single-phase module ``_testsinglephase``
[#testsinglephase]_ that defines a function ``sum`` and an exception class
``error``:
.. code-block:: python
>>> import sys
>>> import _testsinglephase as one
>>> del sys.modules['_testsinglephase']
>>> import _testsinglephase as two
>>> one is two
False
>>> one.__dict__ is two.__dict__
False
>>> one.sum is two.sum
True
>>> one.error is two.error
True
The exact behavior should be considered a CPython implementation detail.
* To work around the fact that ``PyInit_modulename`` does not take a *spec*
argument, some state of the import machinery is saved and applied to the
first suitable module created during the ``PyInit_modulename`` call.
Specifically, when a sub-module is imported, this mechanism prepends the
parent package name to the name of the module.
A single-phase ``PyInit_modulename`` function should create “its” module
object as soon as possible, before any other module objects can be created.
* Non-ASCII module names (``PyInitU_modulename``) are not supported.
* Single-phase modules support module lookup functions like
:c:func:`PyState_FindModule`.
.. [#testsinglephase] ``_testsinglephase`` is an internal module used
in CPython's self-test suite; your installation may or may not
include it.