mirror of
https://github.com/python/cpython.git
synced 2025-09-24 17:33:29 +00:00
gh-46236: PyUnicode docs improvements (GH-129966)
Move deprecated PyUnicode API docs to new section
Move Py_UNICODE to a new "Deprecated API" section.
Formally soft-deprecate PyUnicode_READY, and move it
Document and soft-deprecate PyUnicode_IS_READY, and move it
Document PyUnicode_IS_ASCII, PyUnicode_CHECK_INTERNED
PyUnicode_New docs: Clarify requirements for "fresh" strings
PyUnicodeWriter_DecodeUTF8Stateful: Link "error-handlers"

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
parent 9f0879baf1
commit e21863ce78
2 changed files with 122 additions and 48 deletions
@@ -31,6 +31,12 @@ Unicode Type
 These are the basic Unicode object types used for the Unicode implementation in
 Python:

+.. c:var:: PyTypeObject PyUnicode_Type
+
+   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
+   is exposed to Python code as :py:class:`str`.
+
+
 .. c:type:: Py_UCS4
             Py_UCS2
             Py_UCS1
@@ -42,19 +48,6 @@ Python:
    .. versionadded:: 3.3


-.. c:type:: Py_UNICODE
-
-   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
-   depending on the platform.
-
-   .. versionchanged:: 3.3
-      In previous versions, this was a 16-bit type or a 32-bit type depending on
-      whether you selected a "narrow" or "wide" Unicode version of Python at
-      build time.
-
-   .. deprecated-removed:: 3.13 3.15
-
-
 .. c:type:: PyASCIIObject
             PyCompactUnicodeObject
             PyUnicodeObject
@@ -66,12 +59,6 @@ Python:
    .. versionadded:: 3.3


-.. c:var:: PyTypeObject PyUnicode_Type
-
-   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
-   is exposed to Python code as ``str``.
-
-
 The following APIs are C macros and static inlined functions for fast checks and
 access to internal read-only data of Unicode objects:

@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects:
    subtype. This function always succeeds.


-.. c:function:: int PyUnicode_READY(PyObject *unicode)
-
-   Returns ``0``. This API is kept only for backward compatibility.
-
-   .. versionadded:: 3.3
-
-   .. deprecated:: 3.10
-      This API does nothing since Python 3.12.
-
-
 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)

    Return the length of the Unicode string, in code points. *unicode* has to be a
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects:
 .. c:function:: void PyUnicode_WRITE(int kind, void *data, \
                                      Py_ssize_t index, Py_UCS4 value)

-   Write into a canonical representation *data* (as obtained with
-   :c:func:`PyUnicode_DATA`). This function performs no sanity checks, and is
-   intended for usage in loops. The caller should cache the *kind* value and
-   *data* pointer as obtained from other calls. *index* is the index in
-   the string (starts at 0) and *value* is the new code point value which should
-   be written to that location.
+   Write the code point *value* to the given zero-based *index* in a string.
+
+   The *kind* value and *data* pointer must have been obtained from a
+   string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`
+   respectively. You must hold a reference to that string while calling
+   :c:func:`!PyUnicode_WRITE`. All requirements of
+   :c:func:`PyUnicode_WriteChar` also apply.
+
+   The function performs no checks for any of its requirements,
+   and is intended for usage in loops.

    .. versionadded:: 3.3

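The rewritten ``PyUnicode_WRITE`` text says the function performs no checks and works from a cached *kind* and *data* pointer. A simplified, standalone model of such an unchecked write might look like the sketch below. It is not CPython's implementation: ``model_write`` and the ``KIND_*`` constants are hypothetical names, and the sketch assumes the kind value equals the number of bytes per code point (1, 2 or 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the storage kinds: bytes per code point. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Simplified model of an unchecked write: store `value` at `index` in a
   buffer whose element width is given by `kind`.  As with the documented
   behaviour, nothing is validated: not the kind, not the index, and not
   whether `value` fits in the element width. */
static void
model_write(int kind, void *data, ptrdiff_t index, uint32_t value)
{
    switch (kind) {
    case KIND_1BYTE:
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case KIND_2BYTE:
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default:
        ((uint32_t *)data)[index] = value;
        break;
    }
}
```

Because nothing is validated, the caller is responsible for every requirement, which is why the docs point to ``PyUnicode_WriteChar`` for a checked alternative.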
@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects:
       is not ready.


+.. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode)
+
+   Return true if the string only contains ASCII characters.
+   Equivalent to :py:meth:`str.isascii`.
+
+   .. versionadded:: 3.2
+
+
 Unicode Character Properties
 """"""""""""""""""""""""""""

@@ -330,11 +319,29 @@ APIs:
    to be placed in the string. As an approximation, it can be rounded up to the
    nearest value in the sequence 127, 255, 65535, 1114111.

-   This is the recommended way to allocate a new Unicode object. Objects
-   created using this function are not resizable.
-
    On error, set an exception and return ``NULL``.

+   After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`,
+   :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`,
+   :c:func:`PyUnicode_WRITE` or similar.
+   Since strings are supposed to be immutable, take care to not “use” the
+   result while it is being modified. In particular, before it's filled
+   with its final contents, a string:
+
+   - must not be hashed,
+   - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`,
+     or another non-"canonical" representation,
+   - must not have its reference count changed,
+   - must not be shared with code that might do one of the above.
+
+   This list is not exhaustive. Avoiding these uses is your responsibility;
+   Python does not always check these requirements.
+
+   To avoid accidentally exposing a partially-written string object, prefer
+   using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*``
+   functions below.
+
+
    .. versionadded:: 3.3


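The ``PyUnicode_New`` text in this hunk notes that *maxchar* can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111. A minimal sketch of that rounding in plain C follows. ``round_maxchar`` is a hypothetical helper, not part of the C API, and returning ``0`` for out-of-range input is a choice made only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the C API): round a code point up to the
   nearest value in the sequence 127, 255, 65535, 1114111.
   Returns 0 if cp lies beyond the Unicode range (above 0x10FFFF). */
static uint32_t
round_maxchar(uint32_t cp)
{
    if (cp <= 127) {
        return 127;
    }
    if (cp <= 255) {
        return 255;
    }
    if (cp <= 65535) {
        return 65535;
    }
    if (cp <= 1114111) {
        return 1114111;
    }
    return 0;
}
```

For instance, a string whose largest code point is U+20AC falls in the 65535 bucket, and one containing U+1F600 falls in the 1114111 bucket.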
@@ -636,6 +643,9 @@ APIs:
    possible. Returns ``-1`` and sets an exception on error, otherwise returns
    the number of copied characters.

+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    .. versionadded:: 3.3


@@ -648,6 +658,9 @@ APIs:
    Fail if *fill_char* is bigger than the string maximum character, or if the
    string has more than 1 reference.

+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    Return the number of written character, or return ``-1`` and raise an
    exception on error.

@@ -657,15 +670,16 @@ APIs:
 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                         Py_UCS4 character)

-   Write a character to a string. The string must have been created through
-   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
-   the string must not be shared, or have been hashed yet.
+   Write a *character* to the string *unicode* at the zero-based *index*.
+   Return ``0`` on success, ``-1`` on error with an exception set.

    This function checks that *unicode* is a Unicode object, that the index is
-   not out of bounds, and that the object can be modified safely (i.e. that it
-   its reference count is one).
+   not out of bounds, and that the object's reference count is one).
+   See :c:func:`PyUnicode_WRITE` for a version that skips these checks,
+   making them your responsibility.

-   Return ``0`` on success, ``-1`` on error with an exception set.
+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.

    .. versionadded:: 3.3

@@ -1649,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
       Strings interned this way are made :term:`immortal`.


+.. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str)
+
+   Return a non-zero value if *str* is interned, zero if not.
+   The *str* argument must be a string; this is not checked.
+   This function always succeeds.
+
+   .. impl-detail::
+
+      A non-zero return value may carry additional information
+      about *how* the string is interned.
+      The meaning of such non-zero values, as well as each specific string's
+      intern-related details, may change between CPython versions.
+
+
 PyUnicodeWriter
 ^^^^^^^^^^^^^^^

@@ -1769,8 +1797,8 @@ object.
    *size* is the string length in bytes. If *size* is equal to ``-1``, call
    ``strlen(str)`` to get the string length.

-   *errors* is an error handler name, such as ``"replace"``. If *errors* is
-   ``NULL``, use the strict error handler.
+   *errors* is an :ref:`error handler <error-handlers>` name, such as
+   ``"replace"``. If *errors* is ``NULL``, use the strict error handler.

    If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded
    bytes on success.
@@ -1781,3 +1809,49 @@ object.
    On error, set an exception, leave the writer unchanged, and return ``-1``.

    See also :c:func:`PyUnicodeWriter_WriteUTF8`.
+
+Deprecated API
+^^^^^^^^^^^^^^
+
+The following API is deprecated.
+
+.. c:type:: Py_UNICODE
+
+   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
+   depending on the platform.
+   Please use :c:type:`wchar_t` directly instead.
+
+   .. versionchanged:: 3.3
+      In previous versions, this was a 16-bit type or a 32-bit type depending on
+      whether you selected a "narrow" or "wide" Unicode version of Python at
+      build time.
+
+   .. deprecated-removed:: 3.13 3.15
+
+
+.. c:function:: int PyUnicode_READY(PyObject *unicode)
+
+   Do nothing and return ``0``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: 3.10
+      This API does nothing since Python 3.12.
+      Previously, this needed to be called for each string created using
+      the old API (:c:func:`!PyUnicode_FromUnicode` or similar).
+
+
+.. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode)
+
+   Do nothing and return ``1``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: next
+      This API does nothing since Python 3.12.
+      Previously, this could be called to check if
+      :c:func:`PyUnicode_READY` is necessary.
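The ``Py_UNICODE`` entry in this hunk recommends using ``wchar_t`` directly and notes that it is a 16-bit or 32-bit type depending on the platform. Code migrating off ``Py_UNICODE`` may want to check which case applies. The sketch below uses only the standard C library; ``wchar_bits`` and ``wchar_holds_any_code_point`` are hypothetical helper names introduced for illustration.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical helper: width of wchar_t in bits on this platform. */
static int
wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: nonzero if a single wchar_t can hold every code
   point up to U+10FFFF, so that no surrogate pairs are needed. */
static int
wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

On a platform with a 16-bit ``wchar_t``, the second helper returns zero, which signals that code points above U+FFFF need two ``wchar_t`` units.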

@@ -205,7 +205,7 @@ static inline unsigned int PyUnicode_CHECK_INTERNED(PyObject *op) {
 }
 #define PyUnicode_CHECK_INTERNED(op) PyUnicode_CHECK_INTERNED(_PyObject_CAST(op))

-/* For backward compatibility */
+/* For backward compatibility. Soft-deprecated. */
 static inline unsigned int PyUnicode_IS_READY(PyObject* Py_UNUSED(op)) {
     return 1;
 }
@@ -398,7 +398,7 @@ PyAPI_FUNC(PyObject*) PyUnicode_New(
     Py_UCS4 maxchar /* maximum code point value in the string */
     );

-/* For backward compatibility */
+/* For backward compatibility. Soft-deprecated. */
 static inline int PyUnicode_READY(PyObject* Py_UNUSED(op))
 {
     return 0;