mirror of
https://github.com/python/cpython.git
synced 2025-11-25 12:44:13 +00:00
gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)
parent 68fec31364
commit f9c9354a7a
35 changed files with 199 additions and 2090 deletions
@@ -17,26 +17,12 @@ of Unicode characters while staying memory efficient. There are special cases
 for strings where all code points are below 128, 256, or 65536; otherwise, code
 points must be below 1114112 (which is the full Unicode range).

-:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
-in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
-and inefficient.
-
-Due to the transition between the old APIs and the new APIs, Unicode objects
-can internally be in two states depending on how they were created:
-
-* "canonical" Unicode objects are all objects created by a non-deprecated
-  Unicode API. They use the most efficient representation allowed by the
-  implementation.
-
-* "legacy" Unicode objects have been created through one of the deprecated
-  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
-  :c:type:`Py_UNICODE*` representation; you will have to call
-  :c:func:`PyUnicode_READY` on them before calling any other API.
+UTF-8 representation is created on demand and cached in the Unicode object.

 .. note::
-   The "legacy" Unicode object will be removed in Python 3.12 with deprecated
-   APIs. All Unicode objects will be "canonical" since then. See :pep:`623`
-   for more information.
+   The :c:type:`Py_UNICODE` representation has been removed since Python 3.12
+   with deprecated APIs.
+   See :pep:`623` for more information.


 Unicode Type
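The context lines of the hunk above mention special cases for strings whose code points are all below 128, 256, or 65536. As a rough illustration of how those thresholds map to a per-character storage width, here is a minimal standalone sketch. It does not use the CPython headers, and the names `sketch_kind_for_max_char` and `sketch_kind_for_buffer` are hypothetical helpers invented for this example, not part of any real API. It assumes that the below-128 and below-256 cases both use one byte per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython API): number of bytes per character
 * needed for a string whose largest code point is max_char.
 * Returns 0 if max_char lies outside the Unicode range. */
static int sketch_kind_for_max_char(uint32_t max_char)
{
    if (max_char < 256) {
        return 1;   /* assumed to cover both the <128 and <256 cases */
    }
    if (max_char < 65536) {
        return 2;
    }
    if (max_char < 1114112) {
        return 4;   /* 1114112 == 0x110000, one past the last code point */
    }
    return 0;
}

/* Hypothetical helper: scan a buffer of code points, find the largest one,
 * and return the storage width that would be sufficient for all of them. */
static int sketch_kind_for_buffer(const uint32_t *cp, size_t n)
{
    assert(cp != NULL || n == 0);
    uint32_t max_char = 0;
    for (size_t i = 0; i < n; i++) {
        if (cp[i] > max_char) {
            max_char = cp[i];
        }
    }
    return sketch_kind_for_max_char(max_char);
}
```

In this sketch a single character at or above 65536 forces four bytes per character for the whole buffer, which is the trade-off the documented thresholds describe.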
@@ -101,18 +87,12 @@ access to internal read-only data of Unicode objects:

 .. c:function:: int PyUnicode_READY(PyObject *o)

-   Ensure the string object *o* is in the "canonical" representation. This is
-   required before using any of the access macros described below.
-
-   .. XXX expand on when it is not required
-
-   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
-   particular happens if memory allocation fails.
+   Returns ``0``. This API is kept only for backward compatibility.

    .. versionadded:: 3.3

-   .. deprecated-removed:: 3.10 3.12
-      This API will be removed with :c:func:`PyUnicode_FromUnicode`.
+   .. deprecated:: 3.10
+      This API does nothing since Python 3.12. Please remove code using this function.


 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
@@ -130,14 +110,12 @@ access to internal read-only data of Unicode objects:
    Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
    integer types for direct character access. No checks are performed if the
    canonical representation has the correct character size; use
-   :c:func:`PyUnicode_KIND` to select the right function. Make sure
-   :c:func:`PyUnicode_READY` has been called before accessing this.
+   :c:func:`PyUnicode_KIND` to select the right function.

    .. versionadded:: 3.3


-.. c:macro:: PyUnicode_WCHAR_KIND
-              PyUnicode_1BYTE_KIND
+.. c:macro:: PyUnicode_1BYTE_KIND
               PyUnicode_2BYTE_KIND
               PyUnicode_4BYTE_KIND

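The hunk above says the kind should be used to select the right data accessor, since no check is made that the buffer has the matching character size. The following standalone sketch shows that dispatch pattern on plain C buffers. It is only an illustration under stated assumptions: `sketch_read` is a hypothetical function written for this example, the kind is passed as a plain byte width (1, 2, or 4), and no CPython headers or macros are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython API): read the code point at `index`
 * from `data`, interpreting the buffer according to `kind`, the number of
 * bytes per character (1, 2, or 4). The caller must pass the kind that
 * matches how the buffer was actually filled; nothing here can verify it. */
static uint32_t sketch_read(int kind, const void *data, size_t index)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    switch (kind) {
    case 1:
        return ((const uint8_t *)data)[index];
    case 2:
        return ((const uint16_t *)data)[index];
    default:
        return ((const uint32_t *)data)[index];
    }
}
```

Passing a kind that does not match the buffer would silently read the wrong bytes, which is why the documentation asks callers to consult the kind first.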
@@ -145,8 +123,8 @@ access to internal read-only data of Unicode objects:

    .. versionadded:: 3.3

-   .. deprecated-removed:: 3.10 3.12
-      ``PyUnicode_WCHAR_KIND`` is deprecated.
+   .. versionchanged:: 3.12
+      ``PyUnicode_WCHAR_KIND`` has been removed.


 .. c:function:: int PyUnicode_KIND(PyObject *o)
@@ -155,8 +133,6 @@ access to internal read-only data of Unicode objects:
    bytes per character this Unicode object uses to store its data. *o* has to
    be a Unicode object in the "canonical" representation (not checked).

-   .. XXX document "0" return value?
-
    .. versionadded:: 3.3


@@ -208,49 +184,6 @@ access to internal read-only data of Unicode objects:
    .. versionadded:: 3.3


-.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
-   code units (this includes surrogate pairs as 2 units). *o* has to be a
-   Unicode object (not checked).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
-   bytes. *o* has to be a Unicode object (not checked).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
-                const char* PyUnicode_AS_DATA(PyObject *o)
-
-   Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
-   returned buffer is always terminated with an extra null code point. It
-   may also contain embedded null code points, which would cause the string
-   to be truncated when used in most C functions. The ``AS_DATA`` form
-   casts the pointer to :c:type:`const char *`. The *o* argument has to be
-   a Unicode object (not checked).
-
-   .. versionchanged:: 3.3
-      This function is now inefficient -- because in many cases the
-      :c:type:`Py_UNICODE` representation does not exist and needs to be created
-      -- and can fail (return ``NULL`` with an exception set). Try to port the
-      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
-      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using the
-      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
-
-
 .. c:function:: int PyUnicode_IsIdentifier(PyObject *o)

    Return ``1`` if the string is a valid identifier according to the language
@@ -436,12 +369,17 @@ APIs:

    Create a Unicode object from the char buffer *u*. The bytes will be
    interpreted as being UTF-8 encoded. The buffer is copied into the new
-   object. If the buffer is not ``NULL``, the return value might be a shared
-   object, i.e. modification of the data is not allowed.
+   object.
+   The return value might be a shared object, i.e. modification of the data is
+   not allowed.

-   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
-   with the buffer set to ``NULL``. This usage is deprecated in favor of
-   :c:func:`PyUnicode_New`, and will be removed in Python 3.12.
+   This function raises :exc:`SystemError` when:
+
+   * *size* < 0,
+   * *u* is ``NULL`` and *size* > 0
+
+   .. versionchanged:: 3.12
+      *u* == ``NULL`` with *size* > 0 is not allowed anymore.


 .. c:function:: PyObject *PyUnicode_FromString(const char *u)
@@ -680,79 +618,6 @@ APIs:
    .. versionadded:: 3.3


-Deprecated Py_UNICODE APIs
-""""""""""""""""""""""""""
-
-.. deprecated-removed:: 3.3 3.12
-
-These API functions are deprecated with the implementation of :pep:`393`.
-Extension modules can continue using them, as they will not be removed in Python
-3.x, but need to be aware that their use can now cause performance and memory hits.
-
-
-.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
-
-   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
-   may be ``NULL`` which causes the contents to be undefined. It is the user's
-   responsibility to fill in the needed data. The buffer is copied into the new
-   object.
-
-   If the buffer is not ``NULL``, the return value might be a shared object.
-   Therefore, modification of the resulting Unicode object is only allowed when
-   *u* is ``NULL``.
-
-   If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
-   string content has been filled before using any of the access macros such as
-   :c:func:`PyUnicode_KIND`.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
-      :c:func:`PyUnicode_New`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
-
-   Return a read-only pointer to the Unicode object's internal
-   :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
-   :c:type:`Py_UNICODE*` representation of the object if it is not yet
-   available. The buffer is always terminated with an extra null code point.
-   Note that the resulting :c:type:`Py_UNICODE` string may also contain
-   embedded null code points, which would cause the string to be truncated when
-   used in most C functions.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
-      :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
-
-   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-   array length (excluding the extra null terminator) in *size*.
-   Note that the resulting :c:type:`Py_UNICODE*` string
-   may contain embedded null code points, which would cause the string to be
-   truncated when used in most C functions.
-
-   .. versionadded:: 3.3
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
-      :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
-   code units (this includes surrogate pairs as 2 units).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
 .. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

    Copy an instance of a Unicode subtype to a new true Unicode object if