gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)

2025-11-25 12:44:13 +00:00 · 2022-05-12 14:48:38 +09:00 · 2022-05-12 14:48:38 +09:00 · f9c9354a7a
commit f9c9354a7a
parent 68fec31364
35 changed files with 199 additions and 2090 deletions
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -17,26 +17,12 @@ of Unicode characters while staying memory efficient.  There are special cases
 for strings where all code points are below 128, 256, or 65536; otherwise, code
 points must be below 1114112 (which is the full Unicode range).

-:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
-in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
-and inefficient.
-
-Due to the transition between the old APIs and the new APIs, Unicode objects
-can internally be in two states depending on how they were created:
-
-* "canonical" Unicode objects are all objects created by a non-deprecated
-  Unicode API.  They use the most efficient representation allowed by the
-  implementation.
-
-* "legacy" Unicode objects have been created through one of the deprecated
-  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
-  :c:type:`Py_UNICODE*` representation; you will have to call
-  :c:func:`PyUnicode_READY` on them before calling any other API.
+The UTF-8 representation is created on demand and cached in the Unicode object.

 .. note::
-   The "legacy" Unicode object will be removed in Python 3.12 with deprecated
-   APIs. All Unicode objects will be "canonical" since then. See :pep:`623`
-   for more information.
+   The :c:type:`Py_UNICODE` representation was removed in Python 3.12,
+   together with the deprecated APIs.
+   See :pep:`623` for more information.


 Unicode Type
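The context lines at the top of this hunk mention special cases for strings whose code points all stay below 128, 256, or 65536. A rough way to observe this from Python is sketched below. It is CPython-specific and relies on ``sys.getsizeof``; the helper name ``bytes_per_character`` is made up for this illustration and is not part of any API.

```python
import sys


def bytes_per_character(ch):
    """Estimate the per-character storage CPython uses for strings made of `ch`.

    Hypothetical helper: it compares two strings that differ by exactly
    100 characters, so the fixed object header cancels out.
    """
    short = ch * 2
    long = ch * 102
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 100


samples = [
    ("ASCII", "a"),             # code point below 128
    ("Latin-1", "\u00e9"),      # code point below 256
    ("BMP", "\u20ac"),          # code point below 65536
    ("astral", "\U0001f600"),   # needs the full Unicode range
]

for label, ch in samples:
    width = bytes_per_character(ch)
    print(f"{label:8} U+{ord(ch):06X}: {width} byte(s) per character")
```

On CPython the reported widths should match the 1-, 2-, and 4-byte kinds; other implementations may store strings differently.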
@@ -101,18 +87,12 @@ access to internal read-only data of Unicode objects:

 .. c:function:: int PyUnicode_READY(PyObject *o)

-   Ensure the string object *o* is in the "canonical" representation.  This is
-   required before using any of the access macros described below.
-
-   .. XXX expand on when it is not required
-
-   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
-   particular happens if memory allocation fails.
+   Returns ``0``. This API is kept only for backward compatibility.

   .. versionadded:: 3.3

-   .. deprecated-removed:: 3.10 3.12
-      This API will be removed with :c:func:`PyUnicode_FromUnicode`.
+   .. deprecated:: 3.10
+      This API does nothing since Python 3.12. Please remove code using this function.


 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
@@ -130,14 +110,12 @@ access to internal read-only data of Unicode objects:
   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access.  No checks are performed if the
   canonical representation has the correct character size; use
-   :c:func:`PyUnicode_KIND` to select the right function.  Make sure
-   :c:func:`PyUnicode_READY` has been called before accessing this.
+   :c:func:`PyUnicode_KIND` to select the right function.

   .. versionadded:: 3.3


-.. c:macro:: PyUnicode_WCHAR_KIND
-             PyUnicode_1BYTE_KIND
+.. c:macro:: PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND

@@ -145,8 +123,8 @@ access to internal read-only data of Unicode objects:

   .. versionadded:: 3.3

-   .. deprecated-removed:: 3.10 3.12
-      ``PyUnicode_WCHAR_KIND`` is deprecated.
+   .. versionchanged:: 3.12
+      ``PyUnicode_WCHAR_KIND`` has been removed.


 .. c:function:: int PyUnicode_KIND(PyObject *o)
@@ -155,8 +133,6 @@ access to internal read-only data of Unicode objects:
   bytes per character this Unicode object uses to store its data.  *o* has to
   be a Unicode object in the "canonical" representation (not checked).

-   .. XXX document "0" return value?
-
   .. versionadded:: 3.3


@@ -208,49 +184,6 @@ access to internal read-only data of Unicode objects:
   .. versionadded:: 3.3


-.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
-   code units (this includes surrogate pairs as 2 units).  *o* has to be a
-   Unicode object (not checked).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
-   bytes.  *o* has to be a Unicode object (not checked).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
-                const char* PyUnicode_AS_DATA(PyObject *o)
-
-   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
-   returned buffer is always terminated with an extra null code point.  It
-   may also contain embedded null code points, which would cause the string
-   to be truncated when used in most C functions.  The ``AS_DATA`` form
-   casts the pointer to :c:type:`const char *`.  The *o* argument has to be
-   a Unicode object (not checked).
-
-   .. versionchanged:: 3.3
-      This function is now inefficient -- because in many cases the
-      :c:type:`Py_UNICODE` representation does not exist and needs to be created
-      -- and can fail (return ``NULL`` with an exception set).  Try to port the
-      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
-      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using the
-      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
-
-
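The removed size macros counted ``Py_UNICODE`` code units, while the suggested replacement counts code points. The sketch below illustrates that from Python through ``ctypes.pythonapi``. It uses the function forms ``PyUnicode_GetLength`` and ``PyUnicode_ReadChar`` as stand-ins for the macros, since macros cannot be called through ctypes; the helper ``code_points`` is hypothetical.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_GetLength.argtypes = [ctypes.py_object]
api.PyUnicode_GetLength.restype = ctypes.c_ssize_t
api.PyUnicode_ReadChar.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PyUnicode_ReadChar.restype = ctypes.c_uint32  # Py_UCS4 is a 32-bit unsigned type


def code_points(text):
    """Collect the code points of `text` via the C API (hypothetical helper)."""
    length = api.PyUnicode_GetLength(text)
    return [api.PyUnicode_ReadChar(text, i) for i in range(length)]


sample = "a\u20ac\U0001f600"  # one ASCII, one BMP, one astral character
print(api.PyUnicode_GetLength(sample))
print([hex(cp) for cp in code_points(sample)])
```

The astral character counts as a single element here, whereas the old ``Py_UNICODE`` sizes reported a surrogate pair as 2 units on platforms with a 16-bit ``wchar_t``.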
 .. c:function:: int PyUnicode_IsIdentifier(PyObject *o)

   Return ``1`` if the string is a valid identifier according to the language
@@ -436,12 +369,17 @@ APIs:

   Create a Unicode object from the char buffer *u*.  The bytes will be
   interpreted as being UTF-8 encoded.  The buffer is copied into the new
-   object. If the buffer is not ``NULL``, the return value might be a shared
-   object, i.e. modification of the data is not allowed.
+   object.
+   The return value might be a shared object, i.e. modification of the data is
+   not allowed.

-   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
-   with the buffer set to ``NULL``.  This usage is deprecated in favor of
-   :c:func:`PyUnicode_New`, and will be removed in Python 3.12.
+   This function raises :exc:`SystemError` when:
+
+   * *size* < 0,
+   * *u* is ``NULL`` and *size* > 0
+
+   .. versionchanged:: 3.12
+      *u* == ``NULL`` with *size* > 0 is not allowed anymore.


 .. c:function:: PyObject *PyUnicode_FromString(const char *u)
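The error conditions added for ``PyUnicode_FromStringAndSize`` in this hunk can be probed from Python with ``ctypes.pythonapi``. This is only a sketch: the wrapper names ``from_utf8`` and ``raises_system_error`` are hypothetical, and the ``NULL`` case is guarded by a version check because older interpreters do not raise there.

```python
import ctypes
import sys

# PyUnicode_FromStringAndSize is the real C API function; the Python-side
# names in this sketch are hypothetical wrappers.
_from_string_and_size = ctypes.pythonapi.PyUnicode_FromStringAndSize
_from_string_and_size.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
_from_string_and_size.restype = ctypes.py_object


def from_utf8(data, size):
    """Call PyUnicode_FromStringAndSize(data, size); data=None passes NULL."""
    return _from_string_and_size(data, size)


def raises_system_error(data, size):
    """Return True if the call fails with SystemError, False if it succeeds."""
    try:
        from_utf8(data, size)
    except SystemError:
        return True
    return False


encoded = "h\u00e9llo".encode("utf-8")          # 6 bytes of UTF-8
print(ascii(from_utf8(encoded, len(encoded))))  # normal use: bytes are decoded and copied
print(raises_system_error(encoded, -1))         # a negative size is rejected

# Only try the NULL case where the behaviour from this commit applies; on
# older versions the same call hands back an uninitialized legacy string.
if sys.version_info >= (3, 12):
    print(raises_system_error(None, 1))
```

Code that previously passed ``NULL`` to get a writable buffer would need to move to ``PyUnicode_New``, as the removed text recommended.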
@@ -680,79 +618,6 @@ APIs:
   .. versionadded:: 3.3


-Deprecated Py_UNICODE APIs
-""""""""""""""""""""""""""
-
-.. deprecated-removed:: 3.3 3.12
-
-These API functions are deprecated with the implementation of :pep:`393`.
-Extension modules can continue using them, as they will not be removed in Python
-3.x, but need to be aware that their use can now cause performance and memory hits.
-
-
-.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
-
-   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
-   may be ``NULL`` which causes the contents to be undefined. It is the user's
-   responsibility to fill in the needed data.  The buffer is copied into the new
-   object.
-
-   If the buffer is not ``NULL``, the return value might be a shared object.
-   Therefore, modification of the resulting Unicode object is only allowed when
-   *u* is ``NULL``.
-
-   If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
-   string content has been filled before using any of the access macros such as
-   :c:func:`PyUnicode_KIND`.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
-      :c:func:`PyUnicode_New`.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
-
-   Return a read-only pointer to the Unicode object's internal
-   :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
-   :c:type:`Py_UNICODE*` representation of the object if it is not yet
-   available. The buffer is always terminated with an extra null code point.
-   Note that the resulting :c:type:`Py_UNICODE` string may also contain
-   embedded null code points, which would cause the string to be truncated when
-   used in most C functions.
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
-      :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
-
-   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE`
-   array length (excluding the extra null terminator) in *size*.
-   Note that the resulting :c:type:`Py_UNICODE*` string
-   may contain embedded null code points, which would cause the string to be
-   truncated when used in most C functions.
-
-   .. versionadded:: 3.3
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
-      :c:func:`PyUnicode_ReadChar` or similar new APIs.
-
-
-.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
-
-   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
-   code units (this includes surrogate pairs as 2 units).
-
-   .. deprecated-removed:: 3.3 3.12
-      Part of the old-style Unicode API, please migrate to using
-      :c:func:`PyUnicode_GET_LENGTH`.
-
-
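The deprecation notes in the removed section point to ``PyUnicode_AsWideChar`` as one replacement for ``PyUnicode_AsUnicode``. The difference is that the caller owns the buffer instead of borrowing one cached inside the string object. A minimal sketch through ``ctypes.pythonapi`` follows; the helper ``to_wide_buffer`` and its sizing rule are assumptions made for this illustration.

```python
import ctypes

api = ctypes.pythonapi
api.PyUnicode_AsWideChar.argtypes = [
    ctypes.py_object,    # the str object
    ctypes.c_wchar_p,    # caller-provided wchar_t buffer
    ctypes.c_ssize_t,    # buffer capacity in wchar_t units
]
api.PyUnicode_AsWideChar.restype = ctypes.c_ssize_t


def to_wide_buffer(text):
    """Copy `text` into a caller-owned wchar_t buffer (hypothetical helper)."""
    # Worst case: every code point needs two 16-bit units, plus a terminator.
    capacity = 2 * len(text) + 1
    buffer = ctypes.create_unicode_buffer(capacity)
    copied = api.PyUnicode_AsWideChar(text, buffer, capacity)
    return buffer, copied


buffer, copied = to_wide_buffer("a\u20ac\U0001f600")
print(copied)               # wchar_t units copied; depends on the platform's wchar_t width
print(ascii(buffer.value))  # the buffer round-trips to the original text
```

Because the buffer belongs to the caller, nothing stays cached in the Unicode object, which is the memory overhead PEP 623 set out to remove.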
 .. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Copy an instance of a Unicode subtype to a new true Unicode object if