#6930: clarify description about byteorder handling in UTF decoder routines.

Georg Brandl 2009-09-18 21:35:59 +00:00
parent 54967d994a
commit 579a358e61


@@ -414,10 +414,13 @@ These are the UTF-32 codec APIs:
 *byteorder == 0: native order
 *byteorder == 1: big endian
-and then switches if the first four bytes of the input data are a byte order mark
-(BOM) and the specified byte order is native order. This BOM is not copied into
-the resulting Unicode string. After completion, *\*byteorder* is set to the
-current byte order at the end of input data.
+If ``*byteorder`` is zero, and the first four bytes of the input data are a
+byte order mark (BOM), the decoder switches to this byte order and the BOM is
+not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
+``1``, any byte order mark is copied to the output.
+After completion, *\*byteorder* is set to the current byte order at the end
+of input data.
 In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
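The new wording for the UTF-32 decoder can be checked from Python. This is a minimal sketch, assuming that the ``utf-32`` codec behaves like ``*byteorder == 0`` and that ``utf-32-le`` behaves like ``*byteorder == -1``; it does not call the C API itself, and the variable names are illustrative only.

```python
import codecs

# "A" (U+0041) as little-endian UTF-32, preceded by the little-endian BOM.
data = codecs.BOM_UTF32_LE + "A".encode("utf-32-le")

# Byte order left open (assumed analogue of *byteorder == 0):
# the BOM selects little endian and is not copied into the result.
auto = data.decode("utf-32")

# Byte order fixed up front (assumed analogue of *byteorder == -1):
# the BOM is decoded like any other code point and appears as U+FEFF.
fixed = data.decode("utf-32-le")

print(repr(auto))
print(repr(fixed))
```

The second result is one character longer than the first, which is the difference the rewritten paragraph describes.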
@@ -442,8 +445,7 @@ These are the UTF-32 codec APIs:
 .. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
 Return a Python bytes object holding the UTF-32 encoded value of the Unicode
-data in *s*. If *byteorder* is not ``0``, output is written according to the
-following byte order::
+data in *s*. Output is written according to the following byte order::
 byteorder == -1: little endian
 byteorder == 0: native byte order (writes a BOM mark)
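For the encoder side, a short Python sketch of the three output forms. It assumes the codec names ``utf-32-le``, ``utf-32-be`` and ``utf-32`` correspond to *byteorder* values ``-1``, ``1`` and ``0`` respectively; the variable names are illustrative only.

```python
import codecs
import sys

text = "A"

little = text.encode("utf-32-le")  # assumed analogue of byteorder == -1
big = text.encode("utf-32-be")     # assumed analogue of byteorder == 1
native = text.encode("utf-32")     # assumed analogue of byteorder == 0

# With no fixed byte order, the output starts with a BOM in native order,
# followed by the payload in that same order.
payload = little if sys.byteorder == "little" else big
print(native == codecs.BOM_UTF32 + payload)
```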
@@ -487,10 +489,14 @@ These are the UTF-16 codec APIs:
 *byteorder == 0: native order
 *byteorder == 1: big endian
-and then switches if the first two bytes of the input data are a byte order mark
-(BOM) and the specified byte order is native order. This BOM is not copied into
-the resulting Unicode string. After completion, *\*byteorder* is set to the
-current byte order at the.
+If ``*byteorder`` is zero, and the first two bytes of the input data are a
+byte order mark (BOM), the decoder switches to this byte order and the BOM is
+not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
+``1``, any byte order mark is copied to the output (where it will result in
+either a ``\ufeff`` or a ``\ufffe`` character).
+After completion, *\*byteorder* is set to the current byte order at the end
+of input data.
 If *byteorder* is *NULL*, the codec starts in native order mode.
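The ``\ufeff`` versus ``\ufffe`` remark in the new UTF-16 text can be reproduced from Python. This sketch assumes ``utf-16`` stands in for ``*byteorder == 0`` and ``utf-16-le`` for ``*byteorder == -1``; the variable names are illustrative only.

```python
import codecs

payload_le = "hi".encode("utf-16-le")
payload_be = "hi".encode("utf-16-be")

# Byte order left open: either BOM is consumed and selects the order.
detected_le = (codecs.BOM_UTF16_LE + payload_le).decode("utf-16")
detected_be = (codecs.BOM_UTF16_BE + payload_be).decode("utf-16")

# Byte order fixed to little endian: a BOM is kept in the output.
# A little-endian BOM reads as U+FEFF, a big-endian one as U+FFFE.
same_order = (codecs.BOM_UTF16_LE + payload_le).decode("utf-16-le")
swapped = (codecs.BOM_UTF16_BE + payload_le).decode("utf-16-le")

print(repr(detected_le), repr(detected_be))
print(repr(same_order), repr(swapped))
```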
@@ -520,8 +526,7 @@ These are the UTF-16 codec APIs:
 .. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
 Return a Python string object holding the UTF-16 encoded value of the Unicode
-data in *s*. If *byteorder* is not ``0``, output is written according to the
-following byte order::
+data in *s*. Output is written according to the following byte order::
 byteorder == -1: little endian
 byteorder == 0: native byte order (writes a BOM mark)
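The *byteorder* table for the UTF-16 encoder can be mirrored by a small Python wrapper. ``encode_utf16`` is a hypothetical helper written for this note, not part of any API, and rejecting values other than ``-1``, ``0`` and ``1`` is this sketch's own choice.

```python
import codecs


def encode_utf16(text, byteorder=0):
    """Hypothetical helper mapping a byteorder integer to a UTF-16 codec."""
    if byteorder == -1:
        return text.encode("utf-16-le")   # little endian, no BOM
    if byteorder == 1:
        return text.encode("utf-16-be")   # big endian, no BOM
    if byteorder == 0:
        return text.encode("utf-16")      # native order, BOM written first
    raise ValueError("byteorder must be -1, 0 or 1")


print(encode_utf16("A", -1))
print(encode_utf16("A", 1))
print(encode_utf16("A", 0)[:2] == codecs.BOM_UTF16)
```

Only the ``0`` case adds two bytes to the output, since it is the only one that writes a BOM.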