mirror of
https://github.com/python/cpython.git
synced 2025-09-26 18:29:57 +00:00
Issue #8711: Document PyUnicode_DecodeFSDefault*() functions
* Add paragraph titles to c-api/unicode.rst.
* Fix PyUnicode_DecodeFSDefault*() comment: it now uses the "surrogateescape" error handler (and not "replace").
* Remove "The function is intended to be used for paths and file names only during bootstrapping process where the codecs are not set up." from PyUnicode_FSConverter() comment: it is used after the bootstrapping and for other purposes than file names.
parent
766ad36de5
commit
77c3862417
2 changed files with 101 additions and 47 deletions
|
@ -10,11 +10,12 @@ Unicode Objects and Codecs
|
||||||
Unicode Objects
|
Unicode Objects
|
||||||
^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Unicode Type
|
||||||
|
""""""""""""
|
||||||
|
|
||||||
These are the basic Unicode object types used for the Unicode implementation in
|
These are the basic Unicode object types used for the Unicode implementation in
|
||||||
Python:
|
Python:
|
||||||
|
|
||||||
.. % --- Unicode Type -------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. ctype:: Py_UNICODE
|
.. ctype:: Py_UNICODE
|
||||||
|
|
||||||
|
@ -89,12 +90,13 @@ access internal read-only data of Unicode objects:
|
||||||
Clear the free list. Return the total number of freed items.
|
Clear the free list. Return the total number of freed items.
|
||||||
|
|
||||||
|
|
||||||
|
Unicode Character Properties
|
||||||
|
""""""""""""""""""""""""""""
|
||||||
|
|
||||||
Unicode provides many different character properties. The most often needed ones
|
Unicode provides many different character properties. The most often needed ones
|
||||||
are available through these macros which are mapped to C functions depending on
|
are available through these macros which are mapped to C functions depending on
|
||||||
the Python configuration.
|
the Python configuration.
|
||||||
|
|
||||||
.. % --- Unicode character properties ---------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
|
.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
|
||||||
|
|
||||||
|
@ -192,11 +194,13 @@ These APIs can be used for fast direct character conversions:
|
||||||
Return the character *ch* converted to a double. Return ``-1.0`` if this is not
|
Return the character *ch* converted to a double. Return ``-1.0`` if this is not
|
||||||
possible. This macro does not raise exceptions.
|
possible. This macro does not raise exceptions.
|
||||||
|
|
||||||
|
|
||||||
|
Plain Py_UNICODE
|
||||||
|
""""""""""""""""
|
||||||
|
|
||||||
To create Unicode objects and access their basic sequence properties, use these
|
To create Unicode objects and access their basic sequence properties, use these
|
||||||
APIs:
|
APIs:
|
||||||
|
|
||||||
.. % --- Plain Py_UNICODE ---------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
|
.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
|
||||||
|
|
||||||
|
@ -364,8 +368,46 @@ Python can interface directly to this type using the following functions.
|
||||||
Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
|
Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
|
||||||
the system's :ctype:`wchar_t`.
|
the system's :ctype:`wchar_t`.
|
||||||
|
|
||||||
.. % --- wchar_t support for platforms which support it ---------------------
|
|
||||||
|
|
||||||
|
File System Encoding
|
||||||
|
""""""""""""""""""""
|
||||||
|
|
||||||
|
To encode and decode file names and other environment strings,
|
||||||
|
:cdata:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
|
||||||
|
``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
|
||||||
|
encode file names during argument parsing, the ``"O&"`` converter should be
|
||||||
|
used, passing :func:`PyUnicode_FSConverter` as the conversion function:
|
||||||
|
|
||||||
|
.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
|
||||||
|
|
||||||
|
Convert *obj* into *result*, using :cdata:`Py_FileSystemDefaultEncoding`,
|
||||||
|
and the ``"surrogateescape"`` error handler. *result* must be the address of
|
||||||
|
a ``PyObject*``; on success it receives a :class:`bytes` object, which must be
|
||||||
|
released when it is no longer used.
|
||||||
|
|
||||||
|
.. versionadded:: 3.1
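As a rough Python-level illustration of what the converter above does, the sketch below uses :func:`os.fsencode`. That function is not part of this commit and is only assumed here to be a close analogue: it encodes ``str`` with the file system encoding and passes ``bytes`` through unchanged. The file names are made up for the example.

```python
import os

# Hypothetical file names, used only for illustration.
text_name = "notes.txt"
bytes_name = b"data.bin"

# str input is encoded with the file system encoding.
encoded = os.fsencode(text_name)

# bytes input is returned unchanged, matching the "output as-is" rule.
unchanged = os.fsencode(bytes_name)

# os.fsdecode() reverses the encoding step.
round_trip = os.fsdecode(encoded)
```

In C, the same conversion happens during argument parsing when ``"O&"`` is paired with the converter, so the calling code receives bytes regardless of which of the two types the caller passed.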
|
||||||
|
|
||||||
|
.. cfunction:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
|
||||||
|
|
||||||
|
Decode a string of *size* bytes using :cdata:`Py_FileSystemDefaultEncoding`
|
||||||
|
and the ``"surrogateescape"`` error handler.
|
||||||
|
|
||||||
|
If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
|
||||||
|
|
||||||
|
Use :func:`PyUnicode_DecodeFSDefault` if the string is null-terminated and you do not know its length.
|
||||||
|
|
||||||
|
.. cfunction:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
|
||||||
|
|
||||||
|
Decode a null-terminated string using :cdata:`Py_FileSystemDefaultEncoding` and
|
||||||
|
the ``"surrogateescape"`` error handler.
|
||||||
|
|
||||||
|
If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
Use :func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
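The effect of the ``"surrogateescape"`` handler and of the UTF-8 fallback described above can be sketched in Python. ``decode_fs_default`` and ``encode_fs_default`` are hypothetical helpers written for this example, not functions from the C API, and the byte string is invented.

```python
from typing import Optional


def decode_fs_default(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode raw bytes, falling back to UTF-8 when no encoding is given."""
    return raw.decode(encoding or "utf-8", errors="surrogateescape")


def encode_fs_default(text: str, encoding: Optional[str] = None) -> bytes:
    """Inverse of decode_fs_default(), using the same error handler."""
    return text.encode(encoding or "utf-8", errors="surrogateescape")


# 0xFF is never valid in UTF-8, so a strict decode would raise an error here.
raw_name = b"report-\xff.txt"

# The undecodable byte becomes the lone surrogate U+DCFF (PEP 383).
decoded = decode_fs_default(raw_name)

# Encoding with the same handler restores the original bytes exactly.
restored = encode_fs_default(decoded)
```

Because the round trip is lossless, a file name that is not valid in the file system encoding can still be handed back to the operating system unchanged.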
|
||||||
|
|
||||||
|
|
||||||
|
wchar_t Support
|
||||||
|
"""""""""""""""
|
||||||
|
|
||||||
|
wchar_t support for platforms which support it:
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
|
.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
|
||||||
|
|
||||||
|
@ -413,9 +455,11 @@ built-in codecs is "strict" (:exc:`ValueError` is raised).
|
||||||
The codecs all use a similar interface. Only deviation from the following
|
The codecs all use a similar interface. Only deviation from the following
|
||||||
generic ones are documented for simplicity.
|
generic ones are documented for simplicity.
|
||||||
|
|
||||||
These are the generic codec APIs:
|
|
||||||
|
|
||||||
.. % --- Generic Codecs -----------------------------------------------------
|
Generic Codecs
|
||||||
|
""""""""""""""
|
||||||
|
|
||||||
|
These are the generic codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
|
||||||
|
@ -444,9 +488,11 @@ These are the generic codec APIs:
|
||||||
using the Python codec registry. Return *NULL* if an exception was raised by
|
using the Python codec registry. Return *NULL* if an exception was raised by
|
||||||
the codec.
|
the codec.
|
||||||
|
|
||||||
These are the UTF-8 codec APIs:
|
|
||||||
|
|
||||||
.. % --- UTF-8 Codecs -------------------------------------------------------
|
UTF-8 Codecs
|
||||||
|
""""""""""""
|
||||||
|
|
||||||
|
These are the UTF-8 codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
@ -476,9 +522,11 @@ These are the UTF-8 codec APIs:
|
||||||
object. Error handling is "strict". Return *NULL* if an exception was
|
object. Error handling is "strict". Return *NULL* if an exception was
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
These are the UTF-32 codec APIs:
|
|
||||||
|
|
||||||
.. % --- UTF-32 Codecs ------------------------------------------------------ */
|
UTF-32 Codecs
|
||||||
|
"""""""""""""
|
||||||
|
|
||||||
|
These are the UTF-32 codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
||||||
|
@ -543,9 +591,10 @@ These are the UTF-32 codec APIs:
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
These are the UTF-16 codec APIs:
|
UTF-16 Codecs
|
||||||
|
"""""""""""""
|
||||||
|
|
||||||
.. % --- UTF-16 Codecs ------------------------------------------------------ */
|
These are the UTF-16 codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
|
||||||
|
@ -609,9 +658,11 @@ These are the UTF-16 codec APIs:
|
||||||
order. The string always starts with a BOM mark. Error handling is "strict".
|
order. The string always starts with a BOM mark. Error handling is "strict".
|
||||||
Return *NULL* if an exception was raised by the codec.
|
Return *NULL* if an exception was raised by the codec.
|
||||||
|
|
||||||
These are the "Unicode Escape" codec APIs:
|
|
||||||
|
|
||||||
.. % --- Unicode-Escape Codecs ----------------------------------------------
|
Unicode-Escape Codecs
|
||||||
|
"""""""""""""""""""""
|
||||||
|
|
||||||
|
These are the "Unicode Escape" codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
@ -633,9 +684,11 @@ These are the "Unicode Escape" codec APIs:
|
||||||
string object. Error handling is "strict". Return *NULL* if an exception was
|
string object. Error handling is "strict". Return *NULL* if an exception was
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
These are the "Raw Unicode Escape" codec APIs:
|
|
||||||
|
|
||||||
.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
|
Raw-Unicode-Escape Codecs
|
||||||
|
"""""""""""""""""""""""""
|
||||||
|
|
||||||
|
These are the "Raw Unicode Escape" codec APIs:
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
@ -657,11 +710,13 @@ These are the "Raw Unicode Escape" codec APIs:
|
||||||
Python string object. Error handling is "strict". Return *NULL* if an exception
|
Python string object. Error handling is "strict". Return *NULL* if an exception
|
||||||
was raised by the codec.
|
was raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
|
Latin-1 Codecs
|
||||||
|
""""""""""""""
|
||||||
|
|
||||||
These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
|
These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
|
||||||
ordinals and only these are accepted by the codecs during encoding.
|
ordinals and only these are accepted by the codecs during encoding.
|
||||||
|
|
||||||
.. % --- Latin-1 Codecs -----------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
|
||||||
|
@ -682,11 +737,13 @@ ordinals and only these are accepted by the codecs during encoding.
|
||||||
object. Error handling is "strict". Return *NULL* if an exception was
|
object. Error handling is "strict". Return *NULL* if an exception was
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
|
|
||||||
|
ASCII Codecs
|
||||||
|
""""""""""""
|
||||||
|
|
||||||
These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
|
These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
|
||||||
codes generate errors.
|
codes generate errors.
|
||||||
|
|
||||||
.. % --- ASCII Codecs -------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
|
||||||
|
@ -707,9 +764,11 @@ codes generate errors.
|
||||||
object. Error handling is "strict". Return *NULL* if an exception was
|
object. Error handling is "strict". Return *NULL* if an exception was
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
These are the mapping codec APIs:
|
|
||||||
|
|
||||||
.. % --- Character Map Codecs -----------------------------------------------
|
Character Map Codecs
|
||||||
|
""""""""""""""""""""
|
||||||
|
|
||||||
|
These are the mapping codec APIs:
|
||||||
|
|
||||||
This codec is special in that it can be used to implement many different codecs
|
This codec is special in that it can be used to implement many different codecs
|
||||||
(and this is in fact what was done to obtain most of the standard codecs
|
(and this is in fact what was done to obtain most of the standard codecs
|
||||||
|
@ -778,7 +837,9 @@ use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
|
||||||
DBCS) is a class of encodings, not just one. The target encoding is defined by
|
DBCS) is a class of encodings, not just one. The target encoding is defined by
|
||||||
the user settings on the machine running the codec.
|
the user settings on the machine running the codec.
|
||||||
|
|
||||||
.. % --- MBCS codecs for Windows --------------------------------------------
|
|
||||||
|
MBCS codecs for Windows
|
||||||
|
"""""""""""""""""""""""
|
||||||
|
|
||||||
|
|
||||||
.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
|
.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
|
||||||
|
@ -808,20 +869,9 @@ the user settings on the machine running the codec.
|
||||||
object. Error handling is "strict". Return *NULL* if an exception was
|
object. Error handling is "strict". Return *NULL* if an exception was
|
||||||
raised by the codec.
|
raised by the codec.
|
||||||
|
|
||||||
For decoding file names and other environment strings, :cdata:`Py_FileSystemEncoding`
|
|
||||||
should be used as the encoding, and ``"surrogateescape"`` should be used as the error
|
|
||||||
handler. For encoding file names during argument parsing, the ``O&`` converter should
|
|
||||||
be used, passing PyUnicode_FSConverter as the conversion function:
|
|
||||||
|
|
||||||
.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
|
Methods & Slots
|
||||||
|
"""""""""""""""
|
||||||
Convert *obj* into *result*, using the file system encoding, and the ``surrogateescape``
|
|
||||||
error handler. *result* must be a ``PyObject*``, yielding a bytes or bytearray object
|
|
||||||
which must be released if it is no longer used.
|
|
||||||
|
|
||||||
.. versionadded:: 3.1
|
|
||||||
|
|
||||||
.. % --- Methods & Slots ----------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
.. _unicodemethodsandslots:
|
.. _unicodemethodsandslots:
|
||||||
|
|
|
@ -1240,25 +1240,29 @@ PyAPI_FUNC(int) PyUnicode_EncodeDecimal(
|
||||||
/* --- File system encoding ---------------------------------------------- */
|
/* --- File system encoding ---------------------------------------------- */
|
||||||
|
|
||||||
/* ParseTuple converter which converts a Unicode object into the file
|
/* ParseTuple converter which converts a Unicode object into the file
|
||||||
system encoding as a bytes object, using the PEP 383 error handler; bytes
|
system encoding as a bytes object, using the "surrogateescape" error
|
||||||
objects are output as-is. */
|
handler; bytes objects are output as-is. */
|
||||||
|
|
||||||
PyAPI_FUNC(int) PyUnicode_FSConverter(PyObject*, void*);
|
PyAPI_FUNC(int) PyUnicode_FSConverter(PyObject*, void*);
|
||||||
|
|
||||||
/* Decode a null-terminated string using Py_FileSystemDefaultEncoding.
|
/* Decode a null-terminated string using Py_FileSystemDefaultEncoding
|
||||||
|
and the "surrogateescape" error handler.
|
||||||
|
|
||||||
If the encoding is supported by one of the built-in codecs (i.e., UTF-8,
|
If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
|
||||||
UTF-16, UTF-32, Latin-1 or MBCS), otherwise fallback to UTF-8 and replace
|
|
||||||
invalid characters with '?'.
|
|
||||||
|
|
||||||
The function is intended to be used for paths and file names only
|
Use PyUnicode_DecodeFSDefaultAndSize() if you have the string length.
|
||||||
during bootstrapping process where the codecs are not set up.
|
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefault(
|
PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefault(
|
||||||
const char *s /* encoded string */
|
const char *s /* encoded string */
|
||||||
);
|
);
|
||||||
|
|
||||||
|
/* Decode a string using Py_FileSystemDefaultEncoding
|
||||||
|
and the "surrogateescape" error handler.
|
||||||
|
|
||||||
|
If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
|
||||||
|
*/
|
||||||
|
|
||||||
PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefaultAndSize(
|
PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefaultAndSize(
|
||||||
const char *s, /* encoded string */
|
const char *s, /* encoded string */
|
||||||
Py_ssize_t size /* size */
|
Py_ssize_t size /* size */
|
||||||
|
|