Mirror of https://github.com/python/cpython.git (synced 2025-08-04 17:08:35 +00:00)
bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086)
If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding.

In May 2010 (commit b744ba1d14), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string.

In August 2020 (commit 94908bbc15), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to avoid guessing the encoding and to investigate the user configurations where this case happens.

In 10 years (2010 to 2020), I saw zero user reports about the error message related to nl_langinfo(CODESET) returning an empty string.

Today, UTF-8 has become the de facto standard, and it is safe to assume that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS.

While this change is unlikely to affect anyone in practice, it should make UTF-8 lovers happy ;-)

Also rewrite the documentation explaining how Python selects the filesystem encoding and error handler.
parent 82458b6cdb
commit e662c398d8
8 changed files with 87 additions and 89 deletions
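The default-encoding order that the commit documents for platforms other than macOS, Android, VxWorks and Windows, including the new rule for an empty CODESET, can be sketched in Python. This is a hypothetical model, not CPython's implementation (the real selection is C code that runs at startup): `default_fs_encoding`, `normalize_codec_name` and the `ascii_mismatch` flag are illustrative names, and `codecs.lookup()` stands in for the normalization of the locale name to a Python codec name.

```python
import codecs


def normalize_codec_name(name: str) -> str:
    """Map a locale codeset name to Python's canonical codec name."""
    # codecs.lookup() resolves aliases, e.g. "ANSI_X3.4-1968" -> "ascii".
    return codecs.lookup(name).name


def default_fs_encoding(utf8_mode: bool, codeset: str,
                        ascii_mismatch: bool = False) -> str:
    """Hypothetical model of the default filesystem encoding on platforms
    other than macOS, Android, VxWorks and Windows.

    codeset: the string that nl_langinfo(CODESET) returned.
    ascii_mismatch: hypothetical flag standing in for Python's check that the
    locale announces ASCII while mbstowcs() decodes a different encoding.
    """
    if utf8_mode:
        # UTF-8 Mode always selects UTF-8.
        return "utf-8"
    if ascii_mismatch:
        return "ascii"
    if not codeset:
        # Behavior introduced by this commit: an empty CODESET means UTF-8
        # instead of a fatal startup error.
        return "utf-8"
    # Otherwise use the LC_CTYPE locale encoding, normalized to a codec name.
    return normalize_codec_name(codeset)
```

On a Unix system, `default_fs_encoding(False, locale.nl_langinfo(locale.CODESET))` can be compared with `sys.getfilesystemencoding()`, keeping in mind that the interpreter computes the real value once at startup.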
@@ -253,10 +253,16 @@ PyPreConfig

 See :c:member:`PyConfig.isolated`.

-.. c:member:: int legacy_windows_fs_encoding (Windows only)
+.. c:member:: int legacy_windows_fs_encoding

-If non-zero, disable UTF-8 Mode, set the Python filesystem encoding to
-``mbcs``, set the filesystem error handler to ``replace``.
+If non-zero:
+
+* Set :c:member:`PyPreConfig.utf8_mode` to ``0``,
+* Set :c:member:`PyConfig.filesystem_encoding` to ``"mbcs"``,
+* Set :c:member:`PyConfig.filesystem_errors` to ``"replace"``.

 Initialized the from :envvar:`PYTHONLEGACYWINDOWSFSENCODING` environment
 variable value.
+
+Only available on Windows. ``#ifdef MS_WINDOWS`` macro can be used for
+Windows specific code.
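The three effects listed for a non-zero `legacy_windows_fs_encoding` can be modeled as a pure function over a configuration record. This is a hedged Python sketch: `WindowsFsConfig` and `apply_legacy_windows_fs_encoding` are hypothetical names, not part of the C API, and the field defaults assume the Windows defaults documented for `PyConfig.filesystem_encoding` and `PyConfig.filesystem_errors` (`"utf-8"` and `"surrogatepass"`).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WindowsFsConfig:
    """Hypothetical stand-in for the PyPreConfig/PyConfig fields involved."""
    utf8_mode: int = 0                        # illustrative default only
    filesystem_encoding: str = "utf-8"        # documented Windows default
    filesystem_errors: str = "surrogatepass"  # documented Windows default


def apply_legacy_windows_fs_encoding(cfg: WindowsFsConfig,
                                     legacy_windows_fs_encoding: int) -> WindowsFsConfig:
    """Return a copy of cfg with the documented effects of a non-zero flag."""
    if not legacy_windows_fs_encoding:
        # Flag unset: the configuration is left unchanged.
        return cfg
    # Flag set: disable UTF-8 Mode and switch to the legacy "mbcs"/"replace" pair.
    return replace(cfg, utf8_mode=0,
                   filesystem_encoding="mbcs",
                   filesystem_errors="replace")
```

In a real embedding application the flag lives on `PyPreConfig`, so it has to be set before pre-initialization, or it comes from the `PYTHONLEGACYWINDOWSFSENCODING` environment variable.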
|
@@ -499,11 +505,47 @@ PyConfig

 .. c:member:: wchar_t* filesystem_encoding

-Filesystem encoding, :func:`sys.getfilesystemencoding`.
+Filesystem encoding: :func:`sys.getfilesystemencoding`.
+
+On macOS, Android and VxWorks: use ``"utf-8"`` by default.
+
+On Windows: use ``"utf-8"`` by default, or ``"mbcs"`` if
+:c:member:`~PyPreConfig.legacy_windows_fs_encoding` of
+:c:type:`PyPreConfig` is non-zero.
+
+Default encoding on other platforms:
+
+* ``"utf-8"`` if :c:member:`PyPreConfig.utf8_mode` is non-zero.
+* ``"ascii"`` if Python detects that ``nl_langinfo(CODESET)`` announces
+  the ASCII encoding (or Roman8 encoding on HP-UX), whereas the
+  ``mbstowcs()`` function decodes from a different encoding (usually
+  Latin1).
+* ``"utf-8"`` if ``nl_langinfo(CODESET)`` returns an empty string.
+* Otherwise, use the LC_CTYPE locale encoding:
+  ``nl_langinfo(CODESET)`` result.
+
+At Python startup, the encoding name is normalized to the Python codec
+name. For example, ``"ANSI_X3.4-1968"`` is replaced with ``"ascii"``.
+
+See also the :c:member:`~PyConfig.filesystem_errors` member.

 .. c:member:: wchar_t* filesystem_errors

-Filesystem encoding errors, :func:`sys.getfilesystemencodeerrors`.
+Filesystem error handler: :func:`sys.getfilesystemencodeerrors`.
+
+On Windows: use ``"surrogatepass"`` by default, or ``"replace"`` if
+:c:member:`~PyPreConfig.legacy_windows_fs_encoding` of
+:c:type:`PyPreConfig` is non-zero.
+
+On other platforms: use ``"surrogateescape"`` by default.
+
+Supported error handlers:
+
+* ``"strict"``
+* ``"surrogateescape"``
+* ``"surrogatepass"`` (only supported with the UTF-8 encoding)
+
+See also the :c:member:`~PyConfig.filesystem_encoding` member.

 .. c:member:: unsigned long hash_seed
 .. c:member:: int use_hash_seed
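The difference between the supported filesystem error handlers is easiest to see on a file name that is not valid UTF-8. This sketch uses only the built-in `bytes.decode()` and `str.encode()` methods with an explicit UTF-8 codec; `fs_decode`, `fs_encode` and `strict_can_decode` are hypothetical helpers, and the sketch deliberately avoids `os.fsdecode()`/`os.fsencode()` so that its results do not depend on the interpreter's actual filesystem configuration.

```python
def fs_decode(raw: bytes, errors: str, encoding: str = "utf-8") -> str:
    """Decode a file name with an explicit codec and error handler."""
    return raw.decode(encoding, errors)


def fs_encode(name: str, errors: str, encoding: str = "utf-8") -> bytes:
    """Encode a file name with an explicit codec and error handler."""
    return name.encode(encoding, errors)


def strict_can_decode(raw: bytes, encoding: str = "utf-8") -> bool:
    """Return False when the "strict" handler rejects the bytes."""
    try:
        raw.decode(encoding, "strict")
    except UnicodeDecodeError:
        return False
    return True


# A Latin-1 encoded file name: byte 0xE9 is not valid UTF-8 in this position.
latin1_name = b"caf\xe9.txt"

# "surrogateescape" maps the undecodable byte to U+DCE9 and restores it on encode.
decoded = fs_decode(latin1_name, "surrogateescape")
assert decoded == "caf\udce9.txt"
assert fs_encode(decoded, "surrogateescape") == latin1_name

# "surrogatepass" lets UTF-8 carry lone surrogates, which Windows file names can contain.
assert fs_encode("\ud800", "surrogatepass") == b"\xed\xa0\x80"
assert fs_decode(b"\xed\xa0\x80", "surrogatepass") == "\ud800"

# "strict" refuses the undecodable byte outright.
assert not strict_can_decode(latin1_name)
```

The round trip through `"surrogateescape"` is what lets Python on POSIX open, list and rename files whose names are not valid in the selected encoding without losing any bytes.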