mirror of
https://github.com/python/cpython.git
synced 2025-11-03 11:23:31 +00:00
bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086)
If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding.

In May 2010 (commit b744ba1d14), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string.

In August 2020 (commit 94908bbc15), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to avoid guessing the encoding and to investigate the user configurations in which this case happens.

In 10 years (2010 to 2020), I saw no user reports about the error message related to nl_langinfo(CODESET) returning an empty string.

Today, UTF-8 has become the de facto standard, and it is safe to assume that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS.

While this change is unlikely to affect anyone in practice, it should make UTF-8 lovers happy ;-)

Also rewrite the documentation explaining how Python selects the filesystem encoding and error handler.
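The fallback described in this message can be sketched from the Python side. This is only an illustration, not the C startup code: fallback_encoding() and query_codeset() are hypothetical helpers, and the real selection also considers other settings (such as the UTF-8 Mode) that are ignored here.

```python
import locale
import sys


def fallback_encoding(codeset: str) -> str:
    """Hypothetical helper mirroring the rule from this commit.

    An empty CODESET string means the C library could not name the
    locale encoding, so UTF-8 is assumed instead of failing.
    """
    return codeset if codeset else "utf-8"


def query_codeset() -> str:
    """Return nl_langinfo(CODESET), or "" where the call is unavailable."""
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return ""  # for example Windows, which has no nl_langinfo()


if __name__ == "__main__":
    codeset = query_codeset()
    print("nl_langinfo(CODESET):", repr(codeset))
    print("fallback choice:", fallback_encoding(codeset))
    # What the running interpreter actually selected at startup.
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("filesystem errors:", sys.getfilesystemencodeerrors())
```

Running the script prints the value reported by the C library next to the encoding and error handler the interpreter actually uses, which makes a mismatch easy to spot.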
parent 82458b6cdb
commit e662c398d8
8 changed files with 87 additions and 89 deletions
@@ -156,36 +156,13 @@ typedef struct {
     /* Python filesystem encoding and error handler:
        sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().

-       Default encoding and error handler:
+       The Doc/c-api/init_config.rst documentation explains how Python selects
+       the filesystem encoding and error handler.

-       * if Py_SetStandardStreamEncoding() has been called: they have the
-         highest priority;
-       * PYTHONIOENCODING environment variable;
-       * The UTF-8 Mode uses UTF-8/surrogateescape;
-       * If Python forces the usage of the ASCII encoding (ex: C locale
-         or POSIX locale on FreeBSD or HP-UX), use ASCII/surrogateescape;
-       * locale encoding: ANSI code page on Windows, UTF-8 on Android and
-         VxWorks, LC_CTYPE locale encoding on other platforms;
-       * On Windows, "surrogateescape" error handler;
-       * "surrogateescape" error handler if the LC_CTYPE locale is "C" or "POSIX";
-       * "surrogateescape" error handler if the LC_CTYPE locale has been coerced
-         (PEP 538);
-       * "strict" error handler.
-
-       Supported error handlers: "strict", "surrogateescape" and
-       "surrogatepass". The surrogatepass error handler is only supported
-       if Py_DecodeLocale() and Py_EncodeLocale() use directly the UTF-8 codec;
-       it's only used on Windows.
-
-       initfsencoding() updates the encoding to the Python codec name.
-       For example, "ANSI_X3.4-1968" is replaced with "ascii".
-
-       On Windows, sys._enablelegacywindowsfsencoding() sets the
-       encoding/errors to mbcs/replace at runtime.
-
-
-       See Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors.
-       */
+       _PyUnicode_InitEncodings() updates the encoding name to the Python codec
+       name. For example, "ANSI_X3.4-1968" is replaced with "ascii". It also
+       sets Py_FileSystemDefaultEncoding to filesystem_encoding and
+       sets Py_FileSystemDefaultEncodeErrors to filesystem_errors. */
     wchar_t *filesystem_encoding;
     wchar_t *filesystem_errors;
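The codec-name replacement mentioned in the comment ("ANSI_X3.4-1968" becomes "ascii") can be observed from Python with codecs.lookup(). This is a rough analogue only: python_codec_name() is a hypothetical helper, and it is an assumption that the alias resolution shown here matches what _PyUnicode_InitEncodings() does in C.

```python
import codecs


def python_codec_name(encoding: str) -> str:
    """Return Python's canonical codec name for an encoding alias."""
    # codecs.lookup() resolves aliases and returns a CodecInfo object
    # whose .name attribute is the normalized codec name.
    return codecs.lookup(encoding).name


print(python_codec_name("ANSI_X3.4-1968"))  # ascii
print(python_codec_name("UTF8"))            # utf-8
```

An unknown name raises LookupError, so a caller that accepts arbitrary locale strings would need to handle that case.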