bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086)

If the nl_langinfo(CODESET) function returns an empty string, Python
now uses UTF-8 as the filesystem encoding.

In May 2010 (commit b744ba1d14), I
modified Python to log a warning and use UTF-8 as the filesystem
encoding (instead of None) if nl_langinfo(CODESET) returns an empty
string.

In August 2020 (commit 94908bbc15), I
modified Python startup to fail with a fatal error and a specific
error message if nl_langinfo(CODESET) returns an empty string. The
intent was to stop guessing the encoding and to investigate user
configurations where this case happens.

In 10 years (2010 to 2020), I saw zero user reports about the error
message related to nl_langinfo(CODESET) returning an empty string.

Today, UTF-8 has become the de facto standard and it is safe to
assume that the user expects UTF-8. For example,
nl_langinfo(CODESET) can return an empty string on macOS if the
LC_CTYPE locale is not supported, and UTF-8 is the default encoding
on macOS.

While this change is unlikely to affect anyone in practice, it
should make UTF-8 lovers happy ;-)

Also rewrite the documentation explaining how Python selects the
filesystem encoding and error handler.
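
The fallback described in this message can be sketched in a few lines of C. This is only an illustration, not CPython's implementation: the helper names encoding_or_utf8() and current_locale_encoding() are hypothetical, and the sketch assumes a POSIX system that provides <langinfo.h>.

```c
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper (not CPython code): map the value returned by
   nl_langinfo(CODESET) to an encoding name. A missing or empty value
   falls back to UTF-8 instead of being treated as an error. */
const char *
encoding_or_utf8(const char *codeset)
{
    if (codeset == NULL || codeset[0] == '\0') {
        return "UTF-8";
    }
    return codeset;
}

/* Hypothetical wrapper: query the codeset of the current LC_CTYPE
   locale. The caller is assumed to have configured the locale already,
   for example with setlocale(LC_CTYPE, ""). */
const char *
current_locale_encoding(void)
{
    return encoding_or_utf8(nl_langinfo(CODESET));
}
```

Keeping the fallback in a function that takes the codeset string as an argument makes the empty-string case easy to exercise without having to set up an unsupported locale.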
Victor Stinner 2020-11-01 23:07:23 +01:00 committed by GitHub
parent 82458b6cdb
commit e662c398d8
8 changed files with 87 additions and 89 deletions

@@ -1318,7 +1318,7 @@ config_read_env_vars(PyConfig *config)
 #ifdef MS_WINDOWS
     _Py_get_env_flag(use_env, &config->legacy_windows_stdio,
-                     "PYTHONLEGACYWINDOWSSTDIO");
+                     "PYTHONLEGACYWINDOWSSTDIO");
 #endif
     if (config_get_env(config, "PYTHONDUMPREFS")) {
@@ -1498,15 +1498,9 @@ static PyStatus
 config_get_locale_encoding(PyConfig *config, const PyPreConfig *preconfig,
                            wchar_t **locale_encoding)
 {
-    const char *errmsg;
-    wchar_t *encoding = _Py_GetLocaleEncoding(&errmsg);
+    wchar_t *encoding = _Py_GetLocaleEncoding();
     if (encoding == NULL) {
-        if (errmsg != NULL) {
-            return _PyStatus_ERR(errmsg);
-        }
-        else {
-            return _PyStatus_NO_MEMORY();
-        }
+        return _PyStatus_NO_MEMORY();
     }
     PyStatus status = PyConfig_SetString(config, locale_encoding, encoding);
     PyMem_RawFree(encoding);