bpo-42236: Enhance init and encoding documentation (GH-23109)

Enhance the documentation of the Python startup, filesystem encoding
and error handling, locale encoding. Add a new "Python UTF-8 Mode"
section.

* Add "locale encoding" and "filesystem encoding and error handler"
  to the glossary
* Remove documentation from Include/cpython/initconfig.h: move it to
  Doc/c-api/init_config.rst.
* Doc/c-api/init_config.rst:

  * Document command line options and environment variables
  * Document default values.

* Add a new "Python UTF-8 Mode" section in Doc/library/os.rst.
* Add warnings to Py_DecodeLocale() and Py_EncodeLocale() docs.
* Document how Python selects the filesystem encoding and error
  handler in a single place: PyConfig.filesystem_encoding and
  PyConfig.filesystem_errors.
* PyConfig: move orig_argv member to the right place.
Victor Stinner 2020-11-02 16:49:54 +01:00 committed by GitHub
parent 301822859b
commit 4b9aad4999
19 changed files with 735 additions and 520 deletions

@@ -447,10 +447,9 @@ Miscellaneous options
 * ``-X dev``: enable :ref:`Python Development Mode <devmode>`, introducing
 additional runtime checks that are too expensive to be enabled by
 default.
-* ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
-the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
-mode (even when it would otherwise activate automatically).
-See :envvar:`PYTHONUTF8` for more details.
+* ``-X utf8`` enables the :ref:`Python UTF-8 Mode <utf8-mode>`.
+``-X utf8=0`` explicitly disables :ref:`Python UTF-8 Mode <utf8-mode>`
+(even when it would otherwise activate automatically).
 * ``-X pycache_prefix=PATH`` enables writing ``.pyc`` files to a parallel
 tree rooted at the given directory instead of to the code tree. See also
 :envvar:`PYTHONPYCACHEPREFIX`.
@@ -810,9 +809,10 @@ conflict.
 .. envvar:: PYTHONLEGACYWINDOWSFSENCODING
-If set to a non-empty string, the default filesystem encoding and errors mode
-will revert to their pre-3.6 values of 'mbcs' and 'replace', respectively.
-Otherwise, the new defaults 'utf-8' and 'surrogatepass' are used.
+If set to a non-empty string, the default :term:`filesystem encoding and
+error handler` mode will revert to their pre-3.6 values of 'mbcs' and
+'replace', respectively. Otherwise, the new defaults 'utf-8' and
+'surrogatepass' are used.
 This may also be enabled at runtime with
 :func:`sys._enablelegacywindowsfsencoding()`.
@@ -898,54 +898,14 @@ conflict.
 .. envvar:: PYTHONUTF8
-If set to ``1``, enables the interpreter's UTF-8 mode, where ``UTF-8`` is
-used as the text encoding for system interfaces, regardless of the
-current locale setting.
+If set to ``1``, enable the :ref:`Python UTF-8 Mode <utf8-mode>`.
-This means that:
-* :func:`sys.getfilesystemencoding()` returns ``'UTF-8'`` (the locale
-encoding is ignored).
-* :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the locale
-encoding is ignored, and the function's ``do_setlocale`` parameter has no
-effect).
-* :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
-UTF-8 as their text encoding, with the ``surrogateescape``
-:ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
-and :data:`sys.stdout` (:data:`sys.stderr` continues to use
-``backslashreplace`` as it does in the default locale-aware mode)
-As a consequence of the changes in those lower level APIs, other higher
-level APIs also exhibit different default behaviours:
-* Command line arguments, environment variables and filenames are decoded
-to text using the UTF-8 encoding.
-* :func:`os.fsdecode()` and :func:`os.fsencode()` use the UTF-8 encoding.
-* :func:`open()`, :func:`io.open()`, and :func:`codecs.open()` use the UTF-8
-encoding by default. However, they still use the strict error handler by
-default so that attempting to open a binary file in text mode is likely
-to raise an exception rather than producing nonsense data.
-Note that the standard stream settings in UTF-8 mode can be overridden by
-:envvar:`PYTHONIOENCODING` (just as they can be in the default locale-aware
-mode).
-If set to ``0``, the interpreter runs in its default locale-aware mode.
+If set to ``0``, disable the :ref:`Python UTF-8 Mode <utf8-mode>`.
 Setting any other non-empty string causes an error during interpreter
 initialisation.
-If this environment variable is not set at all, then the interpreter defaults
-to using the current locale settings, *unless* the current locale is
-identified as a legacy ASCII-based locale
-(as described for :envvar:`PYTHONCOERCECLOCALE`), and locale coercion is
-either disabled or fails. In such legacy locales, the interpreter will
-default to enabling UTF-8 mode unless explicitly instructed not to do so.
-Also available as the :option:`-X` ``utf8`` option.
 .. versionadded:: 3.7
-See :pep:`540` for more details.
 Debug-mode variables
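
The effect of ``-X utf8`` and ``PYTHONUTF8`` described in the hunks above can be checked from a script. The following is a minimal sketch rather than part of the commit: the names `PROBE`, `probe`, and `canonical` are hypothetical helpers introduced for illustration, and the sketch assumes Python 3.7 or later. It starts a child interpreter and reports the settings that the removed text lists as affected by UTF-8 Mode.

```python
import codecs
import json
import os
import subprocess
import sys

# Script run in the child interpreter: report the values that the
# documentation says change when UTF-8 Mode is enabled.
PROBE = """
import json, locale, sys
print(json.dumps({
    "utf8_mode": sys.flags.utf8_mode,
    "filesystem": sys.getfilesystemencoding(),
    "preferred": locale.getpreferredencoding(False),
    "stdout": sys.stdout.encoding,
}))
"""


def probe(interpreter_args=(), utf8_env=None):
    """Run PROBE in a child interpreter and return its report as a dict.

    interpreter_args: extra options such as ["-X", "utf8"].
    utf8_env: value for PYTHONUTF8 in the child, or None to leave it unset.
    """
    env = dict(os.environ)
    # Start from a known state: PYTHONIOENCODING can override the standard
    # stream encoding, and an inherited PYTHONUTF8 would skew the result.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    if utf8_env is not None:
        env["PYTHONUTF8"] = utf8_env
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def canonical(name):
    """Normalize an encoding name, since its spelling ('UTF-8' or 'utf-8')
    differs between functions and Python versions."""
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("-X utf8     ->", probe(["-X", "utf8"]))
    print("-X utf8=0   ->", probe(["-X", "utf8=0"]))
    print("PYTHONUTF8=1 ->", probe(utf8_env="1"))
```

Comparing names through `canonical` avoids depending on the exact spelling that each function returns.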

@@ -614,21 +614,14 @@ Page). Python uses it for the default encoding of text files (e.g.
 This may cause issues because UTF-8 is widely used on the internet
 and most Unix systems, including WSL (Windows Subsystem for Linux).
-You can use UTF-8 mode to change the default text encoding to UTF-8.
-You can enable UTF-8 mode via the ``-X utf8`` command line option, or
-the ``PYTHONUTF8=1`` environment variable. See :envvar:`PYTHONUTF8` for
-enabling UTF-8 mode, and :ref:`setting-envvars` for how to modify
-environment variables.
+You can use the :ref:`Python UTF-8 Mode <utf8-mode>` to change the default text
+encoding to UTF-8. You can enable the :ref:`Python UTF-8 Mode <utf8-mode>` via
+the ``-X utf8`` command line option, or the ``PYTHONUTF8=1`` environment
+variable. See :envvar:`PYTHONUTF8` for enabling UTF-8 mode, and
+:ref:`setting-envvars` for how to modify environment variables.
-When UTF-8 mode is enabled:
-* :func:`locale.getpreferredencoding` returns ``'UTF-8'`` instead of
-the system encoding. This function is used for the default text
-encoding in many places, including :func:`open`, :class:`Popen`,
-:meth:`Path.read_text`, etc.
-* :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr`
-all use UTF-8 as their text encoding.
-* You can still use the system encoding via the "mbcs" codec.
+When the :ref:`Python UTF-8 Mode <utf8-mode>` is enabled, you can still use the
+system encoding (the ANSI Code Page) via the "mbcs" codec.
 Note that adding ``PYTHONUTF8=1`` to the default environment variables
 will affect all Python 3.7+ applications on your system.
@@ -641,7 +634,8 @@ temporarily or use the ``-X utf8`` command line option.
 on Windows for:
 * Console I/O including standard I/O (see :pep:`528` for details).
-* The filesystem encoding (see :pep:`529` for details).
+* The :term:`filesystem encoding <filesystem encoding and error handler>`
+(see :pep:`529` for details).
 .. _launcher: