closes bpo-31650: PEP 552 (Deterministic pycs) implementation (#4575)

Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.

While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:

- The core changes to importlib to understand how to read, validate, and
  regenerate hash-based pycs.

- Support for generating hash-based pycs in py_compile and compileall.

- Modifications to our siphash implementation to support passing a custom
  key. We then expose it to importlib through _imp.

- Updates to all places in the interpreter, standard library, and tests that
  manually generate or parse pyc files to grok the new format.

- Support in the interpreter command line code for long options like
  --check-hash-based-pycs.

- Tests and documentation for all of the above.
Benjamin Peterson 2017-12-09 10:26:52 -08:00 committed by GitHub
parent 28d8d14013
commit 42aa93b8ff
33 changed files with 3364 additions and 2505 deletions

@@ -675,6 +675,33 @@ Here are the exact rules used:
:meth:`~importlib.abc.Loader.module_repr` method, if defined, before
trying either approach described above. However, the method is deprecated.

.. _pyc-invalidation:

Cached bytecode invalidation
----------------------------

Before Python loads cached bytecode from a ``.pyc`` file, it checks whether the
cache is up-to-date with the source ``.py`` file. By default, Python does this
by storing the source's last-modified timestamp and size in the cache file when
writing it. At runtime, the import system then validates the cache file by
checking the stored metadata in the cache file against the source's
metadata.
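
The timestamp check described above can be sketched in a few lines. This is an illustration, not the import system's actual code: the helper names ``timestamp_pyc_is_fresh`` and ``demo`` are hypothetical, and the header layout (4-byte magic number, 4-byte flags word, 4-byte mtime, 4-byte size, all little-endian) is assumed from PEP 552.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def timestamp_pyc_is_fresh(source_path, pyc_path):
    """Hypothetical helper: compare the metadata stored in a pyc with the source's."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # The first 4 bytes identify the bytecode version of the interpreter.
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    # Assumed layout (PEP 552): flags, mtime, size as little-endian uint32.
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        return False  # a hash-based pyc, not a timestamp-based one
    st = os.stat(source_path)
    return (mtime == int(st.st_mtime) & 0xFFFFFFFF
            and size == st.st_size & 0xFFFFFFFF)


def demo():
    """Compile a throwaway module, then edit it and re-check the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        pyc = os.path.join(tmp, "example.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        py_compile.compile(
            src, cfile=pyc, doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        before = timestamp_pyc_is_fresh(src, pyc)
        # Appending changes the file size, so the stored metadata goes stale.
        with open(src, "a") as f:
            f.write("y = 2\n")
        after = timestamp_pyc_is_fresh(src, pyc)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The sketch only inspects the header. The real import system additionally unmarshals the code object and rewrites a stale cache file.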

Python also supports "hash-based" cache files, which store a hash of the source
file's contents rather than its metadata. There are two variants of hash-based
``.pyc`` files: checked and unchecked. For checked hash-based ``.pyc`` files,
Python validates the cache file by hashing the source file and comparing the
resulting hash with the hash in the cache file. If a checked hash-based cache
file is found to be invalid, Python regenerates it and writes a new checked
hash-based cache file. For unchecked hash-based ``.pyc`` files, Python simply
assumes the cache file is valid if it exists. The validation behavior of
hash-based ``.pyc`` files may be overridden with the
:option:`--check-hash-based-pycs` flag.

.. versionchanged:: 3.7
   Added hash-based ``.pyc`` files. Previously, Python only supported
   timestamp-based invalidation of bytecode caches.

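
To make the checked and unchecked variants concrete, the following sketch generates both kinds of hash-based ``.pyc`` file with ``py_compile`` and inspects their headers. The helper names ``pyc_kind``, ``stored_hash_matches`` and ``demo`` are hypothetical. The flag bits (bit 0 marks a hash-based pyc, bit 1 marks it as checked) and the position of the 8-byte source hash are assumed from PEP 552. It requires Python 3.7 or later.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def pyc_kind(pyc_path):
    """Hypothetical helper: classify a pyc from the flags word after the magic."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    (flags,) = struct.unpack("<I", header[4:8])
    if not flags & 0b01:
        return "timestamp"
    return "checked-hash" if flags & 0b10 else "unchecked-hash"


def stored_hash_matches(source_path, pyc_path):
    """Hypothetical helper: compare the hash in the pyc with the source's hash."""
    with open(pyc_path, "rb") as f:
        stored = f.read(16)[8:16]  # assumed position of the 8-byte source hash
    with open(source_path, "rb") as f:
        return stored == importlib.util.source_hash(f.read())


def demo():
    mode = py_compile.PycInvalidationMode
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        checked = os.path.join(tmp, "checked.pyc")
        unchecked = os.path.join(tmp, "unchecked.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        py_compile.compile(src, cfile=checked, doraise=True,
                           invalidation_mode=mode.CHECKED_HASH)
        py_compile.compile(src, cfile=unchecked, doraise=True,
                           invalidation_mode=mode.UNCHECKED_HASH)
        kinds = (pyc_kind(checked), pyc_kind(unchecked))
        fresh_before = stored_hash_matches(src, checked)
        # Any change to the source contents changes its hash.
        with open(src, "a") as f:
            f.write("y = 2\n")
        fresh_after = stored_hash_matches(src, checked)
    return kinds, fresh_before, fresh_after


if __name__ == "__main__":
    print(demo())
```

For a checked file, the import system performs a comparison like ``stored_hash_matches`` before using the cache. For an unchecked file it skips that step, so keeping the cache current is left to whatever tool generated it.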
The Path Based Finder
=====================