closes bpo-34056: Always return bytes from _HackedGetData.get_data(). (GH-8130)

* Always return bytes from _HackedGetData.get_data().

Ensure the imp.load_source shim always returns bytes by reopening the file in
binary mode if needed. Hash-based pycs have to receive the source code in bytes.
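
Why bytes matter here: on the hash-based pyc path (PEP 552), importlib hashes the source through the same primitive that importlib.util.source_hash() exposes, and that primitive only accepts a bytes-like object. The following is a minimal illustration, not code from this commit. It assumes Python 3.7 or later, where source_hash() exists.

```python
import importlib.util

source_text = "x = 1\n"
source_bytes = source_text.encode("utf-8")

# Bytes input works: source_hash() returns a short bytes digest.
digest = importlib.util.source_hash(source_bytes)

# A str (what file.read() on a text-mode file produces) is not bytes-like,
# so the same call raises TypeError.
try:
    importlib.util.source_hash(source_text)
    str_rejected = False
except TypeError:
    str_rejected = True

print(len(digest), str_rejected)  # 8 True
```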

It's tempting to change imp.get_suffixes() to always return 'rb' as a mode, but
that breaks some stdlib tests and likely third-party code, too.
Benjamin Peterson 2018-07-06 20:41:06 -07:00 committed by GitHub
parent e25399b40c
commit b0274f2cdd
3 changed files with 24 additions and 7 deletions


@@ -142,17 +142,16 @@ class _HackedGetData:
     def get_data(self, path):
         """Gross hack to contort loader to deal w/ load_*()'s bad API."""
         if self.file and path == self.path:
+            # The contract of get_data() requires us to return bytes. Reopen the
+            # file in binary mode if needed.
             if not self.file.closed:
                 file = self.file
-            else:
-                self.file = file = open(self.path, 'r')
+                if 'b' not in file.mode:
+                    file.close()
+            if self.file.closed:
+                self.file = file = open(self.path, 'rb')
 
             with file:
-                # Technically should be returning bytes, but
-                # SourceLoader.get_code() just passed what is returned to
-                # compile() which can handle str. And converting to bytes would
-                # require figuring out the encoding to decode to and
-                # tokenize.detect_encoding() only accepts bytes.
                 return file.read()
         else:
             return super().get_data(path)
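
The new get_data() logic in the hunk above can be tried outside of imp with a small stand-in. This is a sketch under stated assumptions. HackedGetDataSketch is a hypothetical class with no loader base class, so where the real method calls super().get_data(path), the sketch simply reads the path in binary mode. The file names are also illustrative.

```python
import os
import tempfile


class HackedGetDataSketch:
    """Hypothetical stand-in for imp._HackedGetData (no loader base class)."""

    def __init__(self, path, file):
        self.path = path
        self.file = file  # may have been opened in text mode by the caller

    def get_data(self, path):
        if self.file and path == self.path:
            # If the caller's file is still open but in text mode, close it
            # so the check below reopens it in binary mode.
            if not self.file.closed:
                file = self.file
                if 'b' not in file.mode:
                    file.close()
            # True if the file was already closed or was just closed above.
            if self.file.closed:
                self.file = file = open(self.path, 'rb')
            with file:
                return file.read()
        # The real method calls super().get_data(path) here; this sketch has
        # no base class, so it simply reads the path in binary mode.
        with open(path, 'rb') as f:
            return f.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "wb") as f:
        f.write(b"x = 1\n")
    # Simulate imp.load_source() being handed a text-mode file.
    holder = HackedGetDataSketch(src, open(src, "r", encoding="utf-8"))
    data = holder.get_data(src)

print(type(data).__name__, data)  # bytes b'x = 1\n'
```

The sketch only reopens when it has to. A file object that is already in binary mode is read as-is, and a closed or text-mode one is replaced by a fresh 'rb' handle.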