bpo-36785: PEP 574 implementation (GH-7076)

Antoine Pitrou 2019-05-26 17:10:09 +02:00 committed by GitHub
parent 22ccb0b490
commit 91f4380ced
GPG key ID: 4AEE18F83AFDEB23
19 changed files with 1888 additions and 242 deletions


@ -195,34 +195,29 @@ The :mod:`pickle` module provides the following constants:
The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:
.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)
.. function:: dump(obj, file, protocol=None, \*, fix_imports=True, buffer_callback=None)
Write a pickled representation of *obj* to the open :term:`file object` *file*.
This is equivalent to ``Pickler(file, protocol).dump(obj)``.
The optional *protocol* argument, an integer, tells the pickler to use
the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative
number is specified, :data:`HIGHEST_PROTOCOL` is selected.
Arguments *file*, *protocol*, *fix_imports* and *buffer_callback* have
the same meaning as in the :class:`Pickler` constructor.
The *file* argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, an
:class:`io.BytesIO` instance, or any other custom object that meets this
interface.
.. versionchanged:: 3.8
The *buffer_callback* argument was added.
If *fix_imports* is true and *protocol* is less than 3, pickle will try to
map the new Python 3 names to the old module names used in Python 2, so
that the pickle data stream is readable with Python 2.
.. function:: dumps(obj, protocol=None, \*, fix_imports=True)
.. function:: dumps(obj, protocol=None, \*, fix_imports=True, buffer_callback=None)
Return the pickled representation of the object as a :class:`bytes` object,
instead of writing it to a file.
Arguments *protocol* and *fix_imports* have the same meaning as in
:func:`dump`.
Arguments *protocol*, *fix_imports* and *buffer_callback* have the same
meaning as in the :class:`Pickler` constructor.
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
.. versionchanged:: 3.8
The *buffer_callback* argument was added.
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
Read a pickled object representation from the open :term:`file object`
*file* and return the reconstituted object hierarchy specified therein.
@ -232,24 +227,13 @@ process more convenient:
protocol argument is needed. Bytes past the pickled object's
representation are ignored.
The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be an on-disk file opened for
binary reading, an :class:`io.BytesIO` object, or any other custom object
that meets this interface.
Arguments *file*, *fix_imports*, *encoding*, *errors* and *buffers*
have the same meaning as in the :class:`Unpickler` constructor.
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle streams generated
by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
:class:`~datetime.time` pickled by Python 2.
.. versionchanged:: 3.8
The *buffers* argument was added.
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
Read a pickled object hierarchy from a :class:`bytes` object and return the
reconstituted object hierarchy specified therein.
@ -258,16 +242,11 @@ process more convenient:
protocol argument is needed. Bytes past the pickled object's
representation are ignored.
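As a small illustration of the statement that bytes past the pickled object's representation are ignored, the sketch below appends unrelated bytes to a pickle before loading it. The dictionary and the trailing bytes are arbitrary values chosen for this example.

```python
import pickle

data = pickle.dumps({"answer": 42})

# loads() stops at the end of the pickled object's representation, so the
# extra bytes appended here are never interpreted.
value = pickle.loads(data + b"trailing bytes that are not a pickle")
print(value)
```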
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle streams generated
by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
:class:`~datetime.time` pickled by Python 2.
Arguments *fix_imports*, *encoding*, *errors* and *buffers*
have the same meaning as in the :class:`Unpickler` constructor.
.. versionchanged:: 3.8
The *buffers* argument was added.
The :mod:`pickle` module defines three exceptions:
@ -295,10 +274,10 @@ The :mod:`pickle` module defines three exceptions:
IndexError.
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:
The :mod:`pickle` module exports three classes, :class:`Pickler`,
:class:`Unpickler` and :class:`PickleBuffer`:
.. class:: Pickler(file, protocol=None, \*, fix_imports=True)
.. class:: Pickler(file, protocol=None, \*, fix_imports=True, buffer_callback=None)
This takes a binary file for writing a pickle data stream.
@ -316,6 +295,20 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
map the new Python 3 names to the old module names used in Python 2, so
that the pickle data stream is readable with Python 2.
If *buffer_callback* is None (the default), buffer views are
serialized into *file* as part of the pickle stream.
If *buffer_callback* is not None, then it can be called any number
of times with a buffer view. If the callback returns a false value
(such as None), the given buffer is :ref:`out-of-band <pickle-oob>`;
otherwise the buffer is serialized in-band, i.e. inside the pickle stream.
It is an error if *buffer_callback* is not None and *protocol* is
None or smaller than 5.
.. versionchanged:: 3.8
The *buffer_callback* argument was added.
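The rules above for *buffer_callback* can be sketched as follows. This is a minimal illustration, not part of the commit: the names ``payload``, ``wrapped`` and ``keep_in_band`` are invented for the example, and the module-level :func:`dumps` is used as a shortcut for constructing a :class:`Pickler`.

```python
import pickle

payload = bytearray(b"x" * 1024)
wrapped = pickle.PickleBuffer(payload)

# list.append returns None (a false value), so the buffer goes out-of-band
# and only a small marker is written to the pickle stream.
oob = []
data_oob = pickle.dumps(wrapped, protocol=5, buffer_callback=oob.append)

# A callback returning a true value keeps the buffer in-band.
seen = []

def keep_in_band(buf):
    seen.append(buf)
    return True

data_in_band = pickle.dumps(wrapped, protocol=5, buffer_callback=keep_in_band)

print(len(data_oob), len(data_in_band))

# Combining buffer_callback with a protocol lower than 5 is an error.
error = None
try:
    pickle.dumps(wrapped, protocol=4, buffer_callback=oob.append)
except ValueError as exc:
    error = exc
```

The out-of-band stream stays much smaller than the payload, while the in-band stream has to contain all of it.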
.. method:: dump(obj)
Write a pickled representation of *obj* to the open file object given in
@ -379,26 +372,43 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
Use :func:`pickletools.optimize` if you need more compact pickles.
.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no
protocol argument is needed.
The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be an on-disk file object
The argument *file* must have three methods: a read() method that takes an
integer argument, a readinto() method that takes a buffer argument,
and a readline() method that requires no arguments, as in the
:class:`io.BufferedIOBase` interface. Thus *file* can be an on-disk file
opened for binary reading, an :class:`io.BytesIO` object, or any other
custom object that meets this interface.
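To illustrate the "custom object" case, here is a sketch of a reader that provides the three methods described above and counts the bytes it hands to the unpickler. ``CountingReader`` is a hypothetical helper written for this example; it is not part of :mod:`pickle` or :mod:`io`.

```python
import io
import pickle


class CountingReader:
    """Hypothetical file-like wrapper exposing read(), readinto(), readline()."""

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def read(self, n=-1):
        chunk = self._raw.read(n)
        self.bytes_read += len(chunk)
        return chunk

    def readinto(self, buf):
        n = self._raw.readinto(buf)
        self.bytes_read += n
        return n

    def readline(self):
        line = self._raw.readline()
        self.bytes_read += len(line)
        return line


payload = pickle.dumps({"blob": bytearray(64)}, protocol=5)
reader = CountingReader(io.BytesIO(payload))
obj = pickle.Unpickler(reader).load()
print(obj == {"blob": bytearray(64)}, reader.bytes_read)
```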
Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle streams generated
by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python
2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
The optional arguments *fix_imports*, *encoding* and *errors* are used
to control compatibility support for pickle streams generated by Python 2.
If *fix_imports* is true, pickle will try to map the old Python 2 names
to the new names used in Python 3. The *encoding* and *errors* tell
pickle how to decode 8-bit string instances pickled by Python 2;
these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
:class:`~datetime.time` pickled by Python 2.
If *buffers* is None (the default), then all data necessary for
deserialization must be contained in the pickle stream. This means
that the *buffer_callback* argument was None when a :class:`Pickler`
was instantiated (or when :func:`dump` or :func:`dumps` was called).
If *buffers* is not None, it should be an iterable of buffer-enabled
objects that is consumed each time the pickle stream references
an :ref:`out-of-band <pickle-oob>` buffer view. Such buffers have been
given in order to the *buffer_callback* of a Pickler object.
.. versionchanged:: 3.8
The *buffers* argument was added.
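The ordering requirement on *buffers* can be sketched like this. The example assumes the out-of-band buffers are copied to :class:`bytes` to simulate a transfer; a real system would pick its own mechanism. The names ``first``, ``second`` and ``received`` are invented for the example.

```python
import pickle

first = pickle.PickleBuffer(bytearray(b"first"))
second = pickle.PickleBuffer(bytearray(b"second"))

# Collect the out-of-band buffers in the order the pickler produces them.
buffers = []
data = pickle.dumps([first, second], protocol=5, buffer_callback=buffers.append)

# Simulated transfer: copy each buffer, preserving the order.
received = [bytes(buf.raw()) for buf in buffers]

# The unpickler consumes one item from *buffers* per out-of-band reference.
restored = pickle.loads(data, buffers=received)
print(restored)

# Without *buffers*, a stream that references out-of-band data cannot load.
missing_error = None
try:
    pickle.loads(data)
except pickle.UnpicklingError as exc:
    missing_error = exc
```

In this sketch the restored list holds the objects supplied through *buffers* (here :class:`bytes`), not the original :class:`PickleBuffer` instances.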
.. method:: load()
@ -429,6 +439,34 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
.. audit-event:: pickle.find_class "module name"
.. class:: PickleBuffer(buffer)
A wrapper for a buffer representing picklable data. *buffer* must be a
:ref:`buffer-providing <bufferobjects>` object, such as a
:term:`bytes-like object` or an N-dimensional array.
:class:`PickleBuffer` is itself a buffer provider, therefore it is
possible to pass it to other APIs expecting a buffer-providing object,
such as :class:`memoryview`.
:class:`PickleBuffer` objects can only be serialized using pickle
protocol 5 or higher. They are eligible for
:ref:`out-of-band serialization <pickle-oob>`.
.. versionadded:: 3.8
.. method:: raw()
Return a :class:`memoryview` of the memory area underlying this buffer.
The returned object is a one-dimensional, C-contiguous memoryview
with format ``B`` (unsigned bytes). :exc:`BufferError` is raised if
the buffer is neither C- nor Fortran-contiguous.
.. method:: release()
Release the underlying buffer exposed by the PickleBuffer object.
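A short sketch of the :class:`PickleBuffer` methods described above. The variable names are chosen for illustration, and the two-dimensional view is built with :meth:`memoryview.cast` only to show that ``raw()`` flattens it.

```python
import pickle

data = bytearray(b"abcdef")
buf = pickle.PickleBuffer(data)

# PickleBuffer is itself a buffer provider, so memoryview() accepts it.
view = memoryview(buf)

# raw() yields a flat view of unsigned bytes over the same memory.
raw = buf.raw()
raw_info = (raw.format, raw.ndim, raw.c_contiguous)
raw[0] = ord("z")  # writes through to the underlying bytearray

# A contiguous two-dimensional buffer is flattened by raw() as well.
grid = memoryview(bytearray(range(6))).cast("B", (2, 3))
grid_buf = pickle.PickleBuffer(grid)
flat = grid_buf.raw()
flat_shape = flat.shape

# Release the views, then the PickleBuffer objects themselves.
for v in (view, raw, flat):
    v.release()
buf.release()
grid_buf.release()

# Once released, the PickleBuffer can no longer hand out views.
released_error = None
try:
    buf.raw()
except ValueError as exc:
    released_error = exc
```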
.. _pickle-picklable:
What can be pickled and unpickled?
@ -864,6 +902,125 @@ a given class::
assert unpickled_class.my_attribute == 1
.. _pickle-oob:
Out-of-band Buffers
-------------------
.. versionadded:: 3.8
In some contexts, the :mod:`pickle` module is used to transfer massive amounts
of data. Therefore, it can be important to minimize the number of memory
copies, to preserve performance and limit resource consumption. However, normal
operation of the :mod:`pickle` module, as it transforms a graph-like structure
of objects into a sequential stream of bytes, intrinsically involves copying
data to and from the pickle stream.
This constraint can be eschewed if both the *provider* (the implementation
of the object types to be transferred) and the *consumer* (the implementation
of the communications system) support the out-of-band transfer facilities
provided by pickle protocol 5 and higher.
Provider API
^^^^^^^^^^^^
The large data objects to be pickled must implement a :meth:`__reduce_ex__`
method specialized for protocol 5 and higher, which returns a
:class:`PickleBuffer` instance (instead of e.g. a :class:`bytes` object)
for any large data.
A :class:`PickleBuffer` object *signals* that the underlying buffer is
eligible for out-of-band data transfer. Those objects remain compatible
with normal usage of the :mod:`pickle` module. However, consumers can also
opt in to tell :mod:`pickle` that they will handle those buffers by
themselves.
Consumer API
^^^^^^^^^^^^
A communications system can enable custom handling of the :class:`PickleBuffer`
objects generated when serializing an object graph.
On the sending side, it needs to pass a *buffer_callback* argument to
:class:`Pickler` (or to the :func:`dump` or :func:`dumps` function), which
will be called with each :class:`PickleBuffer` generated while pickling
the object graph. Buffers accumulated by the *buffer_callback* will not
see their data copied into the pickle stream; only a cheap marker will be
inserted.
On the receiving side, it needs to pass a *buffers* argument to
:class:`Unpickler` (or to the :func:`load` or :func:`loads` function),
which is an iterable of the buffers which were passed to *buffer_callback*.
That iterable should produce buffers in the same order as they were passed
to *buffer_callback*. Those buffers will provide the data expected by the
reconstructors of the objects whose pickling produced the original
:class:`PickleBuffer` objects.
Between the sending side and the receiving side, the communications system
is free to implement its own transfer mechanism for out-of-band buffers.
Potential optimizations include the use of shared memory or datatype-dependent
compression.
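As a rough sketch of such a communications system, the example below sends the pickle stream and the out-of-band buffers through separate channels and compresses the buffers with :mod:`zlib`. The ``send`` and ``receive`` functions are hypothetical helpers invented for this illustration, and the choice of zlib is an assumption rather than anything prescribed by :mod:`pickle`.

```python
import io
import pickle
import zlib


def send(obj):
    """Hypothetical sender: returns (main stream, compressed side channel)."""
    buffers = []
    stream = io.BytesIO()
    pickler = pickle.Pickler(stream, protocol=5, buffer_callback=buffers.append)
    pickler.dump(obj)
    # Each out-of-band buffer travels separately, here in compressed form.
    side_channel = [zlib.compress(buf.raw()) for buf in buffers]
    return stream.getvalue(), side_channel


def receive(main, side_channel):
    """Hypothetical receiver: feeds the buffers back in their original order."""
    buffers = (zlib.decompress(chunk) for chunk in side_channel)
    return pickle.Unpickler(io.BytesIO(main), buffers=buffers).load()


message = {"name": "sample", "payload": pickle.PickleBuffer(bytearray(4096))}
main, side = send(message)
restored = receive(main, side)
print(len(main), len(side), type(restored["payload"]).__name__)
```

Because the receiver supplies plain :class:`bytes` objects, that is what appears in the reconstructed dictionary. A provider that needs a specific type back would convert it in its reconstructor.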
Example
^^^^^^^
Here is a trivial example where we implement a :class:`bytearray` subclass
able to participate in out-of-band buffer pickling::
   import pickle
   from pickle import PickleBuffer

   class ZeroCopyByteArray(bytearray):

       def __reduce_ex__(self, protocol):
           if protocol >= 5:
               return type(self)._reconstruct, (PickleBuffer(self),), None
           else:
               # PickleBuffer is forbidden with pickle protocols <= 4.
               return type(self)._reconstruct, (bytearray(self),)

       @classmethod
       def _reconstruct(cls, obj):
           with memoryview(obj) as m:
               # Get a handle over the original buffer object
               obj = m.obj
               if type(obj) is cls:
                   # Original buffer object is a ZeroCopyByteArray, return it
                   # as-is.
                   return obj
               else:
                   return cls(obj)
The reconstructor (the ``_reconstruct`` class method) returns the buffer's
providing object if it has the right type. This is an easy way to simulate
zero-copy behaviour on this toy example.
On the consumer side, we can pickle those objects the usual way; when
unserialized, this gives us a copy of the original object::
   b = ZeroCopyByteArray(b"abc")
   data = pickle.dumps(b, protocol=5)
   new_b = pickle.loads(data)
   print(b == new_b)  # True
   print(b is new_b)  # False: a copy was made
But if we pass a *buffer_callback* and then give back the accumulated
buffers when unserializing, we are able to get back the original object::
   b = ZeroCopyByteArray(b"abc")
   buffers = []
   data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append)
   new_b = pickle.loads(data, buffers=buffers)
   print(b == new_b)  # True
   print(b is new_b)  # True: no copy was made
This example is limited by the fact that :class:`bytearray` allocates its
own memory: you cannot create a :class:`bytearray` instance that is backed
by another object's memory. However, third-party datatypes such as NumPy
arrays do not have this limitation, and allow use of zero-copy pickling
(or making as few copies as possible) when transferring between distinct
processes or systems.
.. seealso:: :pep:`574` -- Pickle protocol 5 with out-of-band data
.. _pickle-restrict:
Restricting Globals