bpo-31993: Do not allocate large temporary buffers in pickle dump. (#4353)

The picklers no longer allocate temporary memory when dumping large
bytes and str objects into a file object. Instead the data is
directly streamed into the underlying file object.

Previously the C implementation would buffer all content and issue a
single call to file.write() at the end of the dump. With protocol 4
this behavior has changed to issue one call to file.write() per frame.
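
The effect on write() calls can be observed with a small sketch. It is
illustrative only: RecordingWriter is a hypothetical helper, not part of
this change, and the exact number and size of the writes is an
implementation detail that may differ between Python versions and between
the C and Python picklers.

```python
import pickle


class RecordingWriter:
    """Hypothetical file-like object that records each write() call."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Copy the chunk so it can be inspected after the dump finishes.
        self.chunks.append(bytes(data))
        return len(data)


payload = b"x" * (1 << 20)  # 1 MiB, well above the 64 KiB frame size target
obj = [payload, "small"]

writer = RecordingWriter()
pickle.dump(obj, writer, protocol=4)

# Print the size of every write() call issued during the dump.
print([len(chunk) for chunk in writer.chunks])
```

On an interpreter that includes this change, the list should contain
several entries rather than a single one covering the whole pickle.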

The Python pickler with protocol 4 now dumps each frame's content as a
memoryview of a BytesIO instance that is never reused, and the
memoryview is no longer released after the call to write(). This makes it
possible for the file object to delay access to the memoryview of
previous frames without forcing any additional memory copy as was
already possible with the C pickler.
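
A minimal sketch of a file object that takes advantage of this, assuming
the pure-Python pickler is reachable as pickle._Pickler (a private name
that may change). DeferredWriter is a hypothetical class introduced only
for illustration: it keeps a reference to whatever it is handed and reads
the data later.

```python
import pickle


class DeferredWriter:
    """Hypothetical file-like object that delays access to written data."""

    def __init__(self):
        self.pending = []

    def write(self, data):
        # Keep a reference only; no copy is made at this point.
        self.pending.append(data)
        return len(data)

    def materialize(self):
        # Read the retained buffers long after write() returned.
        return b"".join(self.pending)


obj = {"numbers": list(range(1000)), "label": "text"}

writer = DeferredWriter()
pickle._Pickler(writer, protocol=4).dump(obj)

# The retained buffers are still readable once the dump has finished.
restored = pickle.loads(writer.materialize())
print(restored == obj)
```

This works only because the buffers handed to write() stay valid after the
call returns, which is the behavior described above.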
Olivier Grisel 2018-01-06 16:18:54 +01:00 committed by Serhiy Storchaka
parent 85ac726a40
commit 3cd7c6e6eb
6 changed files with 297 additions and 50 deletions

@@ -2279,7 +2279,7 @@ def optimize(p):
             if arg > proto:
                 proto = arg
             if pos == 0:
-                protoheader = p[pos: end_pos]
+                protoheader = p[pos:end_pos]
             else:
                 opcodes.append((pos, end_pos))
         else:
@@ -2295,6 +2295,7 @@ def optimize(p):
         pickler.framer.start_framing()
     idx = 0
     for op, arg in opcodes:
+        frameless = False
         if op is put:
             if arg not in newids:
                 continue
@@ -2305,8 +2306,12 @@ def optimize(p):
             data = pickler.get(newids[arg])
         else:
             data = p[op:arg]
-        pickler.framer.commit_frame()
-        pickler.write(data)
+            frameless = len(data) > pickler.framer._FRAME_SIZE_TARGET
+        pickler.framer.commit_frame(force=frameless)
+        if frameless:
+            pickler.framer.file_write(data)
+        else:
+            pickler.write(data)
     pickler.framer.end_framing()
     return out.getvalue()
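
For context, the path changed above can be exercised with a short sketch
that round-trips a pickle containing a large bytes object through
pickletools.optimize(). The payload size is an arbitrary choice for
illustration; it only needs to exceed the pickler's frame size target
(64 KiB at the time of writing, an implementation detail).

```python
import pickle
import pickletools

# A bytes object larger than the frame size target, referenced twice so
# that its memo entry is actually used and must survive optimization.
big = b"\x00" * (256 * 1024)
obj = [big, "tail", big]

raw = pickle.dumps(obj, protocol=4)
optimized = pickletools.optimize(raw)

# The optimized pickle must still load to an equal object.
restored = pickle.loads(optimized)
print(restored == obj)
```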