bpo-33671: efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) (#7160)

* have shutil.copyfileobj use sendfile() if possible

* refactoring: use ctx manager

* add test with non-regular file obj

* emulate case where file size can't be determined

* reference _copyfileobj_sendfile directly

* add test for offset() at certain position

* add test for empty file

* add test for non regular file dst

* small refactoring

* leave copyfileobj() alone in order to not introduce any incompatibility

* minor refactoring

* remove old test

* update docstring

* update docstring; rename exception class

* detect platforms which only support file to socket zero copy

* don't run test on platforms where file-to-file zero copy is not supported

* use tempfiles

* reset verbosity

* add test for smaller chunks

* add big file size test

* add comment

* update doc

* update whatsnew doc

* update doc

* catch Exception

* remove unused import

* add test case for error on second sendfile() call

* turn docstring into comment

* add one more test

* update comment

* add Misc/NEWS entry

* get rid of COPY_BUFSIZE; it belongs to another PR

* update doc

* expose posix._fcopyfile() for OSX

* merge from linux branch

* merge from linux branch

* expose fcopyfile

* arg clinic for the win implementation

* convert path type to path_t

* expose CopyFileW

* fix windows tests

* release GIL

* minor refactoring

* update doc

* update comment

* update docstrings

* rename functions

* rename test classes

* update doc

* update doc

* update docstrings and comments

* avoid importing nt|posix modules if unnecessary

* set nt|posix modules to None if not available

* micro speedup

* update description

* add doc note

* use better wording in doc

* rename function using 'fastcopy' prefix instead of 'zerocopy'

* use :ref: in rst doc

* change wording in doc

* add test to make sure sendfile() doesn't get called anymore in case it doesn't support file to file copies

* move CopyFileW in _winapi and actually expose CopyFileExW instead

* fix line endings

* add tests for mode bits

* add docstring

* remove test file mode class; let's keep it for later when I start addressing OSX fcopyfile() specific copies

* update doc to reflect new changes

* update doc

* adjust tests on win

* fix argument clinic error

* update doc

* OSX: expose copyfile(3) instead of fcopyfile(3); also expose flags arg to python

* osx / copyfile: use path_t instead of char

* do not set dst name in the OSError exception in order to remain consistent with platforms which cannot do that (e.g. linux)

* add same file test

* add test for same file

* have osx copyfile() pre-emptively check if src and dst are the same, otherwise it will return immediately and src file content gets deleted

* turn PermissionError into appropriate SameFileError

* expose ERROR_SHARING_VIOLATION in order to raise more appropriate SameFileError

* honour follow_symlinks arg when using CopyFileEx

* update Misc/NEWS

* expose CreateDirectoryEx mock

* change C type

* CreateDirectoryExW actual implementation

* provide specific makedirs() implementation for win

* fix typo

* skeleton for SetNamedSecurityInfo

* get security info for src path

* finally set security attrs

* add unit tests

* mimic os.makedirs() behavior and raise if dst dir exists

* set 2 paths for OSError object

* set 2 paths for OSError object

* expand windows test

* in case of exception on os.sendfile() set filename and filename2 exception attributes

* set 2 filenames (src, dst) for OSError in case copyfile() fails on OSX

* update doc

* do not use CreateDirectoryEx() in copytree() if source dir is a symlink (breaks test_copytree_symlink_dir); instead just create a plain dir and remain consistent with POSIX implementation

* use bytearray() and readinto()

* use memoryview() with bytearray()

* refactoring + introduce a new _fastcopy_binfileobj() fun

* remove CopyFileEx and other C wrappers

* remove code related to CopyFileEx

* Recognize binary files in copyfileobj()
...and use fastest _fastcopy_binfileobj() when possible

* set 1MB copy bufsize on win; also add a global _COPY_BUFSIZE variable

* use ctx manager for memoryview()

* update doc

* remove outdated doc

* remove last CopyFileEx remnants

* OSX - use fcopyfile(3) instead of copyfile(3)

...as an extra safety measure: in case src/dst are "exotic" files (non
regular or living on a network fs etc.) we better fail on open() instead
of copyfile(3) as we're not quite sure what's gonna happen in that
case.

* update doc
Giampaolo Rodola 2018-06-12 23:04:50 +02:00 committed by GitHub
parent 33cd058f21
commit 4a172ccc73
8 changed files with 595 additions and 19 deletions


@ -51,7 +51,9 @@ Directory and files operations
.. function:: copyfile(src, dst, *, follow_symlinks=True) .. function:: copyfile(src, dst, *, follow_symlinks=True)
Copy the contents (no metadata) of the file named *src* to a file named Copy the contents (no metadata) of the file named *src* to a file named
*dst* and return *dst*. *src* and *dst* are path names given as strings. *dst* and return *dst* in the most efficient way possible.
*src* and *dst* are path names given as strings.
*dst* must be the complete target file name; look at :func:`shutil.copy` *dst* must be the complete target file name; look at :func:`shutil.copy`
for a copy that accepts a target directory path. If *src* and *dst* for a copy that accepts a target directory path. If *src* and *dst*
specify the same file, :exc:`SameFileError` is raised. specify the same file, :exc:`SameFileError` is raised.
@ -74,6 +76,10 @@ Directory and files operations
Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is Raise :exc:`SameFileError` instead of :exc:`Error`. Since the former is
a subclass of the latter, this change is backward compatible. a subclass of the latter, this change is backward compatible.
.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.
.. exception:: SameFileError .. exception:: SameFileError
@ -163,6 +169,11 @@ Directory and files operations
Added *follow_symlinks* argument. Added *follow_symlinks* argument.
Now returns path to the newly created file. Now returns path to the newly created file.
.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.
.. function:: copy2(src, dst, *, follow_symlinks=True) .. function:: copy2(src, dst, *, follow_symlinks=True)
Identical to :func:`~shutil.copy` except that :func:`copy2` Identical to :func:`~shutil.copy` except that :func:`copy2`
@ -185,6 +196,11 @@ Directory and files operations
file system attributes too (currently Linux only). file system attributes too (currently Linux only).
Now returns path to the newly created file. Now returns path to the newly created file.
.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.
.. function:: ignore_patterns(\*patterns) .. function:: ignore_patterns(\*patterns)
This factory function creates a function that can be used as a callable for This factory function creates a function that can be used as a callable for
@ -241,6 +257,10 @@ Directory and files operations
Added the *ignore_dangling_symlinks* argument to silent dangling symlinks Added the *ignore_dangling_symlinks* argument to silent dangling symlinks
errors when *symlinks* is false. errors when *symlinks* is false.
.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.
.. function:: rmtree(path, ignore_errors=False, onerror=None) .. function:: rmtree(path, ignore_errors=False, onerror=None)
@ -314,6 +334,11 @@ Directory and files operations
.. versionchanged:: 3.5 .. versionchanged:: 3.5
Added the *copy_function* keyword argument. Added the *copy_function* keyword argument.
.. versionchanged:: 3.8
Platform-specific fast-copy syscalls may be used internally in order to
copy the file more efficiently. See
:ref:`shutil-platform-dependent-efficient-copy-operations` section.
.. function:: disk_usage(path) .. function:: disk_usage(path)
Return disk usage statistics about the given path as a :term:`named tuple` Return disk usage statistics about the given path as a :term:`named tuple`
@ -370,6 +395,28 @@ Directory and files operations
operation. For :func:`copytree`, the exception argument is a list of 3-tuples operation. For :func:`copytree`, the exception argument is a list of 3-tuples
(*srcname*, *dstname*, *exception*). (*srcname*, *dstname*, *exception*).
.. _shutil-platform-dependent-efficient-copy-operations:

Platform-dependent efficient copy operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting from Python 3.8 all functions involving a file copy (:func:`copyfile`,
:func:`copy`, :func:`copy2`, :func:`copytree`, and :func:`move`) may use
platform-specific "fast-copy" syscalls in order to copy the file more
efficiently (see :issue:`33671`).
"fast-copy" means that the copying operation occurs within the kernel, avoiding
the use of userspace buffers in Python as in "``outfd.write(infd.read())``".

On OSX `fcopyfile`_ is used to copy the file content (not metadata).

On Linux, Solaris and other POSIX platforms where :func:`os.sendfile` supports
copies between 2 regular file descriptors :func:`os.sendfile` is used.

If the fast-copy operation fails and no data was written in the destination
file then shutil will silently fall back on using the less efficient
:func:`copyfileobj` function internally.

.. versionchanged:: 3.8
.. _shutil-copytree-example: .. _shutil-copytree-example:
@ -654,6 +701,8 @@ Querying the size of the output terminal
.. versionadded:: 3.3 .. versionadded:: 3.3
.. _`fcopyfile`:
http://www.manpagez.com/man/3/copyfile/
.. _`Other Environment Variables`: .. _`Other Environment Variables`:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html#tag_002_003 http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html#tag_002_003
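
Reviewer note: the documented behaviour (try an in-kernel copy, fall back to a userspace loop only if nothing was written yet) can be illustrated with a small standalone sketch. This is not the code added by this commit. `copy_with_sendfile` is a hypothetical helper, the 8 MiB block size is an arbitrary choice, and whether `os.sendfile()` accepts two regular files depends on the platform.

```python
import os
import shutil
import tempfile


def copy_with_sendfile(src_path, dst_path, blocksize=2 ** 23):
    """Copy src_path to dst_path, preferring os.sendfile() when it works.

    Hypothetical helper for illustration only; it is not part of shutil.
    Returns the name of the strategy that completed the copy.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        if hasattr(os, "sendfile"):
            offset = 0
            try:
                while True:
                    sent = os.sendfile(fdst.fileno(), fsrc.fileno(),
                                       offset, blocksize)
                    if sent == 0:
                        return "sendfile"  # EOF reached
                    offset += sent
            except OSError:
                if offset != 0:
                    raise  # some data was already written: do not retry
        # Userspace fallback: nothing has been written at this point.
        shutil.copyfileobj(fsrc, fdst)
        return "fallback"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"0123456789" * 1000)
        print(copy_with_sendfile(src, dst))
```

The fallback is only taken when the first call fails, which mirrors the "no data was written in the destination file" condition in the text above.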


@ -90,10 +90,27 @@ New Modules
Improved Modules Improved Modules
================ ================
Optimizations Optimizations
============= =============
* :func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
  :func:`shutil.copytree` and :func:`shutil.move` use platform-specific
  "fast-copy" syscalls on Linux, OSX and Solaris in order to copy the file more
  efficiently.
  "fast-copy" means that the copying operation occurs within the kernel,
  avoiding the use of userspace buffers in Python as in
  "``outfd.write(infd.read())``".
  All other platforms not using such a technique will rely on a faster
  :func:`shutil.copyfile` implementation using :func:`memoryview`,
  :class:`bytearray` and
  :meth:`BufferedIOBase.readinto() <io.BufferedIOBase.readinto>`.
  Finally, :func:`shutil.copyfile` default buffer size on Windows was increased
  from 16KB to 1MB.
  The speedup for copying a 512MB file within the same partition is about +26%
  on Linux, +50% on OSX and +38% on Windows. Also, far fewer CPU cycles are
  consumed.
  (Contributed by Giampaolo Rodola' in :issue:`33671`.)
* The default protocol in the :mod:`pickle` module is now Protocol 4, * The default protocol in the :mod:`pickle` module is now Protocol 4,
first introduced in Python 3.4. It offers better performance and smaller first introduced in Python 3.4. It offers better performance and smaller
size compared to Protocol 3 available since Python 3.0. size compared to Protocol 3 available since Python 3.0.
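
Reviewer note: the `memoryview` / `bytearray` / `readinto()` technique mentioned in the entry above can be sketched in isolation as follows. `copy_readinto` is a hypothetical name and the 16 KiB default is just an example value; the point is that a single preallocated buffer is reused for every chunk instead of allocating a new `bytes` object per `read()` call.

```python
import io


def copy_readinto(fsrc, fdst, length=16 * 1024):
    """Copy binary data from fsrc to fdst through one reusable buffer.

    Hypothetical helper for illustration. Returns the number of bytes copied.
    """
    total = 0
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc.readinto(mv)
            if not n:
                break  # EOF
            if n < length:
                # Short read: write only the filled part of the buffer.
                with mv[:n] as filled:
                    fdst.write(filled)
            else:
                fdst.write(mv)
            total += n
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"abc" * 10000)
    dst = io.BytesIO()
    print(copy_readinto(src, dst, length=4096))
```

Both objects must be opened in binary mode and the source must provide `readinto()`, which is why the commit only takes this path for binary file pairs.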


@ -10,6 +10,7 @@ import stat
import fnmatch import fnmatch
import collections import collections
import errno import errno
import io
try: try:
import zlib import zlib
@ -42,6 +43,16 @@ try:
except ImportError: except ImportError:
getgrnam = None getgrnam = None
posix = nt = None
if os.name == 'posix':
import posix
elif os.name == 'nt':
import nt
COPY_BUFSIZE = 1024 * 1024 if os.name == 'nt' else 16 * 1024
_HAS_SENDFILE = posix and hasattr(os, "sendfile")
_HAS_FCOPYFILE = posix and hasattr(posix, "_fcopyfile") # OSX
__all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2", __all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2",
"copytree", "move", "rmtree", "Error", "SpecialFileError", "copytree", "move", "rmtree", "Error", "SpecialFileError",
"ExecError", "make_archive", "get_archive_formats", "ExecError", "make_archive", "get_archive_formats",
@ -72,14 +83,124 @@ class RegistryError(Exception):
"""Raised when a registry operation with the archiving """Raised when a registry operation with the archiving
and unpacking registries fails""" and unpacking registries fails"""
class _GiveupOnFastCopy(Exception):
    """Raised as a signal to fall back on using raw read()/write()
    file copy when fast-copy functions fail to do so.
    """
def copyfileobj(fsrc, fdst, length=16*1024): def _fastcopy_osx(fsrc, fdst, flags):
    """Copy a regular file content or metadata by using high-performance
    fcopyfile(3) syscall (OSX).
    """
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)  # not a regular file

    try:
        posix._fcopyfile(infd, outfd, flags)
    except OSError as err:
        err.filename = fsrc.name
        err.filename2 = fdst.name
        if err.errno in {errno.EINVAL, errno.ENOTSUP}:
            raise _GiveupOnFastCopy(err)
        else:
            raise err from None
def _fastcopy_sendfile(fsrc, fdst):
    """Copy data from one regular mmap-like fd to another by using
    high-performance sendfile(2) syscall.
    This should work on Linux >= 2.6.33 and Solaris only.
    """
    # Note: copyfileobj() is left alone in order to not introduce any
    # unexpected breakage. Possible risks by using zero-copy calls
    # in copyfileobj() are:
    # - fdst cannot be open in "a"(ppend) mode
    # - fsrc and fdst may be open in "t"(ext) mode
    # - fsrc may be a BufferedReader (which hides unread data in a buffer),
    #   GzipFile (which decompresses data), HTTPResponse (which decodes
    #   chunks).
    # - possibly others (e.g. encrypted fs/partition?)
    global _HAS_SENDFILE
    try:
        infd = fsrc.fileno()
        outfd = fdst.fileno()
    except Exception as err:
        raise _GiveupOnFastCopy(err)  # not a regular file

    # Hopefully the whole file will be copied in a single call.
    # sendfile() is called in a loop 'till EOF is reached (0 return)
    # so a bufsize smaller or bigger than the actual file size
    # should not make any difference, also in case the file content
    # changes while being copied.
    try:
        blocksize = max(os.fstat(infd).st_size, 2 ** 23)  # min 8MB
    except Exception:
        blocksize = 2 ** 27  # 128MB

    offset = 0
    while True:
        try:
            sent = os.sendfile(outfd, infd, offset, blocksize)
        except OSError as err:
            # ...in order to have a more informative exception.
            err.filename = fsrc.name
            err.filename2 = fdst.name

            if err.errno == errno.ENOTSOCK:
                # sendfile() on this platform (probably Linux < 2.6.33)
                # does not support copies between regular files (only
                # sockets).
                _HAS_SENDFILE = False
                raise _GiveupOnFastCopy(err)

            if err.errno == errno.ENOSPC:  # filesystem is full
                raise err from None

            # Give up on first call and if no data was copied.
            if offset == 0 and os.lseek(outfd, 0, os.SEEK_CUR) == 0:
                raise _GiveupOnFastCopy(err)

            raise err
        else:
            if sent == 0:
                break  # EOF
            offset += sent
def _copybinfileobj(fsrc, fdst, length=COPY_BUFSIZE):
    """Copy 2 regular file objects open in binary mode."""
    # Localize variable access to minimize overhead.
    fsrc_readinto = fsrc.readinto
    fdst_write = fdst.write
    with memoryview(bytearray(length)) as mv:
        while True:
            n = fsrc_readinto(mv)
            if not n:
                break
            elif n < length:
                fdst_write(mv[:n])
            else:
                fdst_write(mv)
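def _is_binary_files_pair(fsrc, fdst):
    # Parenthesized so that "and" applies to both "or" checks; without the
    # parentheses the expression groups as (a and b) or (c and d) or e.
    return hasattr(fsrc, 'readinto') and \
        (isinstance(fsrc, io.BytesIO) or 'b' in getattr(fsrc, 'mode', '')) and \
        (isinstance(fdst, io.BytesIO) or 'b' in getattr(fdst, 'mode', ''))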
def copyfileobj(fsrc, fdst, length=COPY_BUFSIZE):
"""copy data from file-like object fsrc to file-like object fdst""" """copy data from file-like object fsrc to file-like object fdst"""
while 1: if _is_binary_files_pair(fsrc, fdst):
buf = fsrc.read(length) _copybinfileobj(fsrc, fdst, length=length)
if not buf: else:
break # Localize variable access to minimize overhead.
fdst.write(buf) fsrc_read = fsrc.read
fdst_write = fdst.write
while 1:
buf = fsrc_read(length)
if not buf:
break
fdst_write(buf)
def _samefile(src, dst): def _samefile(src, dst):
# Macintosh, Unix. # Macintosh, Unix.
@ -117,9 +238,23 @@ def copyfile(src, dst, *, follow_symlinks=True):
if not follow_symlinks and os.path.islink(src): if not follow_symlinks and os.path.islink(src):
os.symlink(os.readlink(src), dst) os.symlink(os.readlink(src), dst)
else: else:
with open(src, 'rb') as fsrc: with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
with open(dst, 'wb') as fdst: if _HAS_SENDFILE:
copyfileobj(fsrc, fdst) try:
_fastcopy_sendfile(fsrc, fdst)
return dst
except _GiveupOnFastCopy:
pass
if _HAS_FCOPYFILE:
try:
_fastcopy_osx(fsrc, fdst, posix._COPYFILE_DATA)
return dst
except _GiveupOnFastCopy:
pass
_copybinfileobj(fsrc, fdst)
return dst return dst
def copymode(src, dst, *, follow_symlinks=True): def copymode(src, dst, *, follow_symlinks=True):
@ -244,13 +379,12 @@ def copy(src, dst, *, follow_symlinks=True):
def copy2(src, dst, *, follow_symlinks=True): def copy2(src, dst, *, follow_symlinks=True):
"""Copy data and all stat info ("cp -p src dst"). Return the file's """Copy data and all stat info ("cp -p src dst"). Return the file's
destination." destination.
The destination may be a directory. The destination may be a directory.
If follow_symlinks is false, symlinks won't be followed. This If follow_symlinks is false, symlinks won't be followed. This
resembles GNU's "cp -P src dst". resembles GNU's "cp -P src dst".
""" """
if os.path.isdir(dst): if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src)) dst = os.path.join(dst, os.path.basename(src))
@ -1015,7 +1149,6 @@ if hasattr(os, 'statvfs'):
elif os.name == 'nt': elif os.name == 'nt':
import nt
__all__.append('disk_usage') __all__.append('disk_usage')
_ntuple_diskusage = collections.namedtuple('usage', 'total used free') _ntuple_diskusage = collections.namedtuple('usage', 'total used free')
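
Reviewer note: the control flow that `copyfile()` gains in this file (try a fast path, treat a dedicated exception as "not applicable here", then fall back to a plain read/write loop) can be sketched on its own. All names below (`GiveUp`, `fast_copy`, `slow_copy`, `copy_stream`) are hypothetical, and `fast_copy` only stands in for a kernel-level copy; it does not call `sendfile()` or `fcopyfile()`.

```python
import io
import tempfile


class GiveUp(Exception):
    """Hypothetical sentinel: the fast path cannot handle these objects."""


def fast_copy(fsrc, fdst):
    """Stand-in for a fast path that needs real file descriptors."""
    try:
        fsrc.fileno()
        fdst.fileno()
    except Exception as err:
        raise GiveUp(err)  # e.g. io.BytesIO has no file descriptor
    fdst.write(fsrc.read())  # placeholder for the in-kernel copy


def slow_copy(fsrc, fdst, length=16 * 1024):
    """Plain userspace read()/write() loop."""
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)


def copy_stream(fsrc, fdst):
    """Try the fast path first; fall back only on the sentinel exception."""
    try:
        fast_copy(fsrc, fdst)
        return "fast"
    except GiveUp:
        slow_copy(fsrc, fdst)
        return "slow"


if __name__ == "__main__":
    print(copy_stream(io.BytesIO(b"in-memory data"), io.BytesIO()))
    with tempfile.TemporaryFile() as a, tempfile.TemporaryFile() as b:
        a.write(b"on-disk data")
        a.seek(0)
        print(copy_stream(a, b))
```

Using a private exception type keeps genuine errors (for example a full filesystem) distinct from "this strategy does not apply", so only the latter triggers the fallback.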


@ -12,20 +12,28 @@ import errno
import functools import functools
import pathlib import pathlib
import subprocess import subprocess
import random
import string
import contextlib
import io
from shutil import (make_archive, from shutil import (make_archive,
register_archive_format, unregister_archive_format, register_archive_format, unregister_archive_format,
get_archive_formats, Error, unpack_archive, get_archive_formats, Error, unpack_archive,
register_unpack_format, RegistryError, register_unpack_format, RegistryError,
unregister_unpack_format, get_unpack_formats, unregister_unpack_format, get_unpack_formats,
SameFileError) SameFileError, _GiveupOnFastCopy)
import tarfile import tarfile
import zipfile import zipfile
try:
import posix
except ImportError:
posix = None
from test import support from test import support
from test.support import TESTFN, FakePath from test.support import TESTFN, FakePath
TESTFN2 = TESTFN + "2" TESTFN2 = TESTFN + "2"
OSX = sys.platform.startswith("darwin")
try: try:
import grp import grp
import pwd import pwd
@ -60,6 +68,24 @@ def write_file(path, content, binary=False):
with open(path, 'wb' if binary else 'w') as fp: with open(path, 'wb' if binary else 'w') as fp:
fp.write(content) fp.write(content)
def write_test_file(path, size):
    """Create a test file with an arbitrary size and random text content."""
    def chunks(total, step):
        assert total >= step
        while total > step:
            yield step
            total -= step
        if total:
            yield total

    bufsize = min(size, 8192)
    chunk = b"".join([random.choice(string.ascii_letters).encode()
                      for i in range(bufsize)])
    with open(path, 'wb') as f:
        for csize in chunks(size, bufsize):
            # Write only csize bytes so the last, shorter chunk does not
            # overshoot the requested size.
            f.write(chunk[:csize])
    assert os.path.getsize(path) == size
def read_file(path, binary=False): def read_file(path, binary=False):
"""Return contents from a file located at *path*. """Return contents from a file located at *path*.
@ -84,6 +110,37 @@ def rlistdir(path):
res.append(name) res.append(name)
return res return res
def supports_file2file_sendfile():
    # ...apparently Linux and Solaris are the only ones
    if not hasattr(os, "sendfile"):
        return False
    srcname = None
    dstname = None
    try:
        with tempfile.NamedTemporaryFile("wb", delete=False) as f:
            srcname = f.name
            f.write(b"0123456789")

        with open(srcname, "rb") as src:
            with tempfile.NamedTemporaryFile("wb", delete=False) as dst:
                dstname = dst.name
                infd = src.fileno()
                outfd = dst.fileno()
                try:
                    os.sendfile(outfd, infd, 0, 2)
                except OSError:
                    return False
                else:
                    return True
    finally:
        if srcname is not None:
            support.unlink(srcname)
        if dstname is not None:
            support.unlink(dstname)


SUPPORTS_SENDFILE = supports_file2file_sendfile()
class TestShutil(unittest.TestCase): class TestShutil(unittest.TestCase):
@ -1401,6 +1458,8 @@ class TestShutil(unittest.TestCase):
self.assertRaises(SameFileError, shutil.copyfile, src_file, src_file) self.assertRaises(SameFileError, shutil.copyfile, src_file, src_file)
# But Error should work too, to stay backward compatible. # But Error should work too, to stay backward compatible.
self.assertRaises(Error, shutil.copyfile, src_file, src_file) self.assertRaises(Error, shutil.copyfile, src_file, src_file)
# Make sure file is not corrupted.
self.assertEqual(read_file(src_file), 'foo')
def test_copytree_return_value(self): def test_copytree_return_value(self):
# copytree returns its destination path. # copytree returns its destination path.
@ -1749,6 +1808,7 @@ class TestCopyFile(unittest.TestCase):
self.assertRaises(OSError, shutil.copyfile, 'srcfile', 'destfile') self.assertRaises(OSError, shutil.copyfile, 'srcfile', 'destfile')
@unittest.skipIf(OSX, "skipped on OSX")
def test_w_dest_open_fails(self): def test_w_dest_open_fails(self):
srcfile = self.Faux() srcfile = self.Faux()
@ -1768,6 +1828,7 @@ class TestCopyFile(unittest.TestCase):
self.assertEqual(srcfile._exited_with[1].args, self.assertEqual(srcfile._exited_with[1].args,
('Cannot open "destfile"',)) ('Cannot open "destfile"',))
@unittest.skipIf(OSX, "skipped on OSX")
def test_w_dest_close_fails(self): def test_w_dest_close_fails(self):
srcfile = self.Faux() srcfile = self.Faux()
@ -1790,6 +1851,7 @@ class TestCopyFile(unittest.TestCase):
self.assertEqual(srcfile._exited_with[1].args, self.assertEqual(srcfile._exited_with[1].args,
('Cannot close',)) ('Cannot close',))
@unittest.skipIf(OSX, "skipped on OSX")
def test_w_source_close_fails(self): def test_w_source_close_fails(self):
srcfile = self.Faux(True) srcfile = self.Faux(True)
@ -1829,6 +1891,234 @@ class TestCopyFile(unittest.TestCase):
finally: finally:
os.rmdir(dst_dir) os.rmdir(dst_dir)
class _ZeroCopyFileTest(object):
    """Tests common to all zero-copy APIs."""
    FILESIZE = (10 * 1024 * 1024)  # 10 MiB
    FILEDATA = b""
    PATCHPOINT = ""

    @classmethod
    def setUpClass(cls):
        write_test_file(TESTFN, cls.FILESIZE)
        with open(TESTFN, 'rb') as f:
            cls.FILEDATA = f.read()
            assert len(cls.FILEDATA) == cls.FILESIZE

    @classmethod
    def tearDownClass(cls):
        support.unlink(TESTFN)

    def tearDown(self):
        support.unlink(TESTFN2)

    @contextlib.contextmanager
    def get_files(self):
        with open(TESTFN, "rb") as src:
            with open(TESTFN2, "wb") as dst:
                yield (src, dst)

    def zerocopy_fun(self, *args, **kwargs):
        raise NotImplementedError("must be implemented in subclass")

    def reset(self):
        self.tearDown()
        self.tearDownClass()
        self.setUpClass()
        self.setUp()

    # ---

    def test_regular_copy(self):
        with self.get_files() as (src, dst):
            self.zerocopy_fun(src, dst)
        self.assertEqual(read_file(TESTFN2, binary=True), self.FILEDATA)
        # Make sure the fallback function is not called.
        with self.get_files() as (src, dst):
            with unittest.mock.patch('shutil.copyfileobj') as m:
                shutil.copyfile(TESTFN, TESTFN2)
            assert not m.called

    def test_same_file(self):
        self.addCleanup(self.reset)
        with self.get_files() as (src, dst):
            with self.assertRaises(Exception):
                self.zerocopy_fun(src, src)
        # Make sure src file is not corrupted.
        self.assertEqual(read_file(TESTFN, binary=True), self.FILEDATA)

    def test_non_existent_src(self):
        name = tempfile.mktemp()
        with self.assertRaises(FileNotFoundError) as cm:
            shutil.copyfile(name, "new")
        self.assertEqual(cm.exception.filename, name)

    def test_empty_file(self):
        srcname = TESTFN + 'src'
        dstname = TESTFN + 'dst'
        self.addCleanup(lambda: support.unlink(srcname))
        self.addCleanup(lambda: support.unlink(dstname))
        with open(srcname, "wb"):
            pass

        with open(srcname, "rb") as src:
            with open(dstname, "wb") as dst:
                self.zerocopy_fun(src, dst)

        self.assertEqual(read_file(dstname, binary=True), b"")

    def test_unhandled_exception(self):
        with unittest.mock.patch(self.PATCHPOINT,
                                 side_effect=ZeroDivisionError):
            self.assertRaises(ZeroDivisionError,
                              shutil.copyfile, TESTFN, TESTFN2)

    def test_exception_on_first_call(self):
        # Emulate a case where the first call to the zero-copy
        # function raises an exception in which case the function is
        # supposed to give up immediately.
        with unittest.mock.patch(self.PATCHPOINT,
                                 side_effect=OSError(errno.EINVAL, "yo")):
            with self.get_files() as (src, dst):
                with self.assertRaises(_GiveupOnFastCopy):
                    self.zerocopy_fun(src, dst)

    def test_filesystem_full(self):
        # Emulate a case where filesystem is full and sendfile() fails
        # on first call.
        with unittest.mock.patch(self.PATCHPOINT,
                                 side_effect=OSError(errno.ENOSPC, "yo")):
            with self.get_files() as (src, dst):
                self.assertRaises(OSError, self.zerocopy_fun, src, dst)
@unittest.skipIf(not SUPPORTS_SENDFILE, 'os.sendfile() not supported')
class TestZeroCopySendfile(_ZeroCopyFileTest, unittest.TestCase):
    PATCHPOINT = "os.sendfile"

    def zerocopy_fun(self, fsrc, fdst):
        return shutil._fastcopy_sendfile(fsrc, fdst)

    def test_non_regular_file_src(self):
        with io.BytesIO(self.FILEDATA) as src:
            with open(TESTFN2, "wb") as dst:
                with self.assertRaises(_GiveupOnFastCopy):
                    self.zerocopy_fun(src, dst)
                shutil.copyfileobj(src, dst)

        self.assertEqual(read_file(TESTFN2, binary=True), self.FILEDATA)

    def test_non_regular_file_dst(self):
        with open(TESTFN, "rb") as src:
            with io.BytesIO() as dst:
                with self.assertRaises(_GiveupOnFastCopy):
                    self.zerocopy_fun(src, dst)
                shutil.copyfileobj(src, dst)
                dst.seek(0)
                self.assertEqual(dst.read(), self.FILEDATA)

    def test_exception_on_second_call(self):
        def sendfile(*args, **kwargs):
            if not flag:
                flag.append(None)
                return orig_sendfile(*args, **kwargs)
            else:
                raise OSError(errno.EBADF, "yo")

        flag = []
        orig_sendfile = os.sendfile
        with unittest.mock.patch('os.sendfile', create=True,
                                 side_effect=sendfile):
            with self.get_files() as (src, dst):
                with self.assertRaises(OSError) as cm:
                    shutil._fastcopy_sendfile(src, dst)
        assert flag
        self.assertEqual(cm.exception.errno, errno.EBADF)

    def test_cant_get_size(self):
        # Emulate a case where src file size cannot be determined.
        # Internally bufsize will be set to a small value and
        # sendfile() will be called repeatedly.
        with unittest.mock.patch('os.fstat', side_effect=OSError) as m:
            with self.get_files() as (src, dst):
                shutil._fastcopy_sendfile(src, dst)
                assert m.called
        self.assertEqual(read_file(TESTFN2, binary=True), self.FILEDATA)

    def test_small_chunks(self):
        # Force internal file size detection to be smaller than the
        # actual file size. We want to force sendfile() to be called
        # multiple times, also in order to emulate a src fd which gets
        # bigger while it is being copied.
        mock = unittest.mock.Mock()
        mock.st_size = 65536 + 1
        with unittest.mock.patch('os.fstat', return_value=mock) as m:
            with self.get_files() as (src, dst):
                shutil._fastcopy_sendfile(src, dst)
                assert m.called
        self.assertEqual(read_file(TESTFN2, binary=True), self.FILEDATA)

    def test_big_chunk(self):
        # Force internal file size detection to be +100MB bigger than
        # the actual file size. Make sure sendfile() does not rely on
        # file size value except for (maybe) a better throughput /
        # performance.
        mock = unittest.mock.Mock()
        mock.st_size = self.FILESIZE + (100 * 1024 * 1024)
        with unittest.mock.patch('os.fstat', return_value=mock) as m:
            with self.get_files() as (src, dst):
                shutil._fastcopy_sendfile(src, dst)
                assert m.called
        self.assertEqual(read_file(TESTFN2, binary=True), self.FILEDATA)

    def test_blocksize_arg(self):
        with unittest.mock.patch('os.sendfile',
                                 side_effect=ZeroDivisionError) as m:
            self.assertRaises(ZeroDivisionError,
                              shutil.copyfile, TESTFN, TESTFN2)
            blocksize = m.call_args[0][3]
            # Make sure file size and the block size arg passed to
            # sendfile() are the same.
            self.assertEqual(blocksize, os.path.getsize(TESTFN))
            # ...unless we're dealing with a small file.
            support.unlink(TESTFN2)
            write_file(TESTFN2, b"hello", binary=True)
            self.addCleanup(support.unlink, TESTFN2 + '3')
            self.assertRaises(ZeroDivisionError,
                              shutil.copyfile, TESTFN2, TESTFN2 + '3')
            blocksize = m.call_args[0][3]
            self.assertEqual(blocksize, 2 ** 23)

    def test_file2file_not_supported(self):
        # Emulate a case where sendfile() only support file->socket
        # fds. In such a case copyfile() is supposed to skip the
        # fast-copy attempt from then on.
        assert shutil._HAS_SENDFILE
        try:
            with unittest.mock.patch(
                    self.PATCHPOINT,
                    side_effect=OSError(errno.ENOTSOCK, "yo")) as m:
                with self.get_files() as (src, dst):
                    with self.assertRaises(_GiveupOnFastCopy):
                        shutil._fastcopy_sendfile(src, dst)
                assert m.called
            assert not shutil._HAS_SENDFILE

            with unittest.mock.patch(self.PATCHPOINT) as m:
                shutil.copyfile(TESTFN, TESTFN2)
                assert not m.called
        finally:
            shutil._HAS_SENDFILE = True
@unittest.skipIf(not OSX, 'OSX only')
class TestZeroCopyOSX(_ZeroCopyFileTest, unittest.TestCase):
    PATCHPOINT = "posix._fcopyfile"

    def zerocopy_fun(self, src, dst):
        return shutil._fastcopy_osx(src, dst, posix._COPYFILE_DATA)
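
Reviewer note: several of the sendfile tests above patch `os.fstat` to control the block size passed to `sendfile()`. The same mocking approach can be shown on a reduced example. `pick_blocksize` is a hypothetical extraction of the size heuristic (at least 8 MiB, or 128 MiB when the size is unknown) and is not a function defined in shutil.

```python
import os
import unittest.mock


def pick_blocksize(fd):
    """Hypothetical extraction of the block size heuristic, for illustration."""
    try:
        return max(os.fstat(fd).st_size, 2 ** 23)  # at least 8 MiB
    except Exception:
        return 2 ** 27  # 128 MiB when the size cannot be determined


if __name__ == "__main__":
    # Emulate a source whose size cannot be determined.
    with unittest.mock.patch("os.fstat", side_effect=OSError):
        print(pick_blocksize(0))

    # Emulate a source with a known, small size.
    fake_stat = unittest.mock.Mock()
    fake_stat.st_size = 5
    with unittest.mock.patch("os.fstat", return_value=fake_stat):
        print(pick_blocksize(0))
```

Because the function looks up `os.fstat` at call time, patching the attribute on the `os` module is enough; no real file descriptor is touched while the patch is active.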
class TermsizeTests(unittest.TestCase): class TermsizeTests(unittest.TestCase):
def test_does_not_crash(self): def test_does_not_crash(self):
"""Check if get_terminal_size() returns a meaningful value. """Check if get_terminal_size() returns a meaningful value.


@ -0,0 +1,11 @@
:func:`shutil.copyfile`, :func:`shutil.copy`, :func:`shutil.copy2`,
:func:`shutil.copytree` and :func:`shutil.move` use platform-specific
fast-copy syscalls on Linux, Solaris and OSX in order to copy the file
more efficiently. All other platforms not using such a technique will rely on a
faster :func:`shutil.copyfile` implementation using :func:`memoryview`,
:class:`bytearray` and
:meth:`BufferedIOBase.readinto() <io.BufferedIOBase.readinto>`.
Finally, :func:`shutil.copyfile` default buffer size on Windows was increased
from 16KB to 1MB. The speedup for copying a 512MB file is about +26% on Linux,
+50% on OSX and +38% on Windows. Also, far fewer CPU cycles are consumed.
(Contributed by Giampaolo Rodola' in :issue:`33671`.)


@ -163,6 +163,7 @@ create_converter('LPSECURITY_ATTRIBUTES', '" F_POINTER "')
create_converter('BOOL', 'i') # F_BOOL used previously (always 'i') create_converter('BOOL', 'i') # F_BOOL used previously (always 'i')
create_converter('DWORD', 'k') # F_DWORD is always "k" (which is much shorter) create_converter('DWORD', 'k') # F_DWORD is always "k" (which is much shorter)
create_converter('LPCTSTR', 's') create_converter('LPCTSTR', 's')
create_converter('LPCWSTR', 'u')
create_converter('LPWSTR', 'u') create_converter('LPWSTR', 'u')
create_converter('UINT', 'I') # F_UINT used previously (always 'I') create_converter('UINT', 'I') # F_UINT used previously (always 'I')
@ -186,7 +187,7 @@ class DWORD_return_converter(CReturnConverter):
data.return_conversion.append( data.return_conversion.append(
'return_value = Py_BuildValue("k", _return_value);\n') 'return_value = Py_BuildValue("k", _return_value);\n')
[python start generated code]*/ [python start generated code]*/
/*[python end generated code: output=da39a3ee5e6b4b0d input=4527052fe06e5823]*/ /*[python end generated code: output=da39a3ee5e6b4b0d input=27456f8555228b62]*/
#include "clinic/_winapi.c.h" #include "clinic/_winapi.c.h"


@ -3853,6 +3853,40 @@ exit:
return return_value; return return_value;
} }
#if defined(__APPLE__)
PyDoc_STRVAR(os__fcopyfile__doc__,
"_fcopyfile($module, infd, outfd, flags, /)\n"
"--\n"
"\n"
"Efficiently copy content or metadata of 2 regular file descriptors (OSX).");
#define OS__FCOPYFILE_METHODDEF \
{"_fcopyfile", (PyCFunction)os__fcopyfile, METH_FASTCALL, os__fcopyfile__doc__},
static PyObject *
os__fcopyfile_impl(PyObject *module, int infd, int outfd, int flags);
static PyObject *
os__fcopyfile(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
PyObject *return_value = NULL;
int infd;
int outfd;
int flags;
if (!_PyArg_ParseStack(args, nargs, "iii:_fcopyfile",
&infd, &outfd, &flags)) {
goto exit;
}
return_value = os__fcopyfile_impl(module, infd, outfd, flags);
exit:
return return_value;
}
#endif /* defined(__APPLE__) */
PyDoc_STRVAR(os_fstat__doc__, PyDoc_STRVAR(os_fstat__doc__,
"fstat($module, /, fd)\n" "fstat($module, /, fd)\n"
"--\n" "--\n"
@ -6414,6 +6448,10 @@ exit:
#define OS_PREADV_METHODDEF #define OS_PREADV_METHODDEF
#endif /* !defined(OS_PREADV_METHODDEF) */ #endif /* !defined(OS_PREADV_METHODDEF) */
#ifndef OS__FCOPYFILE_METHODDEF
#define OS__FCOPYFILE_METHODDEF
#endif /* !defined(OS__FCOPYFILE_METHODDEF) */
#ifndef OS_PIPE_METHODDEF #ifndef OS_PIPE_METHODDEF
#define OS_PIPE_METHODDEF #define OS_PIPE_METHODDEF
#endif /* !defined(OS_PIPE_METHODDEF) */ #endif /* !defined(OS_PIPE_METHODDEF) */
@ -6589,4 +6627,4 @@ exit:
#ifndef OS_GETRANDOM_METHODDEF #ifndef OS_GETRANDOM_METHODDEF
#define OS_GETRANDOM_METHODDEF #define OS_GETRANDOM_METHODDEF
#endif /* !defined(OS_GETRANDOM_METHODDEF) */ #endif /* !defined(OS_GETRANDOM_METHODDEF) */
/*[clinic end generated code: output=8d3d9dddf254c3c2 input=a9049054013a1b77]*/ /*[clinic end generated code: output=b5d1ec71bc6f0651 input=a9049054013a1b77]*/


@ -97,6 +97,10 @@ corresponding Unix manual entries for more information on calls.");
#include <sys/sendfile.h> #include <sys/sendfile.h>
#endif #endif
#if defined(__APPLE__)
#include <copyfile.h>
#endif
#ifdef HAVE_SCHED_H #ifdef HAVE_SCHED_H
#include <sched.h> #include <sched.h>
#endif #endif
@ -8742,6 +8746,34 @@ done:
#endif /* HAVE_SENDFILE */ #endif /* HAVE_SENDFILE */
#if defined(__APPLE__)
/*[clinic input]
os._fcopyfile

    infd: int
    outfd: int
    flags: int
    /

Efficiently copy content or metadata of 2 regular file descriptors (OSX).
[clinic start generated code]*/

static PyObject *
os__fcopyfile_impl(PyObject *module, int infd, int outfd, int flags)
/*[clinic end generated code: output=8e8885c721ec38e3 input=aeb9456804eec879]*/
{
    int ret;

    Py_BEGIN_ALLOW_THREADS
    ret = fcopyfile(infd, outfd, NULL, flags);
    Py_END_ALLOW_THREADS
    if (ret < 0)
        return posix_error();
    Py_RETURN_NONE;
}
#endif
/*[clinic input] /*[clinic input]
os.fstat os.fstat
@ -12918,6 +12950,7 @@ static PyMethodDef posix_methods[] = {
OS_UTIME_METHODDEF OS_UTIME_METHODDEF
OS_TIMES_METHODDEF OS_TIMES_METHODDEF
OS__EXIT_METHODDEF OS__EXIT_METHODDEF
OS__FCOPYFILE_METHODDEF
OS_EXECV_METHODDEF OS_EXECV_METHODDEF
OS_EXECVE_METHODDEF OS_EXECVE_METHODDEF
OS_SPAWNV_METHODDEF OS_SPAWNV_METHODDEF
@ -13537,6 +13570,10 @@ all_ins(PyObject *m)
if (PyModule_AddIntMacro(m, GRND_NONBLOCK)) return -1; if (PyModule_AddIntMacro(m, GRND_NONBLOCK)) return -1;
#endif #endif
#if defined(__APPLE__)
if (PyModule_AddIntConstant(m, "_COPYFILE_DATA", COPYFILE_DATA)) return -1;
#endif
return 0; return 0;
} }