Issue #11224: Improved sparse file read support (r85916) introduced a
regression in _FileInFile which is used in file-like objects returned by
TarFile.extractfile(). The inefficient design of the _FileInFile.read() method
causes various dramatic side-effects and errors:

- The data segment of a file member is read completely into memory every(!)
  time a small block is accessed. This is not only slow but may cause
  unexpected MemoryErrors with very large files.
- Reading members from compressed tar archives is even slower because of the
  excessive backwards seeking which is done when the same data segment is
  read over and over again.
- As a backwards seek on a TarFile opened in stream mode is not possible,
  using extractfile() fails with a StreamError.
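
The problem described above comes down to how much of the outer file a small read touches. The following is a minimal sketch of a bounded read, not the actual tarfile code: the class name BoundedReader and its attributes are hypothetical and only illustrate reading exactly the requested block instead of the whole data segment.

```python
import io


class BoundedReader:
    """Hypothetical file-in-file wrapper; NOT tarfile's real _FileInFile."""

    def __init__(self, fileobj, offset, size):
        self.fileobj = fileobj
        self.offset = offset    # where the member's data starts in the outer file
        self.size = size        # length of the member's data segment
        self.position = 0       # current read position within the member

    def read(self, size=None):
        remaining = self.size - self.position
        if size is None or size < 0 or size > remaining:
            size = remaining
        if size <= 0:
            return b""
        # Seek to the current position and read only the requested bytes,
        # rather than loading the whole data segment on every call.
        self.fileobj.seek(self.offset + self.position)
        data = self.fileobj.read(size)
        self.position += len(data)
        return data


# The member's data ("abcdefghij") sits between unrelated bytes.
outer = io.BytesIO(b"HEADER" + b"abcdefghij" + b"TRAILER")
member = BoundedReader(outer, offset=6, size=10)

chunks = []
while True:
    block = member.read(4)
    if not block:
        break
    chunks.append(block)
```

With sequential reads like these, each seek targets the position the outer file has already reached, so the outer file is never moved backwards. That is the property a stream-mode archive depends on.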
2025-09-26 10:19:53 +00:00 · 2011-02-23 11:42:22 +00:00 · 2011-02-23 11:42:22 +00:00 · dd071045e7
commit dd071045e7
parent 3eeee83391
3 changed files with 22 additions and 3 deletions
--- a/Lib/test/test_tarfile.py
+++ b/Lib/test/test_tarfile.py
@@ -419,6 +419,22 @@ class StreamReadTest(CommonReadTest):

    mode="r|"

+    def test_read_through(self):
+        # Issue #11224: A poorly designed _FileInFile.read() method
+        # caused seeking errors with stream tar files.
+        for tarinfo in self.tar:
+            if not tarinfo.isreg():
+                continue
+            fobj = self.tar.extractfile(tarinfo)
+            while True:
+                try:
+                    buf = fobj.read(512)
+                except tarfile.StreamError:
+                    self.fail("simple read-through using TarFile.extractfile() failed")
+                if not buf:
+                    break
+            fobj.close()
+
    def test_fileobj_regular_file(self):
        tarinfo = self.tar.next() # get "regtype" (can't use getmember)
        fobj = self.tar.extractfile(tarinfo)