[3.11] gh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') (GH-135037) (GH-135068)

Addresses CVEs 2024-12718, 2025-4138, 2025-4330, and 2025-4517.
(cherry picked from commit 3612d8f517)
(cherry picked from commit c358142cab)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Signed-off-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
T. Wouters 2025-06-03 16:58:39 +02:00 committed by GitHub
parent 2c6ca1a9ad
commit 4633f3f497
11 changed files with 1020 additions and 141 deletions


@ -352,10 +352,26 @@ the :mod:`glob` module.)
links encountered in the path (if they are supported by the operating
system).
If a path doesn't exist or a symlink loop is encountered, and *strict* is
``True``, :exc:`OSError` is raised. If *strict* is ``False``, the path is
resolved as far as possible and any remainder is appended without checking
whether it exists.
By default, the path is evaluated up to the first component that does not
exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
All such components are appended unchanged to the existing part of the path.
Some errors that are handled this way include "access denied", "not a
directory", and "bad argument to internal function". Thus, the
resulting path may be missing or inaccessible, may still contain
links or loops, and may traverse non-directories.
This behavior can be modified by keyword arguments:
If *strict* is ``True``, the first error encountered when evaluating the path is
re-raised.
In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
or another :exc:`OSError` if it is otherwise inaccessible.
If *strict* is :py:data:`os.path.ALLOW_MISSING`, errors other than
:exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
Thus, the returned path will not contain any symbolic links, but the named
file and some of its parent directories may be missing.
.. note::
This function emulates the operating system's procedure for making a path
@ -374,6 +390,15 @@ the :mod:`glob` module.)
.. versionchanged:: 3.10
The *strict* parameter was added.
.. versionchanged:: next
The :py:data:`~os.path.ALLOW_MISSING` value for the *strict* parameter
was added.
.. data:: ALLOW_MISSING
Special value used for the *strict* argument in :func:`realpath`.
.. versionadded:: next
.. function:: relpath(path, start=os.curdir)


@ -239,6 +239,15 @@ The :mod:`tarfile` module defines the following exceptions:
Raised to refuse extracting a symbolic link pointing outside the destination
directory.
.. exception:: LinkFallbackError
Raised to refuse emulating a link (hard or symbolic) by extracting another
archive member, when that member would be rejected by the filter.
The exception that was raised to reject the replacement member is available
as :attr:`!BaseException.__context__`.
.. versionadded:: next
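As a rough illustration of how the chained exception might be inspected, the sketch below formats a rejection together with its ``__context__``. The helper ``describe_rejection`` is hypothetical. The lookup falls back to :exc:`TarError` on versions without ``LinkFallbackError``, which is an assumption made only so that the sketch runs on older releases.

```python
import tarfile

# LinkFallbackError only exists on versions that include this change.
# Falling back to TarError is an assumption made for portability.
LinkFallbackError = getattr(tarfile, "LinkFallbackError", tarfile.TarError)


def describe_rejection(exc):
    """Hypothetical helper: summarize an exception and what triggered it."""
    summary = f"{type(exc).__name__}: {exc}"
    context = exc.__context__
    if context is not None:
        # For LinkFallbackError, this is the error that rejected the
        # replacement member.
        summary += f" (while handling {type(context).__name__}: {context})"
    return summary


# Usage sketch with a hand-built exception chain standing in for a real
# extraction failure.
try:
    try:
        raise ValueError("replacement member rejected")
    except ValueError:
        raise LinkFallbackError("cannot emulate link")
except LinkFallbackError as err:
    message = describe_rejection(err)
    print(message)
```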
The following constants are available at the module level:
@ -1037,6 +1046,12 @@ reused in custom filters:
Implements the ``'data'`` filter.
In addition to what ``tar_filter`` does:
- Normalize link targets (:attr:`TarInfo.linkname`) using
:func:`os.path.normpath`.
Note that this removes internal ``..`` components, which may change the
meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
symbolic links.
- :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
that link to absolute paths, or ones that link outside the destination.
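The caveat about ``..`` components can be seen in a small sketch, assuming a platform where :func:`os.symlink` is permitted. The function name ``compare_resolutions`` and the directory layout are hypothetical. The sketch resolves the same path once by following the symbolic link and once lexically with :func:`os.path.normpath`.

```python
import os
import tempfile


def compare_resolutions():
    """Hypothetical demo: resolve ``link/../target`` two ways.

    Returns (followed, lexical), both relative to a scratch directory.
    Requires a platform where os.symlink() is permitted.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        os.makedirs(os.path.join(root, "real", "sub"))
        # link -> real/sub, so "link/.." is "real" when the link is followed.
        os.symlink(os.path.join("real", "sub"), os.path.join(root, "link"))
        raw = os.path.join(root, "link", "..", "target")
        # Follows the symlink before applying "..".
        followed = os.path.relpath(os.path.realpath(raw), root)
        # Purely lexical: "link/.." cancels out without touching the disk.
        lexical = os.path.relpath(os.path.normpath(raw), root)
    return followed, lexical


followed, lexical = compare_resolutions()
print(followed, lexical)
```

The two results differ, which is why normalizing :attr:`!TarInfo.linkname` can change where a link points when its path passes through another symbolic link.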
@ -1065,6 +1080,10 @@ reused in custom filters:
Return the modified ``TarInfo`` member.
.. versionchanged:: next
Link targets are now normalized.
.. _tarfile-extraction-refuse:
@ -1091,6 +1110,7 @@ Here is an incomplete list of things to consider:
* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
to prevent e.g. exploiting pre-existing links, and to make it easier to
clean up after a failed extraction.
* Disallow symbolic links if you do not need the functionality.
* When working with untrusted data, use external (e.g. OS-level) limits on
disk, memory and CPU usage.
* Check filenames against an allow-list of characters