GH-72904: Add glob.translate() function (#106703)

Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.
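
As a rough illustration of how a caller might use the translated pattern, here is a minimal sketch. It is not pathlib's actual code: `filter_paths` is a hypothetical helper, and `seps='/'` is passed only so the result does not depend on the platform.

```python
import glob
import re
import sys


def filter_paths(paths, pattern, *, recursive=False):
    """Return the entries of *paths* that match the glob *pattern*.

    Hypothetical helper for illustration: the strings are matched as-is,
    with no filesystem access.
    """
    # Translate once, compile once, then reuse the compiled pattern.
    regex = glob.translate(pattern, recursive=recursive, seps='/')
    compiled = re.compile(regex)
    return [p for p in paths if compiled.match(p)]


if sys.version_info >= (3, 13):  # glob.translate() is new in Python 3.13
    candidates = ['README.txt', 'docs/index.txt', 'docs/api/glob.txt', 'setup.py']
    # A bare `*.txt` only matches names in the top-level segment.
    print(filter_paths(candidates, '*.txt'))
    # With recursive=True, a leading `**` segment spans any number of directories.
    print(filter_paths(candidates, '**/*.txt', recursive=True))
```

Compiling the translated pattern once and reusing it is the main reason to prefer this over matching each path with a fresh pattern.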

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.
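
The difference from `fnmatch.translate()` can be seen in a small side-by-side sketch. The helper names `fnmatch_matches` and `glob_matches` are made up for this example, and `seps='/'` is an assumption chosen to keep the behaviour identical across platforms.

```python
import fnmatch
import glob
import re
import sys


def fnmatch_matches(pattern, path):
    """Match *path* against *pattern* using fnmatch.translate()."""
    return re.match(fnmatch.translate(pattern), path) is not None


def glob_matches(pattern, path, **kwargs):
    """Match *path* against *pattern* using glob.translate() with '/' separators."""
    regex = glob.translate(pattern, seps='/', **kwargs)
    return re.match(regex, path) is not None


if sys.version_info >= (3, 13):  # glob.translate() is new in Python 3.13
    # fnmatch: `*` happily crosses the path separator.
    print(fnmatch_matches('*.txt', 'foo/bar.txt'))
    # glob: `*` stops at the separator, so one more segment is needed.
    print(glob_matches('*.txt', 'foo/bar.txt'))
    print(glob_matches('*/*.txt', 'foo/bar.txt'))
    # `**` as a full segment matches any number of segments when recursive=True.
    print(glob_matches('**/*.txt', 'foo/bar/baz.txt', recursive=True))
```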

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Barney Gale 2023-11-13 17:15:56 +00:00 committed by GitHub
parent babb787047
commit cf67ebfb31
7 changed files with 229 additions and 106 deletions


@ -145,6 +145,45 @@ default. For example, consider a directory containing :file:`card.gif` and
   >>> glob.glob('.c*')
   ['.card.gif']


.. function:: translate(pathname, *, recursive=False, include_hidden=False, seps=None)

   Convert the given path specification to a regular expression for use with
   :func:`re.match`. The path specification can contain shell-style wildcards.

   For example:

      >>> import glob, re
      >>>
      >>> regex = glob.translate('**/*.txt', recursive=True, include_hidden=True)
      >>> regex
      '(?s:(?:.+/)?[^/]*\\.txt)\\Z'
      >>> reobj = re.compile(regex)
      >>> reobj.match('foo/bar/baz.txt')
      <re.Match object; span=(0, 15), match='foo/bar/baz.txt'>

   Path separators and segments are meaningful to this function, unlike
   :func:`fnmatch.translate`. By default wildcards do not match path
   separators, and ``*`` pattern segments match precisely one path segment.

   If *recursive* is true, the pattern segment "``**``" will match any number
   of path segments. If "``**``" occurs in any position other than a full
   pattern segment, :exc:`ValueError` is raised.

   If *include_hidden* is true, wildcards can match path segments that start
   with a dot (``.``).

   A sequence of path separators may be supplied to the *seps* argument. If
   not given, :data:`os.sep` and :data:`~os.altsep` (if available) are used.

   .. seealso::

     :meth:`pathlib.PurePath.match` and :meth:`pathlib.Path.glob` methods,
     which call this function to implement pattern matching and globbing.

   .. versionadded:: 3.13


.. seealso::

   Module :mod:`fnmatch`


@ -183,6 +183,13 @@ doctest
  :attr:`doctest.TestResults.skipped` attributes.
  (Contributed by Victor Stinner in :gh:`108794`.)

glob
----

* Add :func:`glob.translate` function that converts a path specification with
  shell-style wildcards to a regular expression.
  (Contributed by Barney Gale in :gh:`72904`.)

io
--