GH-72904: Add glob.translate() function (#106703)

Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.

Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
This commit is contained in:
Barney Gale 2023-11-13 17:15:56 +00:00 committed by GitHub
parent babb787047
commit cf67ebfb31
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 229 additions and 106 deletions

View file

@ -78,6 +78,11 @@ def translate(pat):
"""
STAR = object()
parts = _translate(pat, STAR, '.')
return _join_translated_parts(parts, STAR)
def _translate(pat, STAR, QUESTION_MARK):
res = []
add = res.append
i, n = 0, len(pat)
@ -89,7 +94,7 @@ def translate(pat):
if (not res) or res[-1] is not STAR:
add(STAR)
elif c == '?':
add('.')
add(QUESTION_MARK)
elif c == '[':
j = i
if j < n and pat[j] == '!':
@ -146,9 +151,11 @@ def translate(pat):
else:
add(re.escape(c))
assert i == n
return res
def _join_translated_parts(inp, STAR):
# Deal with STARs.
inp = res
res = []
add = res.append
i, n = 0, len(inp)