bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028)

It is a more commonly used term.
Serhiy Storchaka 2022-03-22 11:44:47 +02:00 committed by GitHub
parent 4f97d64c83
commit c6cd3cc93c
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 21 additions and 21 deletions

@@ -230,13 +230,13 @@ while ``+`` requires at least *one* occurrence. To use a similar example,
 ``ca+t`` will match ``'cat'`` (1 ``'a'``), ``'caaat'`` (3 ``'a'``\ s), but won't
 match ``'ct'``.
-There are two more repeating qualifiers. The question mark character, ``?``,
+There are two more repeating operators or quantifiers. The question mark character, ``?``,
 matches either once or zero times; you can think of it as marking something as
 being optional. For example, ``home-?brew`` matches either ``'homebrew'`` or
 ``'home-brew'``.
-The most complicated repeated qualifier is ``{m,n}``, where *m* and *n* are
-decimal integers. This qualifier means there must be at least *m* repetitions,
+The most complicated quantifier is ``{m,n}``, where *m* and *n* are
+decimal integers. This quantifier means there must be at least *m* repetitions,
 and at most *n*. For example, ``a/{1,3}b`` will match ``'a/b'``, ``'a//b'``, and
 ``'a///b'``. It won't match ``'ab'``, which has no slashes, or ``'a////b'``, which
 has four.
@@ -245,7 +245,7 @@ You can omit either *m* or *n*; in that case, a reasonable value is assumed for
 the missing value. Omitting *m* is interpreted as a lower limit of 0, while
 omitting *n* results in an upper bound of infinity.
-Readers of a reductionist bent may notice that the three other qualifiers can
+Readers of a reductionist bent may notice that the three other quantifiers can
 all be expressed using this notation. ``{0,}`` is the same as ``*``, ``{1,}``
 is equivalent to ``+``, and ``{0,1}`` is the same as ``?``. It's better to use
 ``*``, ``+``, or ``?`` when you can, simply because they're shorter and easier
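Reviewer note (illustration only, not part of this commit): the two hunks above describe the ``{m,n}`` quantifier and its equivalence with ``*``, ``+``, and ``?``. The following is a minimal sketch that checks those statements with the standard ``re`` module. The sample strings and variable names are chosen here for illustration.

```python
import re

# Bounded repetition: between one and three slashes between 'a' and 'b'.
bounded = re.compile(r"a/{1,3}b")
candidates = ["ab", "a/b", "a//b", "a///b", "a////b"]
matches = {s: bounded.fullmatch(s) is not None for s in candidates}

# Each short quantifier paired with its {m,n} spelling.
pairs = [
    (r"ca*t", r"ca{0,}t"),
    (r"ca+t", r"ca{1,}t"),
    (r"home-?brew", r"home-{0,1}brew"),
]
samples = ["ct", "cat", "caaat", "homebrew", "home-brew", "home--brew"]

# The two spellings should accept and reject exactly the same samples.
equivalent = all(
    (re.fullmatch(short, s) is not None) == (re.fullmatch(long, s) is not None)
    for short, long in pairs
    for s in samples
)

print(matches)
print(equivalent)
```

The check only covers the handful of samples listed, so it illustrates the equivalence rather than proving it.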
@@ -803,7 +803,7 @@ which matches the header's value.
 Groups are marked by the ``'('``, ``')'`` metacharacters. ``'('`` and ``')'``
 have much the same meaning as they do in mathematical expressions; they group
 together the expressions contained inside them, and you can repeat the contents
-of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
+of a group with a quantifier, such as ``*``, ``+``, ``?``, or
 ``{m,n}``. For example, ``(ab)*`` will match zero or more repetitions of
 ``ab``. ::
@@ -1326,7 +1326,7 @@ backtrack character by character until it finds a match for the ``>``. The
 final match extends from the ``'<'`` in ``'<html>'`` to the ``'>'`` in
 ``'</title>'``, which isn't what you want.
-In this case, the solution is to use the non-greedy qualifiers ``*?``, ``+?``,
+In this case, the solution is to use the non-greedy quantifiers ``*?``, ``+?``,
 ``??``, or ``{m,n}?``, which match as *little* text as possible. In the above
 example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
 when it fails, the engine advances a character at a time, retrying the ``'>'``
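Reviewer note (illustration only, not part of this commit): a small sketch of the greedy versus non-greedy behaviour described in the hunk above. The sample HTML string is an assumption chosen to end in ``'</title>'``, as in the passage.

```python
import re

# Sample input assumed for illustration; it ends with the closing title tag.
html = "<html><head><title>Title</title>"

# Greedy: .* runs to the end of the string, then backtracks to the last '>'.
greedy = re.match(r"<.*>", html).group()

# Non-greedy: .*? stops at the first '>' that allows a match.
lazy = re.match(r"<.*?>", html).group()

print(greedy)
print(lazy)
```

Here the greedy pattern spans the whole string, while the non-greedy one returns only the first tag.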

@@ -87,7 +87,7 @@ Some characters, like ``'|'`` or ``'('``, are special. Special
 characters either stand for classes of ordinary characters, or affect
 how the regular expressions around them are interpreted.
-Repetition qualifiers (``*``, ``+``, ``?``, ``{m,n}``, etc) cannot be
+Repetition operators or quantifiers (``*``, ``+``, ``?``, ``{m,n}``, etc) cannot be
 directly nested. This avoids ambiguity with the non-greedy modifier suffix
 ``?``, and with other modifiers in other implementations. To apply a second
 repetition to an inner repetition, parentheses may be used. For example,
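Reviewer note (illustration only, not part of this commit): the hunk above is cut off before its own example, so the patterns below are chosen here as an assumption. The sketch shows that directly stacking one quantifier on another is rejected at compile time, while wrapping the inner repetition in a non-capturing group is accepted.

```python
import re

# Directly nested repetition: a second quantifier straight after {6}.
try:
    re.compile(r"a{6}*")
    nested_compiles = True
except re.error:
    nested_compiles = False

# Parentheses make the inner repetition a unit that can be repeated again.
grouped = re.compile(r"(?:a{6})*")
twelve_ok = grouped.fullmatch("a" * 12) is not None  # two groups of six
seven_ok = grouped.fullmatch("a" * 7) is not None    # not a multiple of six

print(nested_compiles, twelve_ok, seven_ok)
```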
@@ -146,10 +146,10 @@ The special characters are:
 single: ??; in regular expressions
 ``*?``, ``+?``, ``??``
-The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
+The ``'*'``, ``'+'``, and ``'?'`` quantifiers are all :dfn:`greedy`; they match
 as much text as possible. Sometimes this behaviour isn't desired; if the RE
 ``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire
-string, and not just ``'<a>'``. Adding ``?`` after the qualifier makes it
+string, and not just ``'<a>'``. Adding ``?`` after the quantifier makes it
 perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few*
 characters as possible will be matched. Using the RE ``<.*?>`` will match
 only ``'<a>'``.
@@ -160,11 +160,11 @@ The special characters are:
 single: ?+; in regular expressions
 ``*+``, ``++``, ``?+``
-Like the ``'*'``, ``'+'``, and ``'?'`` qualifiers, those where ``'+'`` is
+Like the ``'*'``, ``'+'``, and ``'?'`` quantifiers, those where ``'+'`` is
 appended also match as many times as possible.
-However, unlike the true greedy qualifiers, these do not allow
+However, unlike the true greedy quantifiers, these do not allow
 back-tracking when the expression following it fails to match.
-These are known as :dfn:`possessive` qualifiers.
+These are known as :dfn:`possessive` quantifiers.
 For example, ``a*a`` will match ``'aaaa'`` because the ``a*`` will match
 all 4 ``'a'``s, but, when the final ``'a'`` is encountered, the
 expression is backtracked so that in the end the ``a*`` ends up matching
@@ -198,7 +198,7 @@ The special characters are:
 ``{m,n}?``
 Causes the resulting RE to match from *m* to *n* repetitions of the preceding
 RE, attempting to match as *few* repetitions as possible. This is the
-non-greedy version of the previous qualifier. For example, on the
+non-greedy version of the previous quantifier. For example, on the
 6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
 while ``a{3,5}?`` will only match 3 characters.
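Reviewer note (illustration only, not part of this commit): the example in the hunk above can be reproduced directly. This is a minimal sketch using ``re.match``; the variable names are chosen here.

```python
import re

text = "aaaaaa"  # the 6-character string from the passage

# Greedy {m,n}: takes as many repetitions as allowed, up to 5.
greedy = re.match(r"a{3,5}", text).group()

# Non-greedy {m,n}?: stops at the minimum of 3 repetitions.
lazy = re.match(r"a{3,5}?", text).group()

print(len(greedy), len(lazy))
```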
@@ -206,7 +206,7 @@ The special characters are:
 Causes the resulting RE to match from *m* to *n* repetitions of the
 preceding RE, attempting to match as many repetitions as possible
 *without* establishing any backtracking points.
-This is the possessive version of the qualifier above.
+This is the possessive version of the quantifier above.
 For example, on the 6-character string ``'aaaaaa'``, ``a{3,5}+aa``
 attempt to match 5 ``'a'`` characters, then, requiring 2 more ``'a'``s,
 will need more characters than available and thus fail, while
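Reviewer note (illustration only, not part of this commit): a sketch contrasting greedy and possessive quantifiers as described in the hunks above. It assumes possessive quantifiers are only available from Python 3.11 on, so those patterns are guarded by a version check. The ``a{3,4}+aa`` pattern and the result keys are chosen here for illustration.

```python
import re
import sys

# Assumption: possessive quantifiers need Python 3.11 or newer.
HAS_POSSESSIVE = sys.version_info >= (3, 11)

text = "aaaaaa"  # the 6-character string from the passage

results = {
    # Greedy: a{3,5} first takes 5, then gives one back so "aa" can match.
    "greedy a{3,5}aa": re.fullmatch(r"a{3,5}aa", text) is not None,
    # Greedy: a* takes all 4, then backtracks one for the final "a".
    "greedy a*a": re.fullmatch(r"a*a", "aaaa") is not None,
}

if HAS_POSSESSIVE:
    # Possessive: takes 5 and never gives any back, so "aa" cannot match.
    results["possessive a{3,5}+aa"] = re.fullmatch(r"a{3,5}+aa", text) is not None
    # Possessive with a lower cap: takes 4, leaving exactly 2 for "aa".
    results["possessive a{3,4}+aa"] = re.fullmatch(r"a{3,4}+aa", text) is not None
    # Possessive star: consumes all 4 and leaves nothing for the final "a".
    results["possessive a*+a"] = re.fullmatch(r"a*+a", "aaaa") is not None

for name, matched in results.items():
    print(name, matched)
```

On interpreters older than the assumed version, only the two greedy entries are produced.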

@@ -298,7 +298,7 @@ os
 re
 --
-* Atomic grouping (``(?>...)``) and possessive qualifiers (``*+``, ``++``,
+* Atomic grouping (``(?>...)``) and possessive quantifiers (``*+``, ``++``,
 ``?+``, ``{m,n}+``) are now supported in regular expressions.
 (Contributed by Jeffrey C. Jacobs and Serhiy Storchaka in :issue:`433030`.)

@@ -2038,9 +2038,9 @@ class ReTests(unittest.TestCase):
         with self.assertRaisesRegex(TypeError, "got 'type'"):
             re.search("x*", type)
-    def test_possessive_qualifiers(self):
-        """Test Possessive Qualifiers
-        Test qualifiers of the form @+ for some repetition operator @,
+    def test_possessive_quantifiers(self):
+        """Test Possessive Quantifiers
+        Test quantifiers of the form @+ for some repetition operator @,
         e.g. x{3,5}+ meaning match from 3 to 5 greadily and proceed
         without creating a stack frame for rolling the stack back and
         trying 1 or more fewer matches."""
@@ -2077,7 +2077,7 @@ class ReTests(unittest.TestCase):
         self.assertIsNone(re.match("^x{}+$", "xxx"))
         self.assertTrue(re.match("^x{}+$", "x{}"))
-    def test_fullmatch_possessive_qualifiers(self):
+    def test_fullmatch_possessive_quantifiers(self):
         self.assertTrue(re.fullmatch(r'a++', 'a'))
         self.assertTrue(re.fullmatch(r'a*+', 'a'))
         self.assertTrue(re.fullmatch(r'a?+', 'a'))
@@ -2096,7 +2096,7 @@ class ReTests(unittest.TestCase):
         self.assertIsNone(re.fullmatch(r'(?:ab)?+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab){1,3}+', 'abc'))
-    def test_findall_possessive_qualifiers(self):
+    def test_findall_possessive_quantifiers(self):
         self.assertEqual(re.findall(r'a++', 'aab'), ['aa'])
         self.assertEqual(re.findall(r'a*+', 'aab'), ['aa', '', ''])
         self.assertEqual(re.findall(r'a?+', 'aab'), ['a', 'a', '', ''])

@@ -1,2 +1,2 @@
-Add support of atomic grouping (``(?>...)``) and possessive qualifiers
+Add support of atomic grouping (``(?>...)``) and possessive quantifiers
 (``*+``, ``++``, ``?+``, ``{m,n}+``) in :mod:`regular expressions <re>`.