mirror of
https://github.com/python/cpython.git
synced 2025-09-26 10:19:53 +00:00
bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028)
It is a more commonly used term.
parent 4f97d64c83
commit c6cd3cc93c
5 changed files with 21 additions and 21 deletions
@@ -230,13 +230,13 @@ while ``+`` requires at least *one* occurrence. To use a similar example,
 ``ca+t`` will match ``'cat'`` (1 ``'a'``), ``'caaat'`` (3 ``'a'``\ s), but won't
 match ``'ct'``.
 
-There are two more repeating qualifiers. The question mark character, ``?``,
+There are two more repeating operators or quantifiers. The question mark character, ``?``,
 matches either once or zero times; you can think of it as marking something as
 being optional. For example, ``home-?brew`` matches either ``'homebrew'`` or
 ``'home-brew'``.
 
-The most complicated repeated qualifier is ``{m,n}``, where *m* and *n* are
-decimal integers. This qualifier means there must be at least *m* repetitions,
+The most complicated quantifier is ``{m,n}``, where *m* and *n* are
+decimal integers. This quantifier means there must be at least *m* repetitions,
 and at most *n*. For example, ``a/{1,3}b`` will match ``'a/b'``, ``'a//b'``, and
 ``'a///b'``. It won't match ``'ab'``, which has no slashes, or ``'a////b'``, which
 has four.
@@ -245,7 +245,7 @@ You can omit either *m* or *n*; in that case, a reasonable value is assumed for
 the missing value. Omitting *m* is interpreted as a lower limit of 0, while
 omitting *n* results in an upper bound of infinity.
 
-Readers of a reductionist bent may notice that the three other qualifiers can
+Readers of a reductionist bent may notice that the three other quantifiers can
 all be expressed using this notation. ``{0,}`` is the same as ``*``, ``{1,}``
 is equivalent to ``+``, and ``{0,1}`` is the same as ``?``. It's better to use
 ``*``, ``+``, or ``?`` when you can, simply because they're shorter and easier
@@ -803,7 +803,7 @@ which matches the header's value.
 Groups are marked by the ``'('``, ``')'`` metacharacters. ``'('`` and ``')'``
 have much the same meaning as they do in mathematical expressions; they group
 together the expressions contained inside them, and you can repeat the contents
-of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
+of a group with a quantifier, such as ``*``, ``+``, ``?``, or
 ``{m,n}``. For example, ``(ab)*`` will match zero or more repetitions of
 ``ab``. ::
 
@@ -1326,7 +1326,7 @@ backtrack character by character until it finds a match for the ``>``. The
 final match extends from the ``'<'`` in ``'<html>'`` to the ``'>'`` in
 ``'</title>'``, which isn't what you want.
 
-In this case, the solution is to use the non-greedy qualifiers ``*?``, ``+?``,
+In this case, the solution is to use the non-greedy quantifiers ``*?``, ``+?``,
 ``??``, or ``{m,n}?``, which match as *little* text as possible. In the above
 example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
 when it fails, the engine advances a character at a time, retrying the ``'>'``
|
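The howto hunks above describe ``{m,n}``, its shorthand equivalents, and greedy versus non-greedy matching. The following is a minimal Python sketch of that behaviour, not part of this commit; the variable names and the sample HTML string are assumptions chosen to resemble the howto's example.

```python
import re

# {m,n}: at least m and at most n repetitions of the preceding item.
bounded = re.compile(r"a/{1,3}b")
bounded_results = {s: bool(bounded.fullmatch(s))
                   for s in ("ab", "a/b", "a//b", "a///b", "a////b")}

# {0,}, {1,} and {0,1} should accept the same strings as *, + and ?.
samples = ("ct", "cat", "caaat")
shorthand_agrees = all(
    bool(re.fullmatch(long_form, s)) == bool(re.fullmatch(short_form, s))
    for long_form, short_form in (("ca{0,}t", "ca*t"),
                                  ("ca{1,}t", "ca+t"),
                                  ("ca{0,1}t", "ca?t"))
    for s in samples
)

# Greedy .* runs to the last '>'; non-greedy .*? stops at the first one.
html = "<html><head><title>Title</title>"  # assumed sample input
greedy = re.match(r"<.*>", html).group()
non_greedy = re.match(r"<.*?>", html).group()

print(bounded_results)
print(shorthand_agrees)
print(greedy)
print(non_greedy)
```

Only the greedy pattern swallows the whole sample string; the non-greedy one stops at the first closing bracket, which is the distinction the renamed "non-greedy quantifiers" paragraph is making.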
@@ -87,7 +87,7 @@ Some characters, like ``'|'`` or ``'('``, are special. Special
 characters either stand for classes of ordinary characters, or affect
 how the regular expressions around them are interpreted.
 
-Repetition qualifiers (``*``, ``+``, ``?``, ``{m,n}``, etc) cannot be
+Repetition operators or quantifiers (``*``, ``+``, ``?``, ``{m,n}``, etc) cannot be
 directly nested. This avoids ambiguity with the non-greedy modifier suffix
 ``?``, and with other modifiers in other implementations. To apply a second
 repetition to an inner repetition, parentheses may be used. For example,
@@ -146,10 +146,10 @@ The special characters are:
    single: ??; in regular expressions
 
 ``*?``, ``+?``, ``??``
-   The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
+   The ``'*'``, ``'+'``, and ``'?'`` quantifiers are all :dfn:`greedy`; they match
    as much text as possible. Sometimes this behaviour isn't desired; if the RE
    ``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire
-   string, and not just ``'<a>'``. Adding ``?`` after the qualifier makes it
+   string, and not just ``'<a>'``. Adding ``?`` after the quantifier makes it
    perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few*
    characters as possible will be matched. Using the RE ``<.*?>`` will match
    only ``'<a>'``.
@@ -160,11 +160,11 @@ The special characters are:
    single: ?+; in regular expressions
 
 ``*+``, ``++``, ``?+``
-   Like the ``'*'``, ``'+'``, and ``'?'`` qualifiers, those where ``'+'`` is
+   Like the ``'*'``, ``'+'``, and ``'?'`` quantifiers, those where ``'+'`` is
    appended also match as many times as possible.
-   However, unlike the true greedy qualifiers, these do not allow
+   However, unlike the true greedy quantifiers, these do not allow
    back-tracking when the expression following it fails to match.
-   These are known as :dfn:`possessive` qualifiers.
+   These are known as :dfn:`possessive` quantifiers.
    For example, ``a*a`` will match ``'aaaa'`` because the ``a*`` will match
    all 4 ``'a'``s, but, when the final ``'a'`` is encountered, the
    expression is backtracked so that in the end the ``a*`` ends up matching
@@ -198,7 +198,7 @@ The special characters are:
 ``{m,n}?``
    Causes the resulting RE to match from *m* to *n* repetitions of the preceding
    RE, attempting to match as *few* repetitions as possible. This is the
-   non-greedy version of the previous qualifier. For example, on the
+   non-greedy version of the previous quantifier. For example, on the
    6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
    while ``a{3,5}?`` will only match 3 characters.
 
@@ -206,7 +206,7 @@ The special characters are:
    Causes the resulting RE to match from *m* to *n* repetitions of the
    preceding RE, attempting to match as many repetitions as possible
    *without* establishing any backtracking points.
-   This is the possessive version of the qualifier above.
+   This is the possessive version of the quantifier above.
    For example, on the 6-character string ``'aaaaaa'``, ``a{3,5}+aa``
    attempt to match 5 ``'a'`` characters, then, requiring 2 more ``'a'``s,
    will need more characters than available and thus fail, while
|
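The reference hunks above state that quantifiers cannot be directly nested and that possessive quantifiers never give back what they have matched. The following is a minimal Python sketch of both points, not part of this commit. The helper name `possessive_demo` is hypothetical, and the version guard reflects that possessive quantifiers were added in Python 3.11.

```python
import re
import sys

# Directly nested quantifiers are rejected by the compiler...
try:
    re.compile(r"a{6}*")
    nested_error = None
except re.error as exc:
    nested_error = exc

# ...so the inner repetition is wrapped in a (non-capturing) group instead.
grouped = re.compile(r"(?:a{6})*")


def possessive_demo():
    """Hypothetical helper: return match results, or None before Python 3.11."""
    if sys.version_info < (3, 11):
        return None  # possessive syntax is not available on older versions
    return {
        # Greedy a* backtracks one character so the final 'a' can match.
        "a*a": bool(re.fullmatch(r"a*a", "aaaa")),
        # Possessive a*+ keeps all four characters, so the final 'a' fails.
        "a*+a": bool(re.fullmatch(r"a*+a", "aaaa")),
        # Possessive a{3,5}+ takes five characters and leaves only one for 'aa'.
        "a{3,5}+aa": bool(re.fullmatch(r"a{3,5}+aa", "aaaaaa")),
        # Greedy a{3,5} backtracks to four characters and 'aa' then matches.
        "a{3,5}aa": bool(re.fullmatch(r"a{3,5}aa", "aaaaaa")),
    }


print(nested_error)
print(possessive_demo())
```

On a supporting interpreter, the two possessive patterns fail where their greedy counterparts succeed, which matches the ``a*a`` and ``a{3,5}+aa`` examples quoted in the hunks.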
@@ -298,7 +298,7 @@ os
 re
 --
 
-* Atomic grouping (``(?>...)``) and possessive qualifiers (``*+``, ``++``,
+* Atomic grouping (``(?>...)``) and possessive quantifiers (``*+``, ``++``,
   ``?+``, ``{m,n}+``) are now supported in regular expressions.
   (Contributed by Jeffrey C. Jacobs and Serhiy Storchaka in :issue:`433030`.)
 
|
@@ -2038,9 +2038,9 @@ class ReTests(unittest.TestCase):
         with self.assertRaisesRegex(TypeError, "got 'type'"):
             re.search("x*", type)
 
-    def test_possessive_qualifiers(self):
-        """Test Possessive Qualifiers
-        Test qualifiers of the form @+ for some repetition operator @,
+    def test_possessive_quantifiers(self):
+        """Test Possessive Quantifiers
+        Test quantifiers of the form @+ for some repetition operator @,
         e.g. x{3,5}+ meaning match from 3 to 5 greadily and proceed
         without creating a stack frame for rolling the stack back and
         trying 1 or more fewer matches."""
@@ -2077,7 +2077,7 @@ class ReTests(unittest.TestCase):
         self.assertIsNone(re.match("^x{}+$", "xxx"))
         self.assertTrue(re.match("^x{}+$", "x{}"))
 
-    def test_fullmatch_possessive_qualifiers(self):
+    def test_fullmatch_possessive_quantifiers(self):
         self.assertTrue(re.fullmatch(r'a++', 'a'))
         self.assertTrue(re.fullmatch(r'a*+', 'a'))
         self.assertTrue(re.fullmatch(r'a?+', 'a'))
@@ -2096,7 +2096,7 @@ class ReTests(unittest.TestCase):
         self.assertIsNone(re.fullmatch(r'(?:ab)?+', 'abc'))
         self.assertIsNone(re.fullmatch(r'(?:ab){1,3}+', 'abc'))
 
-    def test_findall_possessive_qualifiers(self):
+    def test_findall_possessive_quantifiers(self):
         self.assertEqual(re.findall(r'a++', 'aab'), ['aa'])
         self.assertEqual(re.findall(r'a*+', 'aab'), ['aa', '', ''])
         self.assertEqual(re.findall(r'a?+', 'aab'), ['a', 'a', '', ''])
|
@@ -1,2 +1,2 @@
-Add support of atomic grouping (``(?>...)``) and possessive qualifiers
+Add support of atomic grouping (``(?>...)``) and possessive quantifiers
 (``*+``, ``++``, ``?+``, ``{m,n}+``) in :mod:`regular expressions <re>`.