Merged revisions 79430 via svnmerge from

svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79430 | brian.curtin | 2010-03-25 18:48:54 -0500 (Thu, 25 Mar 2010) | 2 lines

  Fix #6538. Markup RegexObject and MatchObject as classes. Patch by Ryan Arana.
........
Brian Curtin 2010-03-26 00:39:56 +00:00
parent fa0aebacd9
commit 027e478f3f

@ -705,98 +705,99 @@ form.
Regular Expression Objects Regular Expression Objects
-------------------------- --------------------------
Compiled regular expression objects support the following methods and .. class:: RegexObject
attributes:
The :class:`RegexObject` class supports the following methods and attributes:
.. method:: RegexObject.match(string[, pos[, endpos]]) .. method:: RegexObject.match(string[, pos[, endpos]])
If zero or more characters at the beginning of *string* match this regular If zero or more characters at the beginning of *string* match this regular
expression, return a corresponding :class:`MatchObject` instance. Return expression, return a corresponding :class:`MatchObject` instance. Return
``None`` if the string does not match the pattern; note that this is different ``None`` if the string does not match the pattern; note that this is different
from a zero-length match. from a zero-length match.
.. note:: .. note::
If you want to locate a match anywhere in *string*, use If you want to locate a match anywhere in *string*, use
:meth:`~RegexObject.search` instead. :meth:`~RegexObject.search` instead.
The optional second parameter *pos* gives an index in the string where the The optional second parameter *pos* gives an index in the string where the
search is to start; it defaults to ``0``. This is not completely equivalent to search is to start; it defaults to ``0``. This is not completely equivalent to
slicing the string; the ``'^'`` pattern character matches at the real beginning slicing the string; the ``'^'`` pattern character matches at the real beginning
of the string and at positions just after a newline, but not necessarily at the of the string and at positions just after a newline, but not necessarily at the
index where the search is to start. index where the search is to start.
The optional parameter *endpos* limits how far the string will be searched; it The optional parameter *endpos* limits how far the string will be searched; it
will be as if the string is *endpos* characters long, so only the characters will be as if the string is *endpos* characters long, so only the characters
from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less
than *pos*, no match will be found, otherwise, if *rx* is a compiled regular than *pos*, no match will be found, otherwise, if *rx* is a compiled regular
expression object, ``rx.match(string, 0, 50)`` is equivalent to expression object, ``rx.match(string, 0, 50)`` is equivalent to
``rx.match(string[:50], 0)``. ``rx.match(string[:50], 0)``.
>>> pattern = re.compile("o") >>> pattern = re.compile("o")
>>> pattern.match("dog") # No match as "o" is not at the start of "dog." >>> pattern.match("dog") # No match as "o" is not at the start of "dog."
>>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog". >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
<_sre.SRE_Match object at ...> <_sre.SRE_Match object at ...>
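The *pos* and *endpos* rules described above can be checked with a short sketch. This is an illustration only, not part of the patch; the pattern and the sample strings are hypothetical.

```python
import re

rx = re.compile(r"\d+")   # hypothetical pattern
text = "12345abc"         # hypothetical input

# endpos makes the string behave as if it were endpos characters long,
# so these two calls find the same text.
limited = rx.match(text, 0, 3)
sliced = rx.match(text[:3], 0)

# An endpos smaller than pos can never produce a match.
no_match = rx.match(text, 4, 2)

# pos is not the same as slicing: '^' still refers to the real start
# of the string, not to the index where matching begins.
caret = re.compile(r"^b")
at_pos = caret.match("ab", 1)      # None
on_slice = caret.match("ab"[1:])   # a match object
```

The last two calls show why the text says *pos* is "not completely equivalent to slicing the string".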
.. method:: RegexObject.search(string[, pos[, endpos]]) .. method:: RegexObject.search(string[, pos[, endpos]])
Scan through *string* looking for a location where this regular expression Scan through *string* looking for a location where this regular expression
produces a match, and return a corresponding :class:`MatchObject` instance. produces a match, and return a corresponding :class:`MatchObject` instance.
Return ``None`` if no position in the string matches the pattern; note that this Return ``None`` if no position in the string matches the pattern; note that this
is different from finding a zero-length match at some point in the string. is different from finding a zero-length match at some point in the string.
The optional *pos* and *endpos* parameters have the same meaning as for the The optional *pos* and *endpos* parameters have the same meaning as for the
:meth:`~RegexObject.match` method. :meth:`~RegexObject.match` method.
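As a rough illustration of how ``search`` differs from ``match``, and of how ``None`` differs from a zero-length match, consider the following sketch. It is not part of the patch, and the patterns are chosen only for the example.

```python
import re

pattern = re.compile("o")

# match() only tries at the starting position; search() scans forward.
anchored = pattern.match("dog")    # None
scanned = pattern.search("dog")    # matches at index 1

# A pattern that can match the empty string still yields a match object,
# which is different from the None returned when nothing matches.
maybe_empty = re.compile("x*").search("abc")
nothing = re.compile("x+").search("abc")
```

Here ``maybe_empty`` is a real match object whose matched text is the empty string, while ``nothing`` is ``None``.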
.. method:: RegexObject.split(string, maxsplit=0) .. method:: RegexObject.split(string[, maxsplit=0])
Identical to the :func:`split` function, using the compiled pattern. Identical to the :func:`split` function, using the compiled pattern.
.. method:: RegexObject.findall(string[, pos[, endpos]]) .. method:: RegexObject.findall(string[, pos[, endpos]])
Identical to the :func:`findall` function, using the compiled pattern. Identical to the :func:`findall` function, using the compiled pattern.
.. method:: RegexObject.finditer(string[, pos[, endpos]]) .. method:: RegexObject.finditer(string[, pos[, endpos]])
Identical to the :func:`finditer` function, using the compiled pattern. Identical to the :func:`finditer` function, using the compiled pattern.
.. method:: RegexObject.sub(repl, string, count=0) .. method:: RegexObject.sub(repl, string[, count=0])
Identical to the :func:`sub` function, using the compiled pattern. Identical to the :func:`sub` function, using the compiled pattern.
.. method:: RegexObject.subn(repl, string, count=0) .. method:: RegexObject.subn(repl, string[, count=0])
Identical to the :func:`subn` function, using the compiled pattern. Identical to the :func:`subn` function, using the compiled pattern.
.. attribute:: RegexObject.flags .. attribute:: RegexObject.flags
The flags argument used when the RE object was compiled, or ``0`` if no flags The flags argument used when the RE object was compiled, or ``0`` if no flags
were provided. were provided.
.. attribute:: RegexObject.groups .. attribute:: RegexObject.groups
The number of capturing groups in the pattern. The number of capturing groups in the pattern.
.. attribute:: RegexObject.groupindex .. attribute:: RegexObject.groupindex
A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
numbers. The dictionary is empty if no symbolic groups were used in the numbers. The dictionary is empty if no symbolic groups were used in the
pattern. pattern.
.. attribute:: RegexObject.pattern .. attribute:: RegexObject.pattern
The pattern string from which the RE object was compiled. The pattern string from which the RE object was compiled.
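The four attributes above can be inspected on any compiled pattern. The sketch below uses a hypothetical pattern with two named groups. It tests ``flags`` with a bitwise AND rather than equality, on the assumption that some Python versions add implicit flags to the stored value.

```python
import re

source = r"(?P<first>\w+) (?P<last>\w+)"   # hypothetical pattern
rx = re.compile(source, re.IGNORECASE)

group_count = rx.groups              # number of capturing groups
name_to_index = dict(rx.groupindex)  # symbolic name -> group number
original = rx.pattern                # the string passed to compile()
ignores_case = bool(rx.flags & re.IGNORECASE)
```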
.. _match-objects: .. _match-objects:
@ -804,176 +805,178 @@ attributes:
Match Objects Match Objects
------------- -------------
Match objects always have a boolean value of :const:`True`, so that you can test .. class:: MatchObject
whether e.g. :func:`match` resulted in a match with a simple if statement. They
support the following methods and attributes: Match Objects always have a boolean value of :const:`True`, so that you can test
whether e.g. :func:`match` resulted in a match with a simple if statement. They
support the following methods and attributes:
.. method:: MatchObject.expand(template) .. method:: MatchObject.expand(template)
Return the string obtained by doing backslash substitution on the template Return the string obtained by doing backslash substitution on the template
string *template*, as done by the :meth:`~RegexObject.sub` method. Escapes string *template*, as done by the :meth:`~RegexObject.sub` method. Escapes
such as ``\n`` are converted to the appropriate characters, and numeric such as ``\n`` are converted to the appropriate characters, and numeric
backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``, backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``,
``\g<name>``) are replaced by the contents of the corresponding group. ``\g<name>``) are replaced by the contents of the corresponding group.
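A small sketch of ``expand`` with each kind of backreference mentioned above. It is illustrative only, and the group names are hypothetical.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Isaac Newton")

# Numeric backreferences.
by_number = m.expand(r"\2, \1")

# Named and \g<number> backreferences.
by_name = m.expand(r"\g<last>, \g<first>")
by_g_number = m.expand(r"\g<2>, \g<1>")

# Escapes such as \n in the template become the real character.
with_newline = m.expand(r"\1\n")
```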
.. method:: MatchObject.group([group1, ...]) .. method:: MatchObject.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the Returns one or more subgroups of the match. If there is a single argument, the
result is a single string; if there are multiple arguments, the result is a result is a single string; if there are multiple arguments, the result is a
tuple with one item per argument. Without arguments, *group1* defaults to zero tuple with one item per argument. Without arguments, *group1* defaults to zero
(the whole match is returned). If a *groupN* argument is zero, the corresponding (the whole match is returned). If a *groupN* argument is zero, the corresponding
return value is the entire matching string; if it is in the inclusive range return value is the entire matching string; if it is in the inclusive range
[1..99], it is the string matching the corresponding parenthesized group. If a [1..99], it is the string matching the corresponding parenthesized group. If a
group number is negative or larger than the number of groups defined in the group number is negative or larger than the number of groups defined in the
pattern, an :exc:`IndexError` exception is raised. If a group is contained in a pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
part of the pattern that did not match, the corresponding result is ``None``. part of the pattern that did not match, the corresponding result is ``None``.
If a group is contained in a part of the pattern that matched multiple times, If a group is contained in a part of the pattern that matched multiple times,
the last match is returned. the last match is returned.
>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0) # The entire match >>> m.group(0) # The entire match
'Isaac Newton' 'Isaac Newton'
>>> m.group(1) # The first parenthesized subgroup. >>> m.group(1) # The first parenthesized subgroup.
'Isaac' 'Isaac'
>>> m.group(2) # The second parenthesized subgroup. >>> m.group(2) # The second parenthesized subgroup.
'Newton' 'Newton'
>>> m.group(1, 2) # Multiple arguments give us a tuple. >>> m.group(1, 2) # Multiple arguments give us a tuple.
('Isaac', 'Newton') ('Isaac', 'Newton')
If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN* If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN*
arguments may also be strings identifying groups by their group name. If a arguments may also be strings identifying groups by their group name. If a
string argument is not used as a group name in the pattern, an :exc:`IndexError` string argument is not used as a group name in the pattern, an :exc:`IndexError`
exception is raised. exception is raised.
A moderately complicated example: A moderately complicated example:
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name') >>> m.group('first_name')
'Malcolm' 'Malcolm'
>>> m.group('last_name') >>> m.group('last_name')
'Reynolds' 'Reynolds'
Named groups can also be referred to by their index: Named groups can also be referred to by their index:
>>> m.group(1) >>> m.group(1)
'Malcolm' 'Malcolm'
>>> m.group(2) >>> m.group(2)
'Reynolds' 'Reynolds'
If a group matches multiple times, only the last match is accessible: If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
>>> m.group(1) # Returns only the last match. >>> m.group(1) # Returns only the last match.
'c3' 'c3'
Return a tuple containing all the subgroups of the match, from 1 up to however
many groups are in the pattern. The *default* argument is used for groups that
did not participate in the match; it defaults to ``None``. (Incompatibility
note: in the original Python 1.5 release, if the tuple was one element long, a
string would be returned instead. In later versions (from 1.5.1 on), a
singleton tuple is returned in such cases.)
For example:
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')
If we make the decimal place and everything after it optional, not all groups
might participate in the match. These groups will default to ``None`` unless
the *default* argument is given:
>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups() # Second group defaults to None.
('24', None)
>>> m.groups('0') # Now, the second group defaults to '0'.
('24', '0')
.. method:: MatchObject.groups(default=None) .. method:: MatchObject.groupdict([default])
Return a tuple containing all the subgroups of the match, from 1 up to however Return a dictionary containing all the *named* subgroups of the match, keyed by
many groups are in the pattern. The *default* argument is used for groups that the subgroup name. The *default* argument is used for groups that did not
did not participate in the match; it defaults to ``None``. participate in the match; it defaults to ``None``. For example:
For example: >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632") {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
>>> m.groups()
('24', '1632')
If we make the decimal place and everything after it optional, not all groups
might participate in the match. These groups will default to ``None`` unless
the *default* argument is given:
>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups() # Second group defaults to None.
('24', None)
>>> m.groups('0') # Now, the second group defaults to '0'.
('24', '0')
.. method:: MatchObject.groupdict(default=None) .. method:: MatchObject.start([group])
MatchObject.end([group])
Return a dictionary containing all the *named* subgroups of the match, keyed by Return the indices of the start and end of the substring matched by *group*;
the subgroup name. The *default* argument is used for groups that did not *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
participate in the match; it defaults to ``None``. For example: *group* exists but did not contribute to the match. For a match object *m*, and
a group *g* that did contribute to the match, the substring matched by group *g*
(equivalent to ``m.group(g)``) is ::
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") m.string[m.start(g):m.end(g)]
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'} Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
null string. For example, after ``m = re.search('b(c?)', 'cba')``,
``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
An example that will remove *remove_this* from email addresses:
>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'
.. method:: MatchObject.start(group=0) .. method:: MatchObject.span([group])
MatchObject.end(group=0)
Return the indices of the start and end of the substring matched by *group*; For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group),
*group* defaults to zero (meaning the whole matched substring). Return ``-1`` if m.end(group))``. Note that if *group* did not contribute to the match, this is
*group* exists but did not contribute to the match. For a match object *m*, and ``(-1, -1)``. *group* defaults to zero, the entire match.
a group *g* that did contribute to the match, the substring matched by group *g*
(equivalent to ``m.group(g)``) is ::
m.string[m.start(g):m.end(g)]
Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
null string. For example, after ``m = re.search('b(c?)', 'cba')``,
``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
An example that will remove *remove_this* from email addresses:
>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'
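The ``'b(c?)'`` case described above, along with the ``-1`` result for a group that did not contribute, can be reproduced with the following sketch. It is not part of the patch, and the second pattern is a hypothetical example.

```python
import re

m = re.search('b(c?)', 'cba')
whole = (m.start(0), m.end(0))   # the 'b' at index 1
empty = (m.start(1), m.end(1))   # group 1 matched the empty string

# Asking for a group the pattern does not define raises IndexError.
try:
    m.start(2)
    raised = False
except IndexError:
    raised = True

# A group that exists but took no part in the match reports -1.
alt = re.match(r"(a)|(b)", "a")
unused = alt.span(2)

# For a group that did match, slicing m.string gives m.group(g).
same_text = alt.string[alt.start(1):alt.end(1)] == alt.group(1)
```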
.. method:: MatchObject.span(group=0) .. attribute:: MatchObject.pos
For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group), The value of *pos* which was passed to the :meth:`~RegexObject.search` or
m.end(group))``. Note that if *group* did not contribute to the match, this is :meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the
``(-1, -1)``. *group* defaults to zero, the entire match. index into the string at which the RE engine started looking for a match.
.. attribute:: MatchObject.pos .. attribute:: MatchObject.endpos
The value of *pos* which was passed to the :meth:`~RegexObject.search` or The value of *endpos* which was passed to the :meth:`~RegexObject.search` or
:meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the :meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the
index into the string at which the RE engine started looking for a match. index into the string beyond which the RE engine will not go.
.. attribute:: MatchObject.endpos .. attribute:: MatchObject.lastindex
The value of *endpos* which was passed to the :meth:`~RegexObject.search` or The integer index of the last matched capturing group, or ``None`` if no group
:meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
index into the string beyond which the RE engine will not go. ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
string.
.. attribute:: MatchObject.lastindex .. attribute:: MatchObject.lastgroup
The integer index of the last matched capturing group, or ``None`` if no group The name of the last matched capturing group, or ``None`` if the group didn't
was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and have a name, or if no group was matched at all.
``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
string.
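The ``lastindex`` cases listed above can be verified directly. The sketch also shows ``lastgroup`` for a hypothetical pattern with named groups. It is an illustration, not part of the patch.

```python
import re

# The four expressions from the text, each applied to 'ab'.
single = re.match(r"(a)b", "ab").lastindex
nested = re.match(r"((a)(b))", "ab").lastindex
double = re.match(r"((ab))", "ab").lastindex
sequential = re.match(r"(a)(b)", "ab").lastindex

# With no capturing group at all, lastindex is None.
no_groups = re.match(r"ab", "ab").lastindex

# lastgroup gives the name of that last matched group, if it has one.
named = re.match(r"(?P<x>a)(?P<y>b)", "ab").lastgroup
unnamed = re.match(r"(a)(b)", "ab").lastgroup
```

In the nested cases the outermost group is the last one to close, which is why ``lastindex`` is 1 there and not the number of the innermost group.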
.. attribute:: MatchObject.lastgroup .. attribute:: MatchObject.re
The name of the last matched capturing group, or ``None`` if the group didn't The regular expression object whose :meth:`~RegexObject.match` or
have a name, or if no group was matched at all. :meth:`~RegexObject.search` method produced this :class:`MatchObject`
instance.
.. attribute:: MatchObject.re .. attribute:: MatchObject.string
The regular expression object whose :meth:`~RegexObject.match` or The string passed to :meth:`~RegexObject.match` or
:meth:`~RegexObject.search` method produced this :class:`MatchObject` :meth:`~RegexObject.search`.
instance.
.. attribute:: MatchObject.string
The string passed to :meth:`~RegexObject.match` or
:meth:`~RegexObject.search`.
Examples Examples