Improved error msg when a symbolic group name is redefined.  Added docs
and NEWS.  Bugfix candidate?  That's a dilemma for Anthony <wink>:  /F
did fix a longstanding bug here, but the fix can cause code that
previously worked by accident to raise an exception.
Tim Peters 2001-11-03 19:35:43 +00:00
parent c034b47ef3
commit 7533587d43
3 changed files with 28 additions and 19 deletions

@@ -24,7 +24,7 @@ usage of the same character for the same purpose in string literals;
 for example, to match a literal backslash, one might have to write
 \code{'\e\e\e\e'} as the pattern string, because the regular expression
 must be \samp{\e\e}, and each backslash must be expressed as
 \samp{\e\e} inside a regular Python string literal.
 The solution is to use Python's raw string notation for regular
 expression patterns; backslashes are not handled in any special way in
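A quick check of the backslash arithmetic in this hunk, written for Python 3's `re` module (not the 2001 code being patched): the ordinary literal needs four backslashes and the raw literal two, and both produce the same two-character pattern that matches one backslash.

```python
import re

text = "C:\\temp"  # the string contains a single backslash

# Ordinary literal: "\\\\" becomes the two-character regex \\ (one literal backslash).
plain = re.search("\\\\", text)
# Raw literal: r"\\" is the same two-character regex, with less escaping.
raw = re.search(r"\\", text)

print(plain.group() == raw.group() == "\\")  # True
print(len("\\\\"), len(r"\\"))               # 2 2
```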
@@ -178,8 +178,8 @@ will match any lowercase letter, and \code{[a-zA-Z0-9]} matches any
 letter or digit. Character classes such as \code{\e w} or \code{\e S}
 (defined below) are also acceptable inside a range. If you want to
 include a \character{]} or a \character{-} inside a set, precede it with a
 backslash, or place it as the first character. The
 pattern \regexp{[]]} will match \code{']'}, for example.
 You can match the characters not within a range by \dfn{complementing}
 the set. This is indicated by including a \character{\^} as the first
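The bracket rules in this hunk, sketched with Python 3's `re.findall`; the sample strings are arbitrary:

```python
import re

# ']' placed first in the set is a literal.
print(re.findall(r"[]a]", "a]b"))    # ['a', ']']
# An escaped ']' is a literal too, and '-' at the end of a set is literal.
print(re.findall(r"[\]-]", "x]-y"))  # [']', '-']
# A leading '^' complements the set; ']' right after it is still literal.
print(re.findall(r"[^]a]", "a]b"))   # ['b']
```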
@@ -209,7 +209,7 @@ inside a character class: \regexp{[(] [)]}.
 \item[\code{(?...)}] This is an extension notation (a \character{?}
 following a \character{(} is not meaningful otherwise). The first
 character after the \character{?}
 determines what the meaning and further syntax of the construct is.
 Extensions usually do not create a new group;
 \regexp{(?P<\var{name}>...)} is the only exception to this rule.
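To see that extensions other than `(?P<name>...)` leave the group count alone, one can compile a pattern mixing `(?:...)`, a lookahead `(?=...)`, and one named group, then read the compiled pattern's `groups` attribute (Python 3 `re`):

```python
import re

# (?:...) and the lookahead (?=...) add no groups; only (?P<name>...) does.
p = re.compile(r"(?:ab)(?=c)(?P<tail>c)")
print(p.groups)                 # 1
print(p.match("abc").group(1))  # c
```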
@@ -231,13 +231,14 @@ the flag, the results are undefined.
 \item[\code{(?:...)}] A non-grouping version of regular parentheses.
 Matches whatever regular expression is inside the parentheses, but the
 substring matched by the
 group \emph{cannot} be retrieved after performing a match or
 referenced later in the pattern.
 \item[\code{(?P<\var{name}>...)}] Similar to regular parentheses, but
 the substring matched by the group is accessible via the symbolic group
-name \var{name}. Group names must be valid Python identifiers. A
+name \var{name}. Group names must be valid Python identifiers, and
+each group name must be defined only once within a regular expression. A
 symbolic group is also a numbered group, just as if the group were not
 named. So the group named 'id' in the example above can also be
 referenced as the numbered group 1.
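A Python 3 sketch of the two rules stated in this hunk: a named group is still reachable by its number, and a duplicated name is rejected at compile time with `re.error`. The exact error text varies between Python versions, so it is printed rather than quoted here.

```python
import re

m = re.match(r"(?P<id>\w+)=(\d+)", "width=80")
# A named group is also numbered: 'id' is group 1, the unnamed group is group 2.
print(m.group("id"), m.group(1), m.group(2))  # width width 80

# Defining the same name twice is rejected when the pattern is compiled.
try:
    re.compile(r"(?P<abc>a)(?P<abc>b)")
except re.error as exc:
    print("re.error:", exc)
```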
@@ -292,7 +293,7 @@ resulting RE will match the second character. For example,
 \item[\code{\e \var{number}}] Matches the contents of the group of the
 same number. Groups are numbered starting from 1. For example,
 \regexp{(.+) \e 1} matches \code{'the the'} or \code{'55 55'}, but not
 \code{'the end'} (note
 the space after the group). This special sequence can only be used to
 match one of the first 99 groups. If the first digit of \var{number}
 is 0, or \var{number} is 3 octal digits long, it will not be interpreted
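The backreference example from the text, checked with `Pattern.fullmatch` (available since Python 3.4) so that each string has to be matched in full:

```python
import re

# \1 must repeat exactly what group 1 captured.
pat = re.compile(r"(.+) \1")
print(bool(pat.fullmatch("the the")))  # True
print(bool(pat.fullmatch("55 55")))    # True
print(bool(pat.fullmatch("the end")))  # False
```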
@@ -300,7 +301,7 @@ as a group match, but as the character with octal value \var{number}.
 (There is a group 0, which is the entire matched pattern, but it can't
 be referenced with \regexp{\e 0}; instead, use \regexp{\e g<0>}.)
 Inside the \character{[} and \character{]} of a character class, all numeric
 escapes are treated as characters.
 \item[\code{\e A}] Matches only at the start of the string.
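In current Python, `\g<0>` is recognized in replacement templates (`re.sub`, `Match.expand`) rather than inside a pattern. The sketch below shows that use, plus the rule that a numeric escape inside a character class is an ordinary (octal) character:

```python
import re

# \g<0> in a replacement template stands for the whole match (group 0).
print(re.sub(r"\d+", r"[\g<0>]", "a1 b22"))  # a[1] b[22]

# Inside a character class a numeric escape is a character, not a backreference:
# [\1] is the set containing chr(1).
print(re.findall(r"[\1]", "x\x01y"))  # ['\x01']
```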
@@ -387,7 +388,7 @@ The module defines the following functions and constants, and an exception:
 \begin{funcdesc}{compile}{pattern\optional{, flags}}
 Compile a regular expression pattern into a regular expression
 object, which can be used for matching using its \function{match()} and
 \function{search()} methods, described below.
 The expression's behaviour can be modified by specifying a
 \var{flags} value. Values can be any of the following variables,
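A minimal sketch of passing flags to `compile()` in Python 3, combining two of them with `|`; the pattern and input are illustrative only:

```python
import re

# Flags go in the optional second argument and can be OR-ed together.
word = re.compile(r"^hello$", re.IGNORECASE | re.MULTILINE)

# MULTILINE lets ^ and $ match at each line; IGNORECASE accepts any casing.
print(word.findall("Hello\nHELLO\nhelp"))  # ['Hello', 'HELLO']
```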
@@ -424,7 +425,7 @@ current locale.
 \begin{datadesc}{L}
 \dataline{LOCALE}
 Make \regexp{\e w}, \regexp{\e W}, \regexp{\e b}, and
 \regexp{\e B} dependent on the current locale.
 \end{datadesc}
 \begin{datadesc}{M}
@@ -456,7 +457,7 @@ Make \regexp{\e w}, \regexp{\e W}, \regexp{\e b}, and
 \begin{datadesc}{X}
 \dataline{VERBOSE}
 This flag allows you to write regular expressions that look nicer.
 Whitespace within the pattern is ignored,
 except when in a character class or preceded by an unescaped
 backslash, and, when a line contains a \character{\#} neither in a
 character class or preceded by an unescaped backslash, all characters
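An illustrative `re.VERBOSE` pattern (Python 3): the layout whitespace and `#` comments are dropped, while the space written inside `[ ]` still has to match:

```python
import re

# Whitespace and '#' comments outside character classes are ignored under re.VERBOSE.
number = re.compile(r"""
    [+-]?       # optional sign
    \d+         # integer part
    (?:\.\d*)?  # optional fractional part
    [ ]?kg      # optional space (kept, since it is inside a class), then 'kg'
""", re.VERBOSE)

print(number.fullmatch("-3.5 kg") is not None)  # True
print(number.fullmatch("-3.5kg") is not None)   # True
print(number.fullmatch("3 . 5kg") is None)      # True
```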
@@ -605,7 +606,7 @@ attributes:
 corresponding \class{MatchObject} instance. Return \code{None} if no
 position in the string matches the pattern; note that this is
 different from finding a zero-length match at some point in the string.
 The optional \var{pos} and \var{endpos} parameters have the same
 meaning as for the \method{match()} method.
 \end{methoddesc}
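The distinction drawn above between no match and a zero-length match, shown with Python 3's `re.search`:

```python
import re

# No position matches: search() returns None.
print(re.search(r"x", "abc"))  # None

# x* matches the empty string at position 0, so a match object is returned.
m = re.search(r"x*", "abc")
print(m is not None, m.span())  # True (0, 0)
```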
@@ -659,7 +660,7 @@ The flags argument used when the RE object was compiled, or
 \end{memberdesc}
 \begin{memberdesc}[RegexObject]{groupindex}
 A dictionary mapping any symbolic group names defined by
 \regexp{(?P<\var{id}>)} to group numbers. The dictionary is empty if no
 symbolic groups were used in the pattern.
 \end{memberdesc}
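A small Python 3 check of `groupindex`. In recent versions it may be a read-only mapping rather than a plain dictionary, so it is wrapped in `dict()` for display:

```python
import re

p = re.compile(r"(?P<first>\w+) (\w+) (?P<last>\w+)")

# Names map to their group numbers; the unnamed group 2 does not appear.
print(dict(p.groupindex))                     # {'first': 1, 'last': 3}
# No symbolic groups at all gives an empty mapping.
print(dict(re.compile(r"(\w+)").groupindex))  # {}
```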
@@ -695,13 +696,13 @@ the string matching the the corresponding parenthesized group. If a
 group number is negative or larger than the number of groups defined
 in the pattern, an \exception{IndexError} exception is raised.
 If a group is contained in a part of the pattern that did not match,
 the corresponding result is \code{None}. If a group is contained in a
 part of the pattern that matched multiple times, the last match is
 returned.
 If the regular expression uses the \regexp{(?P<\var{name}>...)} syntax,
 the \var{groupN} arguments may also be strings identifying groups by
 their group name. If a string argument is not used as a group name in
 the pattern, an \exception{IndexError} exception is raised.
 A moderately complicated example:
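The `group()` rules in this hunk (an unmatched group gives `None`, a repeated group reports its last repetition, and an unknown name raises `IndexError`) can be checked with a separate Python 3 sketch; it is not the documentation's own example, which lies outside the hunk:

```python
import re

m = re.match(r"(?P<a>a)|(?P<b>b)", "b")
print(m.group(1), m.group("b"))  # None b   (group 1 took no part in the match)

m = re.match(r"(\w)+", "abc")
print(m.group(1))                # c        (the last repetition wins)

# A name that is not defined in the pattern raises IndexError.
try:
    m.group("missing")
except IndexError as exc:
    print("IndexError:", exc)
```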
@@ -765,7 +766,7 @@ Note that if \var{group} did not contribute to the match, this is
 \begin{memberdesc}[MatchObject]{pos}
 The value of \var{pos} which was passed to the
 \function{search()} or \function{match()} function. This is the index
 into the string at which the RE engine started looking for a match.
 \end{memberdesc}
 \begin{memberdesc}[MatchObject]{endpos}
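For the `pos` attribute described in this hunk: a Python 3 sketch showing that `pos` records where scanning started, and that a start position is not the same as slicing the string, since `^` still anchors at the real beginning:

```python
import re

p = re.compile(r"\d")
m = p.search("a1b2", 2)  # start scanning at index 2
print(m.group(), m.pos)  # 2 2

print(p.match("a1b2", 1).pos)  # 1

# '^' refers to the true start of the string, not to pos.
print(re.compile(r"^\d").search("a1b2", 1))  # None
```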

@@ -81,8 +81,10 @@ class Pattern:
         gid = self.groups
         self.groups = gid + 1
         if name:
-            if self.groupdict.has_key(name):
-                raise error, "can only use each group name once"
+            ogid = self.groupdict.get(name, None)
+            if ogid is not None:
+                raise error, ("redefinition of group name %s as group %d; " +
+                              "was group %d") % (`name`, gid, ogid)
             self.groupdict[name] = gid
         self.open.append(gid)
         return gid
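For readers on Python 3, where `dict.has_key` and backtick `repr` no longer exist, the new check can be re-expressed as below. `PatternState` and `GroupNameError` are hypothetical stand-ins, a simplified sketch of the bookkeeping in this hunk rather than the actual parser class; starting `groups` at 1 is an assumption mirroring regex numbering, where group 0 is the whole match.

```python
class GroupNameError(Exception):
    """Hypothetical stand-in for the parser's `error` exception."""


class PatternState:
    """Hypothetical, simplified mirror of the group bookkeeping in the hunk above."""

    def __init__(self):
        self.groups = 1      # assumed: group 0 is the whole match, so numbering starts at 1
        self.groupdict = {}  # symbolic name -> group number
        self.open = []       # groups opened but not yet closed

    def opengroup(self, name=None):
        gid = self.groups
        self.groups = gid + 1
        if name:
            ogid = self.groupdict.get(name)
            if ogid is not None:
                # Report both numbers so the user can see which groups collide.
                raise GroupNameError(
                    "redefinition of group name %r as group %d; was group %d"
                    % (name, gid, ogid))
            self.groupdict[name] = gid
        self.open.append(gid)
        return gid


state = PatternState()
state.opengroup("abc")      # becomes group 1
try:
    state.opengroup("abc")  # group 2 tries to reuse the name
except GroupNameError as exc:
    print(exc)  # redefinition of group name 'abc' as group 2; was group 1
```

Using `dict.get` with a `None` check keeps a single lookup and, unlike a bare membership test, also yields the earlier group number for the message.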

@@ -46,6 +46,12 @@ Extension modules
 Library
+- Symbolic group names in regular expressions must be unique. For
+  example, the regexp r'(?P<abc>)(?P<abc>)' is not allowed, because a
+  single name can't mean both "group 1" and "group 2" simultaneously.
+  Python 2.2 detects this error at regexp compilation time; previously,
+  the error went undetected, and results were unpredictable.
+
 - Tix exposes more commands through the classes DirSelectBox,
   DirSelectDialog, ListNoteBook, Meter, CheckList, and the
   methods tix_addbitmapdir, tix_cget, tix_configure, tix_filedialog,