Mirror of https://github.com/python/cpython.git (synced 2025-07-23 11:15:24 +00:00)

Markup nits.

parent faff0bdcba
commit d16d4981d1
1 changed file with 36 additions and 36 deletions
@@ -33,8 +33,7 @@ while \code{"\e n"} is a one-character string containing a newline.
 Usually patterns will be expressed in Python code using this raw
 string notation.
 
-\subsection{Regular Expression Syntax}
-\label{re-syntax}
+\subsection{Regular Expression Syntax \label{re-syntax}}
 
 A regular expression (or RE) specifies a set of strings that matches
 it; the functions in this module let you check if a particular string
@@ -70,29 +69,31 @@ The special characters are:
 % define these since they're used twice:
 \newcommand{\MyLeftMargin}{0.7in}
 \newcommand{\MyLabelWidth}{0.65in}
+
 \begin{list}{}{\leftmargin \MyLeftMargin \labelwidth \MyLabelWidth}
+
 \item[\character{.}] (Dot.) In the default mode, this matches any
 character except a newline. If the \constant{DOTALL} flag has been
 specified, this matches any character including a newline.
-%
+
 \item[\character{\^}] (Caret.) Matches the start of the string, and in
 \constant{MULTILINE} mode also matches immediately after each newline.
-%
+
 \item[\character{\$}] Matches the end of the string, and in
 \constant{MULTILINE} mode also matches before a newline.
 \regexp{foo} matches both 'foo' and 'foobar', while the regular
 expression \regexp{foo\$} matches only 'foo'.
-%
+
 \item[\character{*}] Causes the resulting RE to
 match 0 or more repetitions of the preceding RE, as many repetitions
 as are possible. \regexp{ab*} will
 match 'a', 'ab', or 'a' followed by any number of 'b's.
-%
+
 \item[\character{+}] Causes the
 resulting RE to match 1 or more repetitions of the preceding RE.
 \regexp{ab+} will match 'a' followed by any non-zero number of 'b's; it
 will not match just 'a'.
-%
+
 \item[\character{?}] Causes the resulting RE to
 match 0 or 1 repetitions of the preceding RE. \regexp{ab?} will
 match either 'a' or 'ab'.
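The special characters documented in this hunk can be checked with a short Python sketch. It is illustrative only: the sample strings are assumptions, and it uses the modern Python 3 `re` module (including `re.fullmatch`), which is newer than the text in this commit.

```python
import re

# '.' does not match a newline unless the DOTALL flag is given.
dot_default = re.match(r"a.c", "a\nc")
dot_dotall = re.match(r"a.c", "a\nc", re.DOTALL)

# '^' and '$' also anchor at line boundaries in MULTILINE mode.
lines_default = re.findall(r"^foo$", "foo\nfoo")
lines_multiline = re.findall(r"^foo$", "foo\nfoo", re.MULTILINE)

# 'foo' is found inside 'foobar'; 'foo$' requires the string to end there.
foo_in_foobar = re.search(r"foo", "foobar")
foo_end_in_foobar = re.search(r"foo$", "foobar")

# '*', '+' and '?' differ in how many repetitions of 'b' they accept.
samples = ("a", "ab", "abbb")
star = [bool(re.fullmatch(r"ab*", s)) for s in samples]
plus = [bool(re.fullmatch(r"ab+", s)) for s in samples]
optional = [bool(re.fullmatch(r"ab?", s)) for s in samples]
```

`re.fullmatch` is used for the quantifier checks so that a partial match of a longer string does not count as success.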
@@ -105,24 +106,26 @@ Adding \character{?} after the qualifier makes it perform the match in
 \dfn{non-greedy} or \dfn{minimal} fashion; as \emph{few} characters as
 possible will be matched. Using \regexp{.*?} in the previous
 expression will match only \code{'<H1>'}.
-%
+
 \item[\code{\{\var{m},\var{n}\}}] Causes the resulting RE to match from
 \var{m} to \var{n} repetitions of the preceding RE, attempting to
 match as many repetitions as possible. For example, \regexp{a\{3,5\}}
 will match from 3 to 5 \character{a} characters. Omitting \var{m} is the same
 as specifying 0 for the lower bound; omitting \var{n} specifies an
 infinite upper bound.
-%
+
 \item[\code{\{\var{m},\var{n}\}?}] Causes the resulting RE to
 match from \var{m} to \var{n} repetitions of the preceding RE,
 attempting to match as \emph{few} repetitions as possible. This is
 the non-greedy version of the previous qualifier. For example, on the
-6-character string \code{'aaaaaa'}, \regexp{a\{3,5\}} will match 5 \character{a}
-characters, while \regexp{a\{3,5\}?} will only match 3 characters.
-%
-\item[\character{\e}] Either escapes special characters (permitting you to match
-characters like \character{*}, \character{?}, and so forth), or
-signals a special sequence; special sequences are discussed below.
+6-character string \code{'aaaaaa'}, \regexp{a\{3,5\}} will match 5
+\character{a} characters, while \regexp{a\{3,5\}?} will only match 3
+characters.
+
+\item[\character{\e}] Either escapes special characters (permitting
+you to match characters like \character{*}, \character{?}, and so
+forth), or signals a special sequence; special sequences are discussed
+below.
 
 If you're not using a raw string to
 express the pattern, remember that Python also uses the
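A minimal sketch of the greedy and non-greedy qualifiers described in this hunk, using the `'aaaaaa'` example from the text. The `<H1>` input string and the pattern `<.*>` are assumptions, since the hunk only shows the tail of that example.

```python
import re

# Greedy vs. non-greedy bounded repetition on the 6-character string.
greedy = re.match(r"a{3,5}", "aaaaaa").group()
lazy = re.match(r"a{3,5}?", "aaaaaa").group()

# Omitting n leaves the upper bound open; omitting m means a lower bound of 0.
open_upper = re.match(r"a{3,}", "aaaaaa").group()
zero_lower = re.match(r"a{,2}", "aaaaaa").group()

# Assumed sample input for the '<H1>' remark in the context lines.
tag = "<H1>title</H1>"
greedy_tag = re.match(r"<.*>", tag).group()
lazy_tag = re.match(r"<.*?>", tag).group()

# A backslash escapes a special character so that it matches literally.
literal_star = re.search(r"\*", "2*3").group()
```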
@@ -133,7 +136,7 @@ if Python would recognize the resulting sequence, the backslash should
 be repeated twice. This is complicated and hard to understand, so
 it's highly recommended that you use raw strings for all but the
 simplest expressions.
-%
+
 \item[\code{[]}] Used to indicate a set of characters. Characters can
 be listed individually, or a range of characters can be indicated by
 giving two characters and separating them by a \character{-}. Special
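The backslash advice in the context lines above can be illustrated with a small sketch. The variable names are hypothetical; the point is only that a raw string spares one level of doubling when the pattern must match a literal backslash.

```python
import re

backslash = "\\"  # a one-character string holding a single backslash

# Ordinary literal: double once for Python, then again for the regex.
plain = re.match("\\\\", backslash)

# Raw literal: only the regex-level doubling is needed.
raw = re.match(r"\\", backslash)

# Both literals spell the same two-character pattern.
same_pattern = "\\\\" == r"\\"

# "\n" is one character (a newline); r"\n" is two (backslash and 'n').
lengths = (len("\n"), len(r"\n"))
```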
@@ -153,28 +156,27 @@ the set. This is indicated by including a
 simply match the \character{\^} character. For example, \regexp{[\^5]}
 will match any character except \character{5}.
 
-%
 \item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
 creates a regular expression that will match either A or B. This can
 be used inside groups (see below) as well. To match a literal \character{|},
 use \regexp{\e|}, or enclose it inside a character class, as in \regexp{[|]}.
-%
+
 \item[\code{(...)}] Matches whatever regular expression is inside the
 parentheses, and indicates the start and end of a group; the contents
 of a group can be retrieved after a match has been performed, and can
 be matched later in the string with the \regexp{\e \var{number}} special
-sequence, described below. To match the literals \character{(} or \character{')},
-use \regexp{\e(} or \regexp{\e)}, or enclose them inside a character
-class: \regexp{[(] [)]}.
-%
-\item[\code{(?...)}] This is an extension notation (a \character{?} following a
-\character{(} is not meaningful otherwise). The first character after
-the \character{?}
+sequence, described below. To match the literals \character{(} or
+\character{')}, use \regexp{\e(} or \regexp{\e)}, or enclose them
+inside a character class: \regexp{[(] [)]}.
+
+\item[\code{(?...)}] This is an extension notation (a \character{?}
+following a \character{(} is not meaningful otherwise). The first
+character after the \character{?}
 determines what the meaning and further syntax of the construct is.
 Extensions usually do not create a new group;
 \regexp{(?P<\var{name}>...)} is the only exception to this rule.
 Following are the currently supported extensions.
-%
+
 \item[\code{(?iLmsx)}] (One or more letters from the set \character{i},
 \character{L}, \character{m}, \character{s}, \character{x}.) The group matches
 the empty string; the letters set the corresponding flags
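The complemented set, alternation, and grouping entries in this hunk might be exercised as follows. This is a sketch with made-up sample strings, not text from the commit.

```python
import re

# '[^5]' matches any single character except '5'.
not_five = re.findall(r"[^5]", "1525")

# 'A|B' matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, cow")

# A literal '|' can be escaped or placed inside a character class.
split_escaped = re.split(r"\|", "a|b|c")
split_class = re.split(r"[|]", "a|b|c")

# Parentheses create groups whose contents can be retrieved afterwards.
m = re.match(r"(\w+) (\w+)", "Isaac Newton")
groups = (m.group(1), m.group(2))

# '\1' matches again whatever group 1 matched earlier in the string.
repeated = re.search(r"(\w+) \1", "hello hello world").group()

# Literal parentheses need escaping (or a character class).
literal_parens = re.search(r"\(x\)", "f(x)").group()
```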
@@ -182,13 +184,13 @@ the empty string; the letters set the corresponding flags
 \constant{re.X}) for the entire regular expression. This is useful if
 you wish to include the flags as part of the regular expression, instead
 of passing a \var{flag} argument to the \function{compile()} function.
-%
+
 \item[\code{(?:...)}] A non-grouping version of regular parentheses.
 Matches whatever regular expression is inside the parentheses, but the
 substring matched by the
 group \emph{cannot} be retrieved after performing a match or
 referenced later in the pattern.
-%
+
 \item[\code{(?P<\var{name}>...)}] Similar to regular parentheses, but
 the substring matched by the group is accessible via the symbolic group
 name \var{name}. Group names must be valid Python identifiers. A
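As a rough illustration of the inline-flag and non-grouping entries above, the following sketch compares `(?i)` with the equivalent flag argument and shows that `(?:...)` does not create a retrievable group. The inputs are assumed examples.

```python
import re

# '(?i)' at the start of the pattern has the same effect as passing re.I.
inline_flag = re.match(r"(?i)abc", "ABC")
flag_argument = re.compile(r"abc", re.I).match("ABC")
no_flag = re.match(r"abc", "ABC")

# '(?:...)' groups for repetition without creating a numbered group,
# so only the capturing '(c)' shows up in groups().
captured = re.match(r"(?:ab)+(c)", "ababc").groups()
```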
@@ -201,18 +203,18 @@ For example, if the pattern is
 name in arguments to methods of match objects, such as \code{m.group('id')}
 or \code{m.end('id')}, and also by name in pattern text
 (e.g. \regexp{(?P=id)}) and replacement text (e.g. \code{\e g<id>}).
-%
+
 \item[\code{(?P=\var{name})}] Matches whatever text was matched by the
 earlier group named \var{name}.
-%
+
 \item[\code{(?\#...)}] A comment; the contents of the parentheses are
 simply ignored.
-%
+
 \item[\code{(?=...)}] Matches if \regexp{...} matches next, but doesn't
 consume any of the string. This is called a lookahead assertion. For
 example, \regexp{Isaac (?=Asimov)} will match \code{'Isaac~'} only if it's
 followed by \code{'Asimov'}.
-%
+
 \item[\code{(?!...)}] Matches if \regexp{...} doesn't match next. This
 is a negative lookahead assertion. For example,
 \regexp{Isaac (?!Asimov)} will match \code{'Isaac~'} only if it's \emph{not}
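The named-group and lookahead entries in this hunk can be tried out with the sketch below. The `Isaac` patterns come from the text; the identifier pattern and the other sample inputs are assumptions, because the hunk shows only part of the original example.

```python
import re

# Assumed pattern for the group named 'id' mentioned in the context lines.
m = re.match(r"(?P<id>[a-zA-Z_]\w*)", "spam_1 = 2")
by_name = m.group("id")
by_number = m.group(1)
end_of_id = m.end("id")

# '(?P=name)' matches the same text again; '\g<name>' refers to it in replacements.
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine").group()
wrapped = re.sub(r"(?P<id>[a-zA-Z_]\w*)", r"<\g<id>>", "x y")

# '(?#...)' is a comment and is ignored by the matcher.
with_comment = re.match(r"a(?#ignored)b", "ab").group()

# Lookahead assertions test what follows without consuming it.
ahead_yes = re.match(r"Isaac (?=Asimov)", "Isaac Asimov")
ahead_no = re.match(r"Isaac (?=Asimov)", "Isaac Newton")
not_ahead_yes = re.match(r"Isaac (?!Asimov)", "Isaac Newton")
not_ahead_no = re.match(r"Isaac (?!Asimov)", "Isaac Asimov")
```

In both successful lookahead cases the matched text is `'Isaac '` only; the surname is examined but not consumed.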
@@ -474,8 +476,7 @@ Perform the same operation as \function{sub()}, but return a tuple
 \end{excdesc}
 
 
-\subsection{Regular Expression Objects}
-\label{re-objects}
+\subsection{Regular Expression Objects \label{re-objects}}
 
 Compiled regular expression objects support the following methods and
 attributes:
@@ -547,8 +548,7 @@ The pattern string from which the regex object was compiled.
 \end{memberdesc}
 
 
-\subsection{Match Objects}
-\label{match-objects}
+\subsection{Match Objects \label{match-objects}}
 
 \class{MatchObject} instances support the following methods and attributes:
 
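The last two hunks touch the sections on compiled regular expression objects and match objects. A minimal sketch of how those objects are typically used, assuming the modern Python 3 `re` module and made-up inputs:

```python
import re

# Compile once, then reuse the regular expression object.
word = re.compile(r"\w+")
source = word.pattern  # the pattern string the object was compiled from

# Methods on the compiled object return match objects.
first = word.search("hello world")
text = first.group()
span = (first.start(), first.end())

# subn() behaves like sub() but returns (new_string, number_of_substitutions).
replaced, count = re.subn(r"o", "0", "foo boo")
```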