mirror of https://github.com/python/cpython.git (synced 2025-09-28 11:15:17 +00:00)
[3.12] GH-107678: Improve Unicode handling clarity in `library/re.rst` (GH-107679) (#113965)
GH-107678: Improve Unicode handling clarity in ``library/re.rst`` (GH-107679)
(cherry picked from commit c9b8a22f34)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
parent b902671d36
commit bd9ea91e5f
1 changed file with 144 additions and 91 deletions
|
@ -17,7 +17,7 @@ those found in Perl.
|
||||||
Both patterns and strings to be searched can be Unicode strings (:class:`str`)
|
Both patterns and strings to be searched can be Unicode strings (:class:`str`)
|
||||||
as well as 8-bit strings (:class:`bytes`).
|
as well as 8-bit strings (:class:`bytes`).
|
||||||
However, Unicode strings and 8-bit strings cannot be mixed:
|
However, Unicode strings and 8-bit strings cannot be mixed:
|
||||||
that is, you cannot match a Unicode string with a byte pattern or
|
that is, you cannot match a Unicode string with a bytes pattern or
|
||||||
vice-versa; similarly, when asking for a substitution, the replacement
|
vice-versa; similarly, when asking for a substitution, the replacement
|
||||||
string must be of the same type as both the pattern and the search string.
|
string must be of the same type as both the pattern and the search string.
|
||||||
|
|
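The rule in this hunk (Unicode and bytes cannot be mixed) can be checked with a minimal sketch. The helper name `mixing_raises` is hypothetical and exists only for this illustration; the sketch assumes that a type mismatch surfaces as a `TypeError` when the match is attempted.

```python
import re


def mixing_raises(pattern, text):
    """Hypothetical helper: True if searching `text` with `pattern` raises TypeError."""
    try:
        re.search(pattern, text)
    except TypeError:
        return True
    return False


# Pattern and subject of the same type: allowed.
str_on_str = re.search("py", "python")
bytes_on_bytes = re.search(b"py", b"python")

# Pattern and subject of different types: rejected.
bytes_on_str = mixing_raises(b"py", "python")
str_on_bytes = mixing_raises("py", b"python")

print(bytes_on_str, str_on_bytes)
```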
||||||
|
@ -257,8 +257,7 @@ The special characters are:
|
||||||
.. index:: single: \ (backslash); in regular expressions
|
.. index:: single: \ (backslash); in regular expressions
|
||||||
|
|
||||||
* Character classes such as ``\w`` or ``\S`` (defined below) are also accepted
|
* Character classes such as ``\w`` or ``\S`` (defined below) are also accepted
|
||||||
inside a set, although the characters they match depends on whether
|
inside a set, although the characters they match depend on the flags_ used.
|
||||||
:const:`ASCII` or :const:`LOCALE` mode is in force.
|
|
||||||
|
|
||||||
.. index:: single: ^ (caret); in regular expressions
|
.. index:: single: ^ (caret); in regular expressions
|
||||||
|
|
||||||
|
@ -326,18 +325,24 @@ The special characters are:
|
||||||
currently supported extensions.
|
currently supported extensions.
|
||||||
|
|
||||||
``(?aiLmsux)``
|
``(?aiLmsux)``
|
||||||
(One or more letters from the set ``'a'``, ``'i'``, ``'L'``, ``'m'``,
|
(One or more letters from the set
|
||||||
``'s'``, ``'u'``, ``'x'``.) The group matches the empty string; the
|
``'a'``, ``'i'``, ``'L'``, ``'m'``, ``'s'``, ``'u'``, ``'x'``.)
|
||||||
letters set the corresponding flags: :const:`re.A` (ASCII-only matching),
|
The group matches the empty string;
|
||||||
:const:`re.I` (ignore case), :const:`re.L` (locale dependent),
|
the letters set the corresponding flags for the entire regular expression:
|
||||||
:const:`re.M` (multi-line), :const:`re.S` (dot matches all),
|
|
||||||
:const:`re.U` (Unicode matching), and :const:`re.X` (verbose),
|
* :const:`re.A` (ASCII-only matching)
|
||||||
for the entire regular expression.
|
* :const:`re.I` (ignore case)
|
||||||
|
* :const:`re.L` (locale dependent)
|
||||||
|
* :const:`re.M` (multi-line)
|
||||||
|
* :const:`re.S` (dot matches all)
|
||||||
|
* :const:`re.U` (Unicode matching)
|
||||||
|
* :const:`re.X` (verbose)
|
||||||
|
|
||||||
(The flags are described in :ref:`contents-of-module-re`.)
|
(The flags are described in :ref:`contents-of-module-re`.)
|
||||||
This is useful if you wish to include the flags as part of the
|
This is useful if you wish to include the flags as part of the
|
||||||
regular expression, instead of passing a *flag* argument to the
|
regular expression, instead of passing a *flag* argument to the
|
||||||
:func:`re.compile` function. Flags should be used first in the
|
:func:`re.compile` function.
|
||||||
expression string.
|
Flags should be used first in the expression string.
|
||||||
|
|
||||||
.. versionchanged:: 3.11
|
.. versionchanged:: 3.11
|
||||||
This construction can only be used at the start of the expression.
|
This construction can only be used at the start of the expression.
|
||||||
|
@ -351,14 +356,20 @@ The special characters are:
|
||||||
pattern.
|
pattern.
|
||||||
|
|
||||||
``(?aiLmsux-imsx:...)``
|
``(?aiLmsux-imsx:...)``
|
||||||
(Zero or more letters from the set ``'a'``, ``'i'``, ``'L'``, ``'m'``,
|
(Zero or more letters from the set
|
||||||
``'s'``, ``'u'``, ``'x'``, optionally followed by ``'-'`` followed by
|
``'a'``, ``'i'``, ``'L'``, ``'m'``, ``'s'``, ``'u'``, ``'x'``,
|
||||||
|
optionally followed by ``'-'`` followed by
|
||||||
one or more letters from the ``'i'``, ``'m'``, ``'s'``, ``'x'``.)
|
one or more letters from the ``'i'``, ``'m'``, ``'s'``, ``'x'``.)
|
||||||
The letters set or remove the corresponding flags:
|
The letters set or remove the corresponding flags for the part of the expression:
|
||||||
:const:`re.A` (ASCII-only matching), :const:`re.I` (ignore case),
|
|
||||||
:const:`re.L` (locale dependent), :const:`re.M` (multi-line),
|
* :const:`re.A` (ASCII-only matching)
|
||||||
:const:`re.S` (dot matches all), :const:`re.U` (Unicode matching),
|
* :const:`re.I` (ignore case)
|
||||||
and :const:`re.X` (verbose), for the part of the expression.
|
* :const:`re.L` (locale dependent)
|
||||||
|
* :const:`re.M` (multi-line)
|
||||||
|
* :const:`re.S` (dot matches all)
|
||||||
|
* :const:`re.U` (Unicode matching)
|
||||||
|
* :const:`re.X` (verbose)
|
||||||
|
|
||||||
(The flags are described in :ref:`contents-of-module-re`.)
|
(The flags are described in :ref:`contents-of-module-re`.)
|
||||||
|
|
||||||
The letters ``'a'``, ``'L'`` and ``'u'`` are mutually exclusive when used
|
The letters ``'a'``, ``'L'`` and ``'u'`` are mutually exclusive when used
|
||||||
|
@ -366,7 +377,7 @@ The special characters are:
|
||||||
when one of them appears in an inline group, it overrides the matching mode
|
when one of them appears in an inline group, it overrides the matching mode
|
||||||
in the enclosing group. In Unicode patterns ``(?a:...)`` switches to
|
in the enclosing group. In Unicode patterns ``(?a:...)`` switches to
|
||||||
ASCII-only matching, and ``(?u:...)`` switches to Unicode matching
|
ASCII-only matching, and ``(?u:...)`` switches to Unicode matching
|
||||||
(default). In byte pattern ``(?L:...)`` switches to locale depending
|
(default). In bytes patterns ``(?L:...)`` switches to locale dependent
|
||||||
matching, and ``(?a:...)`` switches to ASCII-only matching (default).
|
matching, and ``(?a:...)`` switches to ASCII-only matching (default).
|
||||||
This override is only in effect for the narrow inline group, and the
|
This override is only in effect for the narrow inline group, and the
|
||||||
original matching mode is restored outside of the group.
|
original matching mode is restored outside of the group.
|
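As an illustration of the difference between the global form ``(?aiLmsux)`` and the scoped form ``(?aiLmsux-imsx:...)`` described in the hunks above, here is a small sketch. The sample strings are chosen for this example only.

```python
import re

# Global inline flag: (?i) at the start applies to the entire expression.
whole = re.fullmatch(r"(?i)unicode", "UNICODE")

# Scoped inline flag: (?a:...) makes only the first \w ASCII-only.
# The second \w keeps the default Unicode matching of str patterns.
scoped_hit = re.fullmatch(r"(?a:\w)\w", "a\u00e9")   # 'a' then 'é'
scoped_miss = re.fullmatch(r"(?a:\w)\w", "\u00e9a")  # 'é' fails the ASCII-only \w

print(bool(whole), bool(scoped_hit), bool(scoped_miss))
```

The second and third calls differ only in character order, which shows that the ASCII restriction ends with the group.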
||||||
|
@ -529,47 +540,61 @@ character ``'$'``.
|
||||||
|
|
||||||
``\b``
|
``\b``
|
||||||
Matches the empty string, but only at the beginning or end of a word.
|
Matches the empty string, but only at the beginning or end of a word.
|
||||||
A word is defined as a sequence of word characters. Note that formally,
|
A word is defined as a sequence of word characters.
|
||||||
``\b`` is defined as the boundary between a ``\w`` and a ``\W`` character
|
Note that formally, ``\b`` is defined as the boundary
|
||||||
(or vice versa), or between ``\w`` and the beginning/end of the string.
|
between a ``\w`` and a ``\W`` character (or vice versa),
|
||||||
This means that ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
|
or between ``\w`` and the beginning or end of the string.
|
||||||
``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
|
This means that ``r'\bat\b'`` matches ``'at'``, ``'at.'``, ``'(at)'``,
|
||||||
|
and ``'as at ay'`` but not ``'attempt'`` or ``'atlas'``.
|
||||||
|
|
||||||
By default Unicode alphanumerics are the ones used in Unicode patterns, but
|
The default word characters in Unicode (str) patterns
|
||||||
this can be changed by using the :const:`ASCII` flag. Word boundaries are
|
are Unicode alphanumerics and the underscore,
|
||||||
determined by the current locale if the :const:`LOCALE` flag is used.
|
but this can be changed by using the :py:const:`~re.ASCII` flag.
|
||||||
Inside a character range, ``\b`` represents the backspace character, for
|
Word boundaries are determined by the current locale
|
||||||
compatibility with Python's string literals.
|
if the :py:const:`~re.LOCALE` flag is used.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
Inside a character range, ``\b`` represents the backspace character,
|
||||||
|
for compatibility with Python's string literals.
|
||||||
|
|
||||||
.. index:: single: \B; in regular expressions
|
.. index:: single: \B; in regular expressions
|
||||||
|
|
||||||
``\B``
|
``\B``
|
||||||
Matches the empty string, but only when it is *not* at the beginning or end
|
Matches the empty string,
|
||||||
of a word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``,
|
but only when it is *not* at the beginning or end of a word.
|
||||||
``'py2'``, but not ``'py'``, ``'py.'``, or ``'py!'``.
|
This means that ``r'at\B'`` matches ``'athens'``, ``'atom'``,
|
||||||
``\B`` is just the opposite of ``\b``, so word characters in Unicode
|
``'attorney'``, but not ``'at'``, ``'at.'``, or ``'at!'``.
|
||||||
patterns are Unicode alphanumerics or the underscore, although this can
|
``\B`` is the opposite of ``\b``,
|
||||||
be changed by using the :const:`ASCII` flag. Word boundaries are
|
so word characters in Unicode (str) patterns
|
||||||
determined by the current locale if the :const:`LOCALE` flag is used.
|
are Unicode alphanumerics or the underscore,
|
||||||
|
although this can be changed by using the :py:const:`~re.ASCII` flag.
|
||||||
|
Word boundaries are determined by the current locale
|
||||||
|
if the :py:const:`~re.LOCALE` flag is used.
|
||||||
|
|
||||||
.. index:: single: \d; in regular expressions
|
.. index:: single: \d; in regular expressions
|
||||||
|
|
||||||
``\d``
|
``\d``
|
||||||
For Unicode (str) patterns:
|
For Unicode (str) patterns:
|
||||||
Matches any Unicode decimal digit (that is, any character in
|
Matches any Unicode decimal digit
|
||||||
Unicode character category [Nd]). This includes ``[0-9]``, and
|
(that is, any character in Unicode character category `[Nd]`__).
|
||||||
also many other digit characters. If the :const:`ASCII` flag is
|
This includes ``[0-9]``, and also many other digit characters.
|
||||||
used only ``[0-9]`` is matched.
|
|
||||||
|
Matches ``[0-9]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
|
__ https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf#G134153
|
||||||
|
|
||||||
For 8-bit (bytes) patterns:
|
For 8-bit (bytes) patterns:
|
||||||
Matches any decimal digit; this is equivalent to ``[0-9]``.
|
Matches any decimal digit in the ASCII character set;
|
||||||
|
this is equivalent to ``[0-9]``.
|
||||||
|
|
||||||
.. index:: single: \D; in regular expressions
|
.. index:: single: \D; in regular expressions
|
||||||
|
|
||||||
``\D``
|
``\D``
|
||||||
Matches any character which is not a decimal digit. This is
|
Matches any character which is not a decimal digit.
|
||||||
the opposite of ``\d``. If the :const:`ASCII` flag is used this
|
This is the opposite of ``\d``.
|
||||||
becomes the equivalent of ``[^0-9]``.
|
|
||||||
|
Matches ``[^0-9]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
.. index:: single: \s; in regular expressions
|
.. index:: single: \s; in regular expressions
|
||||||
|
|
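The ``\b``, ``\B`` and ``\d`` descriptions in this hunk can be exercised with the sample words from the new text. This is a sketch for illustration; the digit sample (U+0663, ARABIC-INDIC DIGIT THREE) is an assumption picked here as one member of Unicode category Nd.

```python
import re

word = re.compile(r"\bat\b")   # 'at' as a whole word
inside = re.compile(r"at\B")   # 'at' followed by another word character

b_samples = ["at", "at.", "(at)", "as at ay", "attempt", "atlas"]
B_samples = ["athens", "atom", "attorney", "at", "at.", "at!"]

b_hits = [s for s in b_samples if word.search(s)]
B_hits = [s for s in B_samples if inside.search(s)]

# \d in a str pattern matches any Unicode decimal digit by default,
# and only [0-9] when the ASCII flag is used.
digits_unicode = re.findall(r"\d", "7\u0663")
digits_ascii = re.findall(r"\d", "7\u0663", re.ASCII)

# In a bytes pattern \d is always equivalent to [0-9].
digits_bytes = re.findall(rb"\d", b"7a")

print(b_hits, B_hits, digits_unicode, digits_ascii, digits_bytes)
```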
||||||
|
@ -578,8 +603,9 @@ character ``'$'``.
|
||||||
Matches Unicode whitespace characters (which includes
|
Matches Unicode whitespace characters (which includes
|
||||||
``[ \t\n\r\f\v]``, and also many other characters, for example the
|
``[ \t\n\r\f\v]``, and also many other characters, for example the
|
||||||
non-breaking spaces mandated by typography rules in many
|
non-breaking spaces mandated by typography rules in many
|
||||||
languages). If the :const:`ASCII` flag is used, only
|
languages).
|
||||||
``[ \t\n\r\f\v]`` is matched.
|
|
||||||
|
Matches ``[ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
For 8-bit (bytes) patterns:
|
For 8-bit (bytes) patterns:
|
||||||
Matches characters considered whitespace in the ASCII character set;
|
Matches characters considered whitespace in the ASCII character set;
|
||||||
|
@ -589,30 +615,39 @@ character ``'$'``.
|
||||||
|
|
||||||
``\S``
|
``\S``
|
||||||
Matches any character which is not a whitespace character. This is
|
Matches any character which is not a whitespace character. This is
|
||||||
the opposite of ``\s``. If the :const:`ASCII` flag is used this
|
the opposite of ``\s``.
|
||||||
becomes the equivalent of ``[^ \t\n\r\f\v]``.
|
|
||||||
|
Matches ``[^ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
.. index:: single: \w; in regular expressions
|
.. index:: single: \w; in regular expressions
|
||||||
|
|
||||||
``\w``
|
``\w``
|
||||||
For Unicode (str) patterns:
|
For Unicode (str) patterns:
|
||||||
Matches Unicode word characters; this includes alphanumeric characters (as defined by :meth:`str.isalnum`)
|
Matches Unicode word characters;
|
||||||
|
this includes all Unicode alphanumeric characters
|
||||||
|
(as defined by :py:meth:`str.isalnum`),
|
||||||
as well as the underscore (``_``).
|
as well as the underscore (``_``).
|
||||||
If the :const:`ASCII` flag is used, only ``[a-zA-Z0-9_]`` is matched.
|
|
||||||
|
Matches ``[a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
For 8-bit (bytes) patterns:
|
For 8-bit (bytes) patterns:
|
||||||
Matches characters considered alphanumeric in the ASCII character set;
|
Matches characters considered alphanumeric in the ASCII character set;
|
||||||
this is equivalent to ``[a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
|
this is equivalent to ``[a-zA-Z0-9_]``.
|
||||||
used, matches characters considered alphanumeric in the current locale
|
If the :py:const:`~re.LOCALE` flag is used,
|
||||||
and the underscore.
|
matches characters considered alphanumeric in the current locale and the underscore.
|
||||||
|
|
||||||
.. index:: single: \W; in regular expressions
|
.. index:: single: \W; in regular expressions
|
||||||
|
|
||||||
``\W``
|
``\W``
|
||||||
Matches any character which is not a word character. This is
|
Matches any character which is not a word character.
|
||||||
the opposite of ``\w``. If the :const:`ASCII` flag is used this
|
This is the opposite of ``\w``.
|
||||||
becomes the equivalent of ``[^a-zA-Z0-9_]``. If the :const:`LOCALE` flag is
|
By default, matches non-underscore (``_``) characters
|
||||||
used, matches characters which are neither alphanumeric in the current locale
|
for which :py:meth:`str.isalnum` returns ``False``.
|
||||||
|
|
||||||
|
Matches ``[^a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.
|
||||||
|
|
||||||
|
If the :py:const:`~re.LOCALE` flag is used,
|
||||||
|
matches characters which are neither alphanumeric in the current locale
|
||||||
nor the underscore.
|
nor the underscore.
|
||||||
|
|
||||||
.. index:: single: \Z; in regular expressions
|
.. index:: single: \Z; in regular expressions
|
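A short sketch of the ``\w`` and ``\W`` behaviour described in this hunk, for ``str`` patterns with and without the ASCII flag. The sample text is an assumption made for this example.

```python
import re

text = "na\u00efve caf\u00e9_1"  # 'naïve café_1'

# Default for str patterns: Unicode alphanumerics plus the underscore.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# \W is the complement: here only the hyphen is a non-word character.
non_word = re.findall(r"\W", "a_b-c")

print(unicode_words, ascii_words, non_word)
```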
||||||
|
@ -644,9 +679,11 @@ string literals are also accepted by the regular expression parser::
|
||||||
(Note that ``\b`` is used to represent word boundaries, and means "backspace"
|
(Note that ``\b`` is used to represent word boundaries, and means "backspace"
|
||||||
only inside character classes.)
|
only inside character classes.)
|
||||||
|
|
||||||
``'\u'``, ``'\U'``, and ``'\N'`` escape sequences are only recognized in Unicode
|
``'\u'``, ``'\U'``, and ``'\N'`` escape sequences are
|
||||||
patterns. In bytes patterns they are errors. Unknown escapes of ASCII
|
only recognized in Unicode (str) patterns.
|
||||||
letters are reserved for future use and treated as errors.
|
In bytes patterns they are errors.
|
||||||
|
Unknown escapes of ASCII letters are reserved
|
||||||
|
for future use and treated as errors.
|
||||||
|
|
||||||
Octal escapes are included in a limited form. If the first digit is a 0, or if
|
Octal escapes are included in a limited form. If the first digit is a 0, or if
|
||||||
there are three octal digits, it is considered an octal escape. Otherwise, it is
|
there are three octal digits, it is considered an octal escape. Otherwise, it is
|
||||||
|
@ -694,30 +731,37 @@ Flags
|
||||||
|
|
||||||
Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S``
|
Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S``
|
||||||
perform ASCII-only matching instead of full Unicode matching. This is only
|
perform ASCII-only matching instead of full Unicode matching. This is only
|
||||||
meaningful for Unicode patterns, and is ignored for byte patterns.
|
meaningful for Unicode (str) patterns, and is ignored for bytes patterns.
|
||||||
|
|
||||||
Corresponds to the inline flag ``(?a)``.
|
Corresponds to the inline flag ``(?a)``.
|
||||||
|
|
||||||
Note that for backward compatibility, the :const:`re.U` flag still
|
.. note::
|
||||||
exists (as well as its synonym :const:`re.UNICODE` and its embedded
|
|
||||||
counterpart ``(?u)``), but these are redundant in Python 3 since
|
The :py:const:`~re.U` flag still exists for backward compatibility,
|
||||||
matches are Unicode by default for strings (and Unicode matching
|
but is redundant in Python 3 since
|
||||||
isn't allowed for bytes).
|
matches are Unicode by default for ``str`` patterns,
|
||||||
|
and Unicode matching isn't allowed for bytes patterns.
|
||||||
|
:py:const:`~re.UNICODE` and the inline flag ``(?u)`` are similarly redundant.
|
||||||
|
|
||||||
|
|
||||||
.. data:: DEBUG
|
.. data:: DEBUG
|
||||||
|
|
||||||
Display debug information about compiled expression.
|
Display debug information about compiled expression.
|
||||||
|
|
||||||
No corresponding inline flag.
|
No corresponding inline flag.
|
||||||
|
|
||||||
|
|
||||||
.. data:: I
|
.. data:: I
|
||||||
IGNORECASE
|
IGNORECASE
|
||||||
|
|
||||||
Perform case-insensitive matching; expressions like ``[A-Z]`` will also
|
Perform case-insensitive matching;
|
||||||
match lowercase letters. Full Unicode matching (such as ``Ü`` matching
|
expressions like ``[A-Z]`` will also match lowercase letters.
|
||||||
``ü``) also works unless the :const:`re.ASCII` flag is used to disable
|
Full Unicode matching (such as ``Ü`` matching ``ü``)
|
||||||
non-ASCII matches. The current locale does not change the effect of this
|
also works unless the :py:const:`~re.ASCII` flag
|
||||||
flag unless the :const:`re.LOCALE` flag is also used.
|
is used to disable non-ASCII matches.
|
||||||
|
The current locale does not change the effect of this flag
|
||||||
|
unless the :py:const:`~re.LOCALE` flag is also used.
|
||||||
|
|
||||||
Corresponds to the inline flag ``(?i)``.
|
Corresponds to the inline flag ``(?i)``.
|
||||||
|
|
||||||
Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in
|
Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in
|
||||||
|
@ -725,29 +769,35 @@ Flags
|
||||||
letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital
|
letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital
|
||||||
letter I with dot above), 'ı' (U+0131, Latin small letter dotless i),
|
letter I with dot above), 'ı' (U+0131, Latin small letter dotless i),
|
||||||
'ſ' (U+017F, Latin small letter long s) and 'K' (U+212A, Kelvin sign).
|
'ſ' (U+017F, Latin small letter long s) and 'K' (U+212A, Kelvin sign).
|
||||||
If the :const:`ASCII` flag is used, only letters 'a' to 'z'
|
If the :py:const:`~re.ASCII` flag is used, only letters 'a' to 'z'
|
||||||
and 'A' to 'Z' are matched.
|
and 'A' to 'Z' are matched.
|
||||||
|
|
||||||
.. data:: L
|
.. data:: L
|
||||||
LOCALE
|
LOCALE
|
||||||
|
|
||||||
Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching
|
Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching
|
||||||
dependent on the current locale. This flag can be used only with bytes
|
dependent on the current locale.
|
||||||
patterns. The use of this flag is discouraged as the locale mechanism
|
This flag can be used only with bytes patterns.
|
||||||
is very unreliable, it only handles one "culture" at a time, and it only
|
|
||||||
works with 8-bit locales. Unicode matching is already enabled by default
|
|
||||||
in Python 3 for Unicode (str) patterns, and it is able to handle different
|
|
||||||
locales/languages.
|
|
||||||
Corresponds to the inline flag ``(?L)``.
|
Corresponds to the inline flag ``(?L)``.
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
This flag is discouraged; consider Unicode matching instead.
|
||||||
|
The locale mechanism is very unreliable
|
||||||
|
as it only handles one "culture" at a time
|
||||||
|
and only works with 8-bit locales.
|
||||||
|
Unicode matching is enabled by default for Unicode (str) patterns
|
||||||
|
and it is able to handle different locales and languages.
|
||||||
|
|
||||||
.. versionchanged:: 3.6
|
.. versionchanged:: 3.6
|
||||||
:const:`re.LOCALE` can be used only with bytes patterns and is
|
:py:const:`~re.LOCALE` can be used only with bytes patterns
|
||||||
not compatible with :const:`re.ASCII`.
|
and is not compatible with :py:const:`~re.ASCII`.
|
||||||
|
|
||||||
.. versionchanged:: 3.7
|
.. versionchanged:: 3.7
|
||||||
Compiled regular expression objects with the :const:`re.LOCALE` flag no
|
Compiled regular expression objects with the :py:const:`~re.LOCALE` flag
|
||||||
longer depend on the locale at compile time. Only the locale at
|
no longer depend on the locale at compile time.
|
||||||
matching time affects the result of matching.
|
Only the locale at matching time affects the result of matching.
|
||||||
|
|
||||||
|
|
||||||
.. data:: M
|
.. data:: M
|
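The interaction between ``IGNORECASE``, ``ASCII`` and ``LOCALE`` described in the two hunks above can be sketched as follows. The helper name `compile_fails` is hypothetical; the sketch assumes the disallowed flag combinations are reported as `ValueError` at compile time.

```python
import re

# Full Unicode case-insensitive matching works by default for str patterns...
umlaut_default = re.fullmatch("\u00fc", "\u00dc", re.IGNORECASE)
# ...but the ASCII flag disables non-ASCII case matches.
umlaut_ascii = re.fullmatch("\u00fc", "\u00dc", re.IGNORECASE | re.ASCII)

# [a-z] with IGNORECASE also matches a few non-ASCII letters, e.g. the Kelvin sign.
kelvin_default = re.fullmatch("[a-z]", "\u212a", re.IGNORECASE)
kelvin_ascii = re.fullmatch("[a-z]", "\u212a", re.IGNORECASE | re.ASCII)


def compile_fails(pattern, flags):
    """Hypothetical helper: True if re.compile rejects the pattern/flag combination."""
    try:
        re.compile(pattern, flags)
    except ValueError:
        return True
    return False


locale_on_str = compile_fails("x", re.LOCALE)                # LOCALE needs a bytes pattern
locale_with_ascii = compile_fails(b"x", re.LOCALE | re.ASCII)  # the two flags are incompatible

print(bool(umlaut_default), bool(umlaut_ascii), locale_on_str, locale_with_ascii)
```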
||||||
|
@ -759,6 +809,7 @@ Flags
|
||||||
end of each line (immediately preceding each newline). By default, ``'^'``
|
end of each line (immediately preceding each newline). By default, ``'^'``
|
||||||
matches only at the beginning of the string, and ``'$'`` only at the end of the
|
matches only at the beginning of the string, and ``'$'`` only at the end of the
|
||||||
string and immediately before the newline (if any) at the end of the string.
|
string and immediately before the newline (if any) at the end of the string.
|
||||||
|
|
||||||
Corresponds to the inline flag ``(?m)``.
|
Corresponds to the inline flag ``(?m)``.
|
||||||
|
|
||||||
.. data:: NOFLAG
|
.. data:: NOFLAG
|
||||||
|
@ -778,19 +829,19 @@ Flags
|
||||||
|
|
||||||
Make the ``'.'`` special character match any character at all, including a
|
Make the ``'.'`` special character match any character at all, including a
|
||||||
newline; without this flag, ``'.'`` will match anything *except* a newline.
|
newline; without this flag, ``'.'`` will match anything *except* a newline.
|
||||||
|
|
||||||
Corresponds to the inline flag ``(?s)``.
|
Corresponds to the inline flag ``(?s)``.
|
||||||
|
|
||||||
|
|
||||||
.. data:: U
|
.. data:: U
|
||||||
UNICODE
|
UNICODE
|
||||||
|
|
||||||
In Python 2, this flag made :ref:`special sequences <re-special-sequences>`
|
In Python 3, Unicode characters are matched by default
|
||||||
include Unicode characters in matches. Since Python 3, Unicode characters
|
for ``str`` patterns.
|
||||||
are matched by default.
|
This flag is therefore redundant with **no effect**
|
||||||
|
and is only kept for backward compatibility.
|
||||||
|
|
||||||
See :const:`A` for restricting matching on ASCII characters instead.
|
See :py:const:`~re.ASCII` to restrict matching to ASCII characters instead.
|
||||||
|
|
||||||
This flag is only kept for backward compatibility.
|
|
||||||
|
|
||||||
.. data:: X
|
.. data:: X
|
||||||
VERBOSE
|
VERBOSE
|
||||||
|
@ -914,6 +965,8 @@ Functions
|
||||||
Empty matches for the pattern split the string only when not adjacent
|
Empty matches for the pattern split the string only when not adjacent
|
||||||
to a previous empty match.
|
to a previous empty match.
|
||||||
|
|
||||||
|
.. code:: pycon
|
||||||
|
|
||||||
>>> re.split(r'\b', 'Words, words, words.')
|
>>> re.split(r'\b', 'Words, words, words.')
|
||||||
['', 'Words', ', ', 'words', ', ', 'words', '.']
|
['', 'Words', ', ', 'words', ', ', 'words', '.']
|
||||||
>>> re.split(r'\W*', '...words...')
|
>>> re.split(r'\W*', '...words...')
|
||||||
|
@ -1231,7 +1284,7 @@ Regular Expression Objects
|
||||||
|
|
||||||
The regex matching flags. This is a combination of the flags given to
|
The regex matching flags. This is a combination of the flags given to
|
||||||
:func:`.compile`, any ``(?...)`` inline flags in the pattern, and implicit
|
:func:`.compile`, any ``(?...)`` inline flags in the pattern, and implicit
|
||||||
flags such as :data:`UNICODE` if the pattern is a Unicode string.
|
flags such as :py:const:`~re.UNICODE` if the pattern is a Unicode string.
|
||||||
|
|
||||||
|
|
||||||
.. attribute:: Pattern.groups
|
.. attribute:: Pattern.groups
|
||||||
|
|
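The statements that ``re.U`` is redundant and that ``Pattern.flags`` includes implicit flags can be checked with a minimal sketch. The variable names are chosen for this example only.

```python
import re

plain = re.compile("a")                  # str pattern, no flags given
explicit = re.compile("a", re.UNICODE)   # same pattern with the redundant flag
ascii_only = re.compile("a", re.ASCII)   # ASCII matching replaces Unicode matching
raw = re.compile(b"a")                   # bytes pattern

same_flags = plain.flags == explicit.flags
implicit_unicode = bool(plain.flags & re.UNICODE)
ascii_drops_unicode = not (ascii_only.flags & re.UNICODE)
bytes_not_unicode = not (raw.flags & re.UNICODE)

print(same_flags, implicit_unicode, ascii_drops_unicode, bytes_not_unicode)
```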