gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)

Only a sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters, digits and underscores.
Serhiy Storchaka 2022-05-08 19:19:29 +03:00 committed by GitHub
parent 7b024e3a3f
commit a84a56d80f
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 62 additions and 91 deletions
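The first rule in the commit message is narrower than Python's general idea of a digit. Both `str.isdigit()` and `\d` in str patterns accept non-ASCII digits. The sketch below shows the gap. `is_numeric_group_ref()` is a hypothetical helper that encodes the "ASCII digits only" rule. It is not part of the `re` module and is not CPython's implementation.

```python
import re


def is_numeric_group_ref(ref: str) -> bool:
    """Hypothetical helper: True if *ref* is a non-empty run of ASCII digits,
    i.e. what the stricter rule accepts as a numerical group reference."""
    return ref.isascii() and ref.isdigit()


# '\u0661' is ARABIC-INDIC DIGIT ONE, '\u00b2' is SUPERSCRIPT TWO.
candidates = ["1", "42", "\u0661", "\u00b2", "", "1a"]

for ref in candidates:
    # str.isdigit() is Unicode-aware, so it accepts more than the new rule does.
    print(ascii(ref), ref.isdigit(), is_numeric_group_ref(ref))

# The same trap exists inside regexes: in str patterns \d matches any Unicode
# decimal digit unless re.ASCII is given, while [0-9] is ASCII-only.
assert re.fullmatch(r"\d+", "\u0661")
assert not re.fullmatch(r"\d+", "\u0661", flags=re.ASCII)
assert not re.fullmatch(r"[0-9]+", "\u0661")
```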


@@ -395,7 +395,8 @@ The special characters are:
 ``(?P<name>...)``
    Similar to regular parentheses, but the substring matched by the group is
    accessible via the symbolic group name *name*. Group names must be valid
-   Python identifiers, and each group name must be defined only once within a
+   Python identifiers, and in bytes patterns they must contain only characters
+   in the ASCII range. Each group name must be defined only once within a
    regular expression. A symbolic group is also a numbered group, just as if
    the group were not named.
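The naming rules in this hunk can be tried with a str pattern, where they are the same before and after this change. The sketch below shows three things: a named group is also a numbered group, str patterns allow non-ASCII identifiers, and a repeated name is refused. It should run on any Python 3 version. The pattern and group names are made up for illustration.

```python
import re

# A named group is also numbered, counting opening parentheses left to right:
# (?P<first>...) is group 1 and (?P<last>...) is group 2.
name_re = re.compile(r"(?P<first>\w+) (?P<last>\w+)")
m = name_re.match("Ada Lovelace")

print(m.group("first"), m.group(1))  # Ada Ada
print(m.group("last"), m.group(2))   # Lovelace Lovelace
print(m.groupdict())                 # {'first': 'Ada', 'last': 'Lovelace'}

# In str patterns any valid Python identifier works as a group name,
# including a non-ASCII one; the ASCII-only restriction is for bytes patterns.
m2 = re.match(r"(?P<café>\w+)", "latte")
print(m2.group("café"))              # latte

# Each name may be defined only once per pattern; a repeat raises re.error.
try:
    re.compile(r"(?P<x>a)(?P<x>b)")
    duplicate_rejected = False
except re.error:
    duplicate_rejected = True
print(duplicate_rejected)            # True
```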
@@ -417,8 +418,9 @@ The special characters are:
    |                                       | * ``\1``                         |
    +---------------------------------------+----------------------------------+

-   .. deprecated:: 3.11
-      Group names containing non-ASCII characters in bytes patterns.
+   .. versionchanged:: 3.12
+      In bytes patterns group names must contain only characters in
+      the ASCII range.

 .. index:: single: (?P=; in regular expressions
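You can check the `versionchanged` note for bytes patterns directly. The sketch below uses `bytes_pattern_compiles()`, a hypothetical helper. The non-ASCII name should be rejected only on Python 3.12 or later. Per the removed `deprecated` note, 3.11 only warns, and the helper silences that `DeprecationWarning`. Earlier releases are expected to accept the name.

```python
import re
import sys
import warnings


def bytes_pattern_compiles(pattern: bytes) -> bool:
    """Hypothetical helper: True if re.compile() accepts *pattern*,
    False if it raises re.error."""
    with warnings.catch_warnings():
        # 3.11 only emits a DeprecationWarning for non-ASCII names; silence it
        # so the helper behaves the same even when warnings are errors.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error:
            return False
    return True


# ASCII letters, digits and underscore are fine in any version.
ascii_ok = bytes_pattern_compiles(rb"(?P<user_1>\w+)")

# Not a raw literal: b"\xe9" is the single byte 0xE9 ('é' in Latin-1).
non_ascii_ok = bytes_pattern_compiles(b"(?P<caf\xe9>x)")

print(sys.version_info[:2], ascii_ok, non_ascii_ok)
```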
@@ -489,8 +491,8 @@ The special characters are:
    will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
    not with ``'<user@host.com'`` nor ``'user@host.com>'``.

-   .. deprecated:: 3.11
-      Group *id* containing anything except ASCII digits.
+   .. versionchanged:: 3.12
+      Group *id* can only contain ASCII digits.

 The special sequences consist of ``'\'`` and a character from the list below.
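The conditional-group example in this hunk can be run as-is. The sketch assumes `re.match()`, which anchors at the start of the string, and reproduces the four cases from the text. It then checks that, on Python 3.12 or later, a group *id* written with a non-ASCII digit is refused. The `(<)?x` pattern in that check is made up for illustration.

```python
import re
import sys

# Pattern from the (?(id/name)yes-pattern|no-pattern) docs: group 1 is an
# optional '<', and (?(1)>|$) requires a closing '>' only if group 1 matched.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

results = {s: bool(email.match(s))
           for s in ["<user@host.com>", "user@host.com",
                     "<user@host.com", "user@host.com>"]}
print(results)
# {'<user@host.com>': True, 'user@host.com': True,
#  '<user@host.com': False, 'user@host.com>': False}

# The id must be ASCII digits; U+0661 is a Unicode digit, not an ASCII one.
arabic_id_rejected = None  # only checked on 3.12+, where the rule applies
if sys.version_info >= (3, 12):
    try:
        re.compile("(<)?x(?(\u0661)>|$)")
        arabic_id_rejected = False
    except re.error:
        arabic_id_rejected = True
```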
@@ -1001,9 +1003,10 @@ form.
    Empty matches for the pattern are replaced when adjacent to a previous
    non-empty match.

-   .. deprecated:: 3.11
-      Group *id* containing anything except ASCII digits.
-      Group names containing non-ASCII characters in bytes replacement strings.
+   .. versionchanged:: 3.12
+      Group *id* can only contain ASCII digits.
+      In bytes replacement strings group names must contain only characters
+      in the ASCII range.

 .. function:: subn(pattern, repl, string, count=0, flags=0)
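The `re.sub()` note covers two cases: numerical references in `\g<...>`, and group names in bytes replacement strings. The sketch below assumes Python 3.12 or later for the two rejection checks. `repl_rejected()` is a hypothetical helper, and the pattern and data are illustrative only.

```python
import re
import sys

pair = re.compile(rb"(?P<user>\w+)@(?P<host>\w+)")

# In a bytes replacement string, \g<name>, \g<number> and \number all refer
# to groups; the group names used here are ASCII, as the 3.12 rule requires.
swapped = pair.sub(rb"\g<host>:\g<1>", b"alice@example")
slashed = pair.sub(rb"\2/\1", b"alice@example")
print(swapped, slashed)  # b'example:alice' b'example/alice'


def repl_rejected(pattern, repl, string) -> bool:
    """Hypothetical helper: True if re.sub() raises re.error for *repl*."""
    try:
        re.sub(pattern, repl, string)
    except re.error:
        return True
    return False


if sys.version_info >= (3, 12):
    # \g<...> holding a non-ASCII digit (U+0661) is not a numerical reference.
    print(repl_rejected("(a)", "\\g<\u0661>", "a"))
    # A non-ASCII group name in a bytes replacement string (0xE9 is 'é').
    print(repl_rejected(b"(?P<x>a)", b"\\g<caf\xe9>", b"a"))
```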