Commit graph

17 commits

Author SHA1 Message Date
Serhiy Storchaka
ac56f8cc8d
gh-133306: Support \z as a synonym for \Z in regular expressions (GH-133314)
\Z was an error inherited from PCRE 0.95. It was fixed in PCRE 2.0.
In other engines, \Z means not “anchor at string end”, but
“anchor before optional newline at string end”.

\z means “anchor at string end” in most RE engines.
2025-05-03 07:54:33 +00:00
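
A minimal sketch of the distinction this commit describes. In Python's `re`, `\Z` has always anchored at the very end of the string, and `$` is the anchor that also matches before a trailing newline. The `\z` spelling is accepted only by interpreters that include this change, so the sketch probes for it first. The helper name `supports_lower_z` is hypothetical and not part of the `re` module.

```python
import re

text = "line\n"

# In Python's re, \Z anchors at the very end of the string,
# so it does not match before the trailing newline.
assert re.search(r"line\Z", text) is None

# $ (without re.MULTILINE) also matches just before a trailing newline.
assert re.search(r"line$", text) is not None


def supports_lower_z() -> bool:
    """Return True if this interpreter accepts \\z in patterns (hypothetical helper)."""
    try:
        re.compile(r"a\z")
    except re.error:
        # Older versions reject \z as a bad escape.
        return False
    return True


if supports_lower_z():
    # Where supported, \z behaves the same as \Z.
    assert re.search(r"line\z", text) is None
    assert re.search(r"line\n\z", text) is not None
```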
Serhiy Storchaka
f9637b4ba3
Remove dead code in the RE parser (GH-122796) 2024-08-07 19:44:18 +00:00
Serhiy Storchaka
e2b3d831fd
gh-109747: Improve errors for unsupported look-behind patterns (GH-109859)
Now re.error is raised instead of OverflowError or RuntimeError when
the width of a look-behind pattern is too large.

The limit is increased to 2**32-1 (was 2**31-1).
2023-10-14 09:13:02 +03:00
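
A short sketch of how a caller might handle unsupported look-behind patterns by catching `re.error`. It uses the simpler variable-width case rather than the too-large-width case the commit addresses, since the former is rejected with `re.error` regardless of version. The helper name `lookbehind_error` is an illustrative assumption.

```python
import re


def lookbehind_error(pattern: str):
    """Return the re.error message for a pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# A fixed-width look-behind compiles without error.
assert lookbehind_error(r"(?<=ab)c") is None

# A variable-width look-behind is rejected with re.error.
assert lookbehind_error(r"(?<=a+)c") is not None
```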
Serhiy Storchaka
ed64204716
gh-106566: Optimize (?!) in regular expressions (GH-106567) 2023-08-07 18:09:56 +03:00
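
For context, `(?!)` is an empty negative look-ahead, which can never succeed. The sketch below only illustrates that matching behavior. It does not show the optimization itself, which is internal to the compiler.

```python
import re

# An empty negative look-ahead fails at every position.
assert re.search(r"(?!)", "abc") is None

# A branch containing (?!) is dead, so only the other alternative can match.
m = re.search(r"a(?!)|b", "ab")
assert m is not None and m.group() == "b"
```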
Serhiy Storchaka
74ec02e949
gh-106510: Fix DEBUG output for atomic group (GH-106511) 2023-07-08 14:31:25 +03:00
Nikita Sobolev
67f69dba0a
gh-105687: Remove deprecated objects from re module (#105688) 2023-06-14 12:26:20 +02:00
Serhiy Storchaka
75a6fadf36
gh-91524: Speed up the regular expression substitution (#91525)
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.

Closes #91524

Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
2022-10-23 15:57:30 -07:00
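
The kind of call this speed-up targets is a substitution whose replacement string contains group references. A small sketch follows; the pattern and sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"(?P<user>\w+)@(?P<host>\w+)")

# Numeric group references in the replacement template.
swapped = pattern.sub(r"\2 at \1", "alice@example")

# Named group references in the replacement template.
named = pattern.sub(r"\g<host>:\g<user>", "alice@example")

# subn() additionally reports how many substitutions were made.
result, count = re.subn(r"(\d+)", r"<\1>", "a1b22c333")
```

Here `swapped` is `"example at alice"`, `named` is `"example:alice"`, and `re.subn` returns `("a<1>b<22>c<333>", 3)`.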
Miro Hrončok
16a7e4a0b7
gh-92728: Restore re.template, but deprecate it (GH-93161)
Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"

This reverts commit b09184bf05.
2022-05-25 09:05:35 +03:00
Serhiy Storchaka
a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only a sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
2022-05-08 19:19:29 +03:00
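
A hedged sketch of what the stricter rule means for replacement templates. Plain ASCII-digit references work everywhere; a signed reference such as `\g<+1>` is either still accepted (older interpreters, possibly with a deprecation warning) or rejected with `re.error` (interpreters that include this change). The helper name `expand_or_error` is hypothetical.

```python
import re


def expand_or_error(template: str):
    """Apply a replacement template to a fixed input; return the result or the re.error."""
    try:
        return re.sub(r"(a)", template, "a")
    except re.error as exc:
        return exc


# References made only of ASCII digits are accepted on every version.
assert expand_or_error(r"[\g<1>]") == "[a]"
assert expand_or_error(r"[\1]") == "[a]"

# A signed group number is version dependent: accepted before this change,
# rejected with re.error after it.
outcome = expand_or_error(r"\g<+1>")
assert outcome == "a" or isinstance(outcome, re.error)
```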
Serhiy Storchaka
19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only a sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings will only be
allowed to contain ASCII letters, digits, and underscores.
2022-04-30 13:13:46 +03:00
Serhiy Storchaka
f703c96cf0
gh-91870: Remove unsupported SRE opcode CALL (GH-91872)
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL was left only
in the compiler, not in the interpreter or parser.

ATOMIC_GROUP is now used to support atomic groups.
2022-04-26 21:07:25 +03:00
Serhiy Storchaka
130a8c386b
gh-91308: Simplify parsing inline flag "x" (verbose) (GH-91855) 2022-04-23 12:50:42 +03:00
Serhiy Storchaka
48ec61a89a
gh-91700: Validate the group number in conditional expression in RE (GH-91702)
In the expression (?(group)...), an appropriate re.error is now
raised if the group number refers to an undefined group.

Previously it raised RuntimeError: invalid SRE code.
2022-04-22 19:53:10 +03:00
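
A small sketch of the conditional construct involved. A reference to a defined group compiles and matches as expected. A reference to an undefined group is rejected at compile time, with `re.error` after this change and `RuntimeError` before it, so the hypothetical helper `rejects` catches both.

```python
import re

# (?(1)b|c): require "b" if group 1 matched, otherwise require "c".
cond = re.compile(r"(a)?(?(1)b|c)")
assert cond.fullmatch("ab") is not None
assert cond.fullmatch("c") is not None
assert cond.fullmatch("ac") is None


def rejects(pattern: str) -> bool:
    """Return True if compiling the pattern fails (hypothetical helper)."""
    try:
        re.compile(pattern)
    except (re.error, RuntimeError):
        # re.error with this change; RuntimeError on older versions.
        return True
    return False


# Group 2 is never defined, so the conditional is invalid.
assert rejects(r"(a)?(?(2)b|c)")
```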
Serhiy Storchaka
6ccfa31421
gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665)
re.error is now raised instead of TypeError.
2022-04-22 18:35:28 +03:00
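
A sketch of the case this fix covers, assuming a Python version that supports `\N{...}` in patterns. A name that resolves to a single character works; a Unicode named sequence (here `KEYCAP NUMBER SIGN`, which expands to several code points) cannot be used. The hypothetical helper `rejects_named_sequence` catches both the new `re.error` and the older `TypeError`.

```python
import re

# A name that maps to exactly one character is accepted.
assert re.fullmatch(r"\N{BULLET}", "\u2022") is not None


def rejects_named_sequence() -> bool:
    """Return True if a named sequence in \\N{...} is rejected (hypothetical helper)."""
    try:
        re.compile(r"\N{KEYCAP NUMBER SIGN}")
    except (re.error, TypeError):
        # re.error with this fix; TypeError on versions without it.
        return True
    return False
```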
Serhiy Storchaka
50872dbadc
bpo-47227: Suppress expression chaining for more RE parsing errors (GH-32333) 2022-04-06 19:54:44 +03:00
Serhiy Storchaka
b09184bf05
bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)
They were undocumented and never worked.
2022-04-06 19:53:50 +03:00
Serhiy Storchaka
1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00