Commit graph

249 commits

Author SHA1 Message Date
Miss Islington (bot)
c7e6bfd150
RE: Add more tests for inline flag "x" and re.VERBOSE (GH-91854)
(cherry picked from commit 6b45076bd6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2022-04-23 03:16:36 -07:00
Serhiy Storchaka
1748816e80
[3.10] gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580). (GH-91661)
(cherry picked from commit 1c2fcebf3c)
2022-04-22 21:44:05 +03:00
Serhiy Storchaka
080781cd49
[3.10] gh-91700: Validate the group number in conditional expression in RE (GH-91702) (GH-91831)
In expression (?(group)...) an appropriate re.error is now
raised if the group number refers to not defined group.

Previously it raised RuntimeError: invalid SRE code.
(cherry picked from commit 48ec61a89a)
2022-04-22 21:09:30 +03:00
Serhiy Storchaka
9c18d783c3
[3.10] gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) (GH-91830)
re.error is now raised instead of TypeError.
(cherry picked from commit 6ccfa31421)
2022-04-22 21:08:49 +03:00
Miss Islington (bot)
c213cccc9b
Add more tests for group names and refs in RE (GH-91695)
(cherry picked from commit 74070085da)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2022-04-19 07:33:09 -07:00
Miss Islington (bot)
906f1a4a95
bpo-39394: Improve warning message in the re module (GH-31988)
A warning about inline flags not at the start of the regular
expression now contains the position of the flag.
(cherry picked from commit 4142961b9f)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2022-03-19 07:09:45 -07:00
Ethan Furman
9bf7c2d638
[3.10] bpo-44559: [Enum] revert enum module to 3.9 (GH-27010)
* [Enum] revert enum module to 3.9
2021-07-03 21:08:42 -07:00
Erlend Egeberg Aasland
0a3452e7cf
[3.10] bpo-43988: Add test.support.check_disallow_instantiation() (GH-25757) (GH-26885)
(cherry picked from commit 4f725261c6, fbff5387c3, and 8cec740820)

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>

Automerge-Triggered-By: GH:vstinner
2021-06-23 16:46:25 -07:00
Erlend Egeberg Aasland
9746cda705
bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748)
Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types:

* _dbm.dbm
* _gdbm.gdbm
* _multibytecodec.MultibyteCodec
* _sre..SRE_Scanner
* _thread._localdummy
* _thread.lock
* _winapi.Overlapped
* array.arrayiterator
* functools.KeyWrapper
* functools._lru_list_elem
* pyexpat.xmlparser
* re.Match
* re.Pattern
* unicodedata.UCD
* zlib.Compress
* zlib.Decompress
2021-04-30 16:04:57 +02:00
Erlend Egeberg Aasland
5daf70b22e
bpo-43908: Make re types immutable (GH-25697)
Co-authored-by: Victor Stinner <vstinner@python.org>
2021-04-29 08:47:11 +02:00
Ethan Furman
7aaeb2a3d6
bpo-38250: [Enum] single-bit flags are canonical (GH-24215)
Flag members are now divided by one-bit verses multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags.

Iterating, repr(), and str() show members in definition order.

When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired:

>>> class Test(Flag, boundary=CONFORM):
...     ONE = 1
...     TWO = 2
...
>>> Test(5)
<Test.ONE: 1>

Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits.

Iteration is now in member definition order.  If member definition order
matches increasing value order, then a more efficient method of flag
decomposition is used; otherwise, sort() is called on the results of
that method to get definition order.


``re`` module:

repr() has been modified to support as closely as possible its previous
output; the big difference is that inverted flags cannot be output as
before because the inversion operation now always returns the comparable
positive result; i.e.

   re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG)

in both of the above terms, the ``value`` is 282.

re's tests have been updated to reflect the modifications to repr().
2021-01-25 14:26:19 -08:00
Erlend Egeberg Aasland
a6109ef68d
bpo-1635741: Convert _sre types to heap types and establish module state (PEP 384) (GH-23393) 2020-11-20 21:36:23 +09:00
Victor Stinner
57572b103e
bpo-40443: Remove unused imports in tests (GH-19805) 2020-04-30 01:48:37 +02:00
Serhiy Storchaka
14a0e16c88
bpo-36548: Improve the repr of re flags. (GH-12715) 2019-05-31 10:39:47 +03:00
Max Bernstein
ccb7ca728e bpo-36929: Modify io/re tests to allow for missing mod name (#13392)
* bpo-36929: Modify io/re tests to allow for missing mod name

For a vanishingly small number of internal types, CPython sets the
tp_name slot to mod_name.type_name, either in the PyTypeObject or the
PyType_Spec. There are a few minor places where this surfaces:

* Custom repr functions for those types (some of which ignore the
  tp_name in favor of using a string literal, such as _io.TextIOWrapper)
* Pickling error messages

The test suite only tests the former. This commit modifies the test
suite to allow Python implementations to omit the module prefix.

https://bugs.python.org/issue36929
2019-05-21 10:09:21 -07:00
Victor Stinner
ab71f8b793
bpo-29571: Fix test_re.test_locale_flag() (GH-12099)
Use locale.getpreferredencoding() rather than locale.getlocale() to
get the locale encoding. With some locales, locale.getlocale()
returns the wrong encoding.

For example, on Fedora 29, locale.getlocale() returns ISO-8859-1
encoding for the "en_IN" locale, whereas
locale.getpreferredencoding() reports the correct encoding: UTF-8.
2019-03-01 00:08:03 +01:00
animalize
4a7f44a2ed bpo-34294: re module, fix wrong capturing groups in rare cases. (GH-11546)
Need to reset capturing groups between two SRE(match) callings in loops, this fixes wrong capturing groups in rare cases.

Also add a missing index in re.rst.
2019-02-18 15:26:37 +02:00
Serhiy Storchaka
a445feb729
bpo-30688: Support \N{name} escapes in re patterns. (GH-5588)
Co-authored-by: Jonathan Eunice <jonathan.eunice@gmail.com>
2018-02-10 00:08:17 +02:00
Serhiy Storchaka
fbb490fd2f
bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846) 2018-01-04 11:06:13 +02:00
Serhiy Storchaka
b748e3b258
Fix improper use of re.escape() in tests. (#4814) 2017-12-12 19:21:50 +02:00
Serhiy Storchaka
70d56fb525
bpo-25054, bpo-1647489: Added support of splitting on zerowidth patterns. (#4471)
Also fixed searching patterns that could match an empty string.
2017-12-04 14:29:05 +02:00
Serhiy Storchaka
05cb728d68
bpo-30349: Raise FutureWarning for nested sets and set operations (#1553)
in regular expressions.
2017-11-16 12:38:26 +02:00
Serhiy Storchaka
3557b05c5a bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885) 2017-10-24 23:31:42 +03:00
Serhiy Storchaka
0b5e61ddca bpo-30397: Add re.Pattern and re.Match. (#1646) 2017-10-04 20:09:49 +03:00
Serhiy Storchaka
5075416b8f bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790)
Previously any exception was replaced with a KeyError exception.
2017-08-03 11:45:23 +03:00
Roy Williams
171b9a354e bpo-30605: Fix compiling binary regexs with BytesWarnings enabled. (#2016)
Running our unit tests with `-bb` enabled triggered this failure.
2017-06-10 08:01:16 +03:00
Serhiy Storchaka
c7ac7280c3 bpo-30375: Correct the stacklevel of regex compiling warnings. (#1595)
Warnings emitted when compile a regular expression now always point
to the line in the user code.  Previously they could point into inners
of the re module if emitted from inside of groups or conditionals.
2017-05-16 15:16:15 +03:00
Serhiy Storchaka
4ab6abfca4 bpo-30299: Display a bytecode when compile a regex in debug mode. (#1491)
`re.compile(..., re.DEBUG)` now displays the compiled bytecode in
human readable form.
2017-05-14 09:05:13 +03:00
Serhiy Storchaka
821a9d146b bpo-30340: Enhanced regular expressions optimization. (#1542)
This increased the performance of matching some patterns up to 25 times.
2017-05-14 08:32:33 +03:00
Serhiy Storchaka
305ccbe27e bpo-30298: Weaken the condition of deprecation warnings for inline modifiers. (#1490)
Now allowed several subsequential inline modifiers at the start of the
pattern (e.g. '(?i)(?s)...').  In verbose mode whitespaces and comments
now are allowed before and between inline modifiers (e.g.
'(?x) (?i) (?s)...').
2017-05-10 06:05:20 +03:00
Serhiy Storchaka
6d336a0279 bpo-30285: Optimize case-insensitive matching and searching (#1482)
of regular expressions.
2017-05-09 23:37:14 +03:00
Serhiy Storchaka
7186cc29be bpo-30277: Replace _sre.getlower() with _sre.ascii_tolower() and _sre.unicode_tolower(). (#1468) 2017-05-05 10:42:46 +03:00
Serhiy Storchaka
898ff03e1e bpo-30215: Make re.compile() locale agnostic. (#1361)
Compiled regular expression objects with the re.LOCALE flag no longer
depend on the locale at compile time.  Only the locale at matching
time affects the result of matching.
2017-05-05 08:53:40 +03:00
Serhiy Storchaka
fdbd01151d bpo-10076: Compiled regular expression and match objects now are copyable. (#1000) 2017-04-16 10:16:03 +03:00
Serhiy Storchaka
5908300e4b bpo-29995: re.escape() now escapes only special characters. (#1007) 2017-04-13 21:06:43 +03:00
Victor Stinner
d6debb24e0 bpo-29919: Remove unused imports found by pyflakes (#137)
Make also minor PEP8 coding style fixes on modified imports.
2017-03-27 16:05:26 +02:00
Benjamin Peterson
21a74312f2 Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554)
This reverts commit ace5c0fdd9.
2017-03-07 22:48:09 -08:00
Benjamin Peterson
1e68716fd5 Revert "make the locale_flag fallback code work again (#375)" (#387)
This reverts commit 43f5df5bfa.
2017-03-01 21:53:00 -08:00
Benjamin Peterson
43f5df5bfa make the locale_flag fallback code work again (#375) 2017-02-28 23:59:12 -08:00
Nick Coghlan
ace5c0fdd9 bpo-29571: Use correct locale encoding in test_re (#149)
``local.getlocale(locale.LC_CTYPE)`` and
``locale.getpreferredencoding(False)`` may give different answers
in some cases (such as the ``en_IN`` locale).

``re.LOCALE`` uses the latter, so update the test case to match.
2017-02-18 15:01:22 +05:30
Serhiy Storchaka
ef5176769d Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:57:44 +02:00
Serhiy Storchaka
86e42376c2 Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:55:40 +02:00
Serhiy Storchaka
7e10dbbd45 Issue #29444: Fixed out-of-bounds buffer access in the group() method of
the match object.  Based on patch by WGH.
2017-02-04 22:53:57 +02:00
Serhiy Storchaka
70d28a184c Remove unused imports. 2016-12-16 20:00:15 +02:00
Victor Stinner
726a57d45f Issue #28765: _sre.compile() now checks the type of groupindex and indexgroup
groupindex must a dictionary and indexgroup must be a tuple.

Previously, indexgroup was a list. Use a tuple to reduce the memory usage.
2016-11-22 23:04:39 +01:00
Serhiy Storchaka
53c53ea4c5 Issue #27030: Unknown escapes in re.sub() replacement template are allowed
again.  But they still are deprecated and will be disabled in 3.7.
2016-12-06 19:15:29 +02:00
Victor Stinner
bcf4dccfa7 Issue #28727: Optimize pattern_richcompare() for a==a
A pattern is equal to itself.
2016-11-22 15:30:38 +01:00
Victor Stinner
b44fb128ae Implement rich comparison for _sre.SRE_Pattern
Issue #28727: Regular expression patterns, _sre.SRE_Pattern objects created by
re.compile(), become comparable (only x==y and x!=y operators). This change
should fix the issue #18383: don't duplicate warning filters when the warnings
module is reloaded (thing usually only done in unit tests).
2016-11-21 16:35:08 +01:00
Victor Stinner
8bf43e6d0b Issue #28082: Add basic unit tests on re enums 2016-11-14 12:38:43 +01:00
Serhiy Storchaka
662cef66d7 Issue #25953: re.sub() now raises an error for invalid numerical group
reference in replacement template even if the pattern is not found in
the string.  Error message for invalid group reference now includes the
group index and the position of the reference.
Based on patch by SilentGhost.
2016-10-23 12:11:19 +03:00