cpython

mirror of https://github.com/python/cpython.git synced 2025-07-12 22:05:16 +00:00

Author	SHA1	Message	Date
Miss Islington (bot)	1c26f1ce6c	[3.11] gh-109747: Improve errors for unsupported look-behind patterns (GH-109859) (GH-110860) Now re.error is raised instead of OverflowError or RuntimeError for too large width of look-behind pattern. The limit is increased to 2**32-1 (was 2**31-1). (cherry picked from commit `e2b3d831fd`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2023-10-14 06:51:24 +00:00
Miss Islington (bot)	7fefed091a	[3.11] gh-110590: Fix a bug where _sre.compile would overwrite exceptions (GH-110591) (#110614) TypeError would be overwritten by OverflowError if 'code' param contained non-ints. (cherry picked from commit `344d3a222a`) Co-authored-by: Nikita Sobolev <mail@sobolevn.me>	2023-10-10 10:48:07 +00:00
Serhiy Storchaka	26137e2cf7	[3.11] gh-100061: Proper fix of the bug in the matching of possessive quantifiers (GH-102612) (GH-108004) Restore the global Input Stream pointer after trying to match a sub-pattern. Co-authored-by: Ma Lin <animalize@users.noreply.github.com> (cherry picked from commit `abd9cc52d9`) Co-authored-by: SKO <41810398+uyw4687@users.noreply.github.com>	2023-08-16 08:36:36 +00:00
Serhiy Storchaka	5b76eaf02e	[3.11] gh-106052: Fix bug in the matching of possessive quantifiers (GH-106515) (GH-107795) It did not work in the case of a subpattern containing backtracking. Temporary implement possessive quantifiers as equivalent greedy qualifiers in atomic groups. (cherry picked from commit `7b6e34e5ba`)	2023-08-09 06:15:27 +00:00
Miss Islington (bot)	769b7d2d0b	[3.11] Move implementation specific RE tests to separate class (GH-106563) (GH-106565) (cherry picked from commit `8cb6f9761e`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2023-07-09 15:23:27 +03:00
Miss Islington (bot)	2d037fb406	[3.11] gh-106510: Fix DEBUG output for atomic group (GH-106511) (GH-106549) (cherry picked from commit `74ec02e949`)	2023-07-08 15:15:22 +03:00
Miss Islington (bot)	eb023a84d9	gh-98740: Fix validation of conditional expressions in RE (GH-98764) In very rare circumstances the JUMP opcode could be confused with the argument of the opcode in the "then" part which doesn't end with the JUMP opcode. This led to incorrect detection of the final JUMP opcode and incorrect calculation of the size of the subexpression. NOTE: Changed return value of functions _validate_inner() and _validate_charset() in Modules/_sre/sre.c. Now they return 0 on success, -1 on failure, and 1 if the last op is JUMP (which usually is a failure). Previously they returned 1 on success and 0 on failure. (cherry picked from commit `e9ac890c02`) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2022-11-03 00:47:39 -07:00
Miss Islington (bot)	1bd1e379de	gh-94675: Add a regression test for rjsmin re slowdown (GH-94685) Adds a regression test for an re slowdown observed by rjsmin. Uses multiprocessing to kill the test after SHORT_TIMEOUT. Co-authored-by: Oleg Iarygin <dralife@yandex.ru> Co-authored-by: Christian Heimes <christian@python.org> (cherry picked from commit `fe23c0061d`) Co-authored-by: Miro Hrončok <miro@hroncok.cz>	2022-08-03 16:45:19 -07:00
Miss Islington (bot)	029835d9d4	gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (GH-93882) Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)" This reverts commit `6e3eee5c11`. Manual fixups to increase the MAGIC number and to handle conflicts with a couple of changes that landed after that. Thanks for reviews by Ma Lin and Serhiy Storchaka. (cherry picked from commit `4beee0c7b0`) Co-authored-by: Gregory P. Smith <greg@krypto.org>	2022-06-17 01:43:56 -07:00
Miss Islington (bot)	74b205b3eb	gh-92728: Restore re.template, but deprecate it (GH-93161) Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)" This reverts commit `b09184bf05`. (cherry picked from commit `16a7e4a0b7`) Co-authored-by: Miro Hrončok <miro@hroncok.cz>	2022-05-24 23:32:20 -07:00
Christian Heimes	8f937976bc	[3.11] gh-90473: Skip tests that don't apply to Emscripten and WASI (GH-92846) (GH-92851) Co-authored-by: Christian Heimes <christian@python.org>	2022-05-16 20:15:56 +02:00
Serhiy Storchaka	19dca04121	gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794) Only sequence of ASCII digits will be accepted as a numerical reference. The group name in bytes patterns and replacement strings could only contain ASCII letters and digits and underscore.	2022-04-30 13:13:46 +03:00
Serhiy Storchaka	090721721b	Simplify testing the warning filename (GH-91868) The context manager result has the "filename" attribute.	2022-04-24 10:23:59 +03:00
Serhiy Storchaka	6b45076bd6	RE: Add more tests for inline flag "x" and re.VERBOSE (GH-91854)	2022-04-23 12:49:06 +03:00
Serhiy Storchaka	48ec61a89a	gh-91700: Validate the group number in conditional expression in RE (GH-91702) In expression (?(group)...) an appropriate re.error is now raised if the group number refers to not defined group. Previously it raised RuntimeError: invalid SRE code.	2022-04-22 19:53:10 +03:00
Serhiy Storchaka	6ccfa31421	gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) re.error is now raised instead of TypeError.	2022-04-22 18:35:28 +03:00
Ma Lin	e4e8895ae3	gh-91616: re module, fix .fullmatch() mismatch when using Atomic Grouping or Possessive Quantifiers (GH-91681) These jumps should use DO_JUMP0() instead of DO_JUMP(): - JUMP_POSS_REPEAT_1 - JUMP_POSS_REPEAT_2 - JUMP_ATOMIC_GROUP	2022-04-19 17:49:36 +03:00
Serhiy Storchaka	74070085da	Add more tests for group names and refs in RE (GH-91695)	2022-04-19 16:56:51 +03:00
Serhiy Storchaka	1c2fcebf3c	gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580)	2022-04-18 12:26:30 +03:00
Serhiy Storchaka	b09184bf05	bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300) They were undocumented and never working.	2022-04-06 19:53:50 +03:00
Ma Lin	6e3eee5c11	bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)	2022-04-03 19:16:20 +03:00
Serhiy Storchaka	1be3260a90	bpo-47152: Convert the re module into a package (GH-32177) The sre_* modules are now deprecated.	2022-04-02 11:35:13 +03:00
Ma Lin	356997cccc	bpo-35859: Fix a few long-standing bugs in re engine (GH-12427) In rare cases, capturing group could get wrong result. Regular expression engines in Perl and Java have similar bugs. The new behavior now matches the behavior of more modern RE engines: in the regex module and in PHP, Ruby and Node.js.	2022-03-29 17:31:01 +03:00
Serhiy Storchaka	492d4109f4	bpo-42885: Optimize search for regular expressions starting with "\A" or "^" (GH-32021) Affected functions are re.search(), re.split(), re.findall(), re.finditer() and re.sub().	2022-03-22 17:27:55 +02:00
Serhiy Storchaka	c6cd3cc93c	bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028) It is a more commonly used term.	2022-03-22 11:44:47 +02:00
Serhiy Storchaka	345b390ed6	bpo-433030: Add support of atomic grouping in regular expressions (GH-31982) * Atomic grouping: (?>...). * Possessive quantifiers: x++, x*+, x?+, x{m,n}+. Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}). Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>	2022-03-21 18:28:22 +02:00
Serhiy Storchaka	92a6abf72e	bpo-47066: Convert a warning about flags not at the start of the regular expression into error (GH-31994)	2022-03-19 16:10:44 +02:00
Serhiy Storchaka	4142961b9f	bpo-39394: Improve warning message in the re module (GH-31988) A warning about inline flags not at the start of the regular expression now contains the position of the flag.	2022-03-19 14:13:31 +02:00
Christian Heimes	ef1327e3b6	bpo-40280: Skip more tests on Emscripten (GH-31947) - lchmod, lchown are not fully implemented - skip umask tests - cannot fstat unlinked or renamed files yet - ignore musl libc issues that affect Emscripten	2022-03-17 12:09:57 +01:00
Erlend Egeberg Aasland	fbff5387c3	bpo-43988: Use check disallow instantiation helper (GH-26392)	2021-05-27 08:43:52 +02:00
Zackery Spytz	6cc8ac9499	bpo-40736: Improve the error message for re.search() TypeError (GH-23312) Include the invalid type in the error message.	2021-05-21 22:02:42 +01:00
Erlend Egeberg Aasland	9746cda705	bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748) Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types: * _dbm.dbm * _gdbm.gdbm * _multibytecodec.MultibyteCodec * _sre.SRE_Scanner * _thread._localdummy * _thread.lock * _winapi.Overlapped * array.arrayiterator * functools.KeyWrapper * functools._lru_list_elem * pyexpat.xmlparser * re.Match * re.Pattern * unicodedata.UCD * zlib.Compress * zlib.Decompress	2021-04-30 16:04:57 +02:00
Erlend Egeberg Aasland	5daf70b22e	bpo-43908: Make re types immutable (GH-25697) Co-authored-by: Victor Stinner <vstinner@python.org>	2021-04-29 08:47:11 +02:00
Ethan Furman	7aaeb2a3d6	bpo-38250: [Enum] single-bit flags are canonical (GH-24215) Flag members are now divided by one-bit verses multi-bit, with multi-bit being treated as aliases. Iterating over a flag only returns the contained single-bit flags. Iterating, repr(), and str() show members in definition order. When constructing combined-member flags, any extra integer values are either discarded (CONFORM), turned into ints (EJECT) or treated as errors (STRICT). Flag classes can specify which of those three behaviors is desired: >>> class Test(Flag, boundary=CONFORM): ... ONE = 1 ... TWO = 2 ... >>> Test(5) <Test.ONE: 1> Besides the three above behaviors, there is also KEEP, which should not be used unless necessary -- for example, _convert_ specifies KEEP as there are flag sets in the stdlib that are incomplete and/or inconsistent (e.g. ssl.Options). KEEP will, as the name suggests, keep all bits; however, iterating over a flag with extra bits will only return the canonical flags contained, not the extra bits. Iteration is now in member definition order. If member definition order matches increasing value order, then a more efficient method of flag decomposition is used; otherwise, sort() is called on the results of that method to get definition order. ``re`` module: repr() has been modified to support as closely as possible its previous output; the big difference is that inverted flags cannot be output as before because the inversion operation now always returns the comparable positive result; i.e. re.A|re.I|re.M|re.S is ~(re.L|re.U|re.S|re.T|re.DEBUG) in both of the above terms, the ``value`` is 282. re's tests have been updated to reflect the modifications to repr().	2021-01-25 14:26:19 -08:00
Erlend Egeberg Aasland	a6109ef68d	bpo-1635741: Convert _sre types to heap types and establish module state (PEP 384) (GH-23393)	2020-11-20 21:36:23 +09:00
Victor Stinner	57572b103e	bpo-40443: Remove unused imports in tests (GH-19805)	2020-04-30 01:48:37 +02:00
Serhiy Storchaka	14a0e16c88	bpo-36548: Improve the repr of re flags. (GH-12715)	2019-05-31 10:39:47 +03:00
Max Bernstein	ccb7ca728e	bpo-36929: Modify io/re tests to allow for missing mod name (#13392) * bpo-36929: Modify io/re tests to allow for missing mod name For a vanishingly small number of internal types, CPython sets the tp_name slot to mod_name.type_name, either in the PyTypeObject or the PyType_Spec. There are a few minor places where this surfaces: * Custom repr functions for those types (some of which ignore the tp_name in favor of using a string literal, such as _io.TextIOWrapper) * Pickling error messages The test suite only tests the former. This commit modifies the test suite to allow Python implementations to omit the module prefix. https://bugs.python.org/issue36929	2019-05-21 10:09:21 -07:00
Victor Stinner	ab71f8b793	bpo-29571: Fix test_re.test_locale_flag() (GH-12099) Use locale.getpreferredencoding() rather than locale.getlocale() to get the locale encoding. With some locales, locale.getlocale() returns the wrong encoding. For example, on Fedora 29, locale.getlocale() returns ISO-8859-1 encoding for the "en_IN" locale, whereas locale.getpreferredencoding() reports the correct encoding: UTF-8.	2019-03-01 00:08:03 +01:00
animalize	4a7f44a2ed	bpo-34294: re module, fix wrong capturing groups in rare cases. (GH-11546) Need to reset capturing groups between two SRE(match) callings in loops, this fixes wrong capturing groups in rare cases. Also add a missing index in re.rst.	2019-02-18 15:26:37 +02:00
Serhiy Storchaka	a445feb729	bpo-30688: Support \N{name} escapes in re patterns. (GH-5588) Co-authored-by: Jonathan Eunice <jonathan.eunice@gmail.com>	2018-02-10 00:08:17 +02:00
Serhiy Storchaka	fbb490fd2f	bpo-32308: Replace empty matches adjacent to a previous non-empty match in re.sub(). (#4846)	2018-01-04 11:06:13 +02:00
Serhiy Storchaka	b748e3b258	Fix improper use of re.escape() in tests. (#4814)	2017-12-12 19:21:50 +02:00
Serhiy Storchaka	70d56fb525	bpo-25054, bpo-1647489: Added support of splitting on zerowidth patterns. (#4471) Also fixed searching patterns that could match an empty string.	2017-12-04 14:29:05 +02:00
Serhiy Storchaka	05cb728d68	bpo-30349: Raise FutureWarning for nested sets and set operations (#1553) in regular expressions.	2017-11-16 12:38:26 +02:00
Serhiy Storchaka	3557b05c5a	bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885)	2017-10-24 23:31:42 +03:00
Serhiy Storchaka	0b5e61ddca	bpo-30397: Add re.Pattern and re.Match. (#1646)	2017-10-04 20:09:49 +03:00
Serhiy Storchaka	5075416b8f	bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790) Previously any exception was replaced with a KeyError exception.	2017-08-03 11:45:23 +03:00
Roy Williams	171b9a354e	bpo-30605: Fix compiling binary regexs with BytesWarnings enabled. (#2016) Running our unit tests with `-bb` enabled triggered this failure.	2017-06-10 08:01:16 +03:00
Serhiy Storchaka	c7ac7280c3	bpo-30375: Correct the stacklevel of regex compiling warnings. (#1595) Warnings emitted when compile a regular expression now always point to the line in the user code. Previously they could point into inners of the re module if emitted from inside of groups or conditionals.	2017-05-16 15:16:15 +03:00

272 commits