
127 commits

Author SHA1 Message Date
Christian Heimes
e93237dfcc #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. Macros for backward compatibility are available. 2007-12-19 02:37:44 +00:00
Guido van Rossum
1ff91d95a2 Patch # 1140 (my code, approved by Effbot).
Make sure the type of the return value of re.sub(x, y, z) is the type
of y+x (i.e. unicode if either is unicode, str if they are both str)
even if there are no substitutions or if x==z (which triggered various
special cases in join_list()).

Could be backported to 2.5; no need to port to 3.0.
2007-09-10 22:02:25 +00:00
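In Python 3 the str/unicode pairing became str/bytes, and the two can no longer be mixed in one call. A minimal sketch, assuming a modern Python 3 interpreter, of the invariant this patch protects: the result type follows the input type even when nothing is substituted.

```python
import re

# (pattern, replacement, subject); the last two never match, so nothing is substituted.
cases = [
    ("a", "x", "banana"),
    ("z", "x", "banana"),
    (b"z", b"x", b"banana"),
]

results = [re.sub(pattern, repl, subject) for pattern, repl, subject in cases]
for (_, _, subject), result in zip(cases, results):
    # The result type follows the subject type, substitution or not.
    print(type(subject).__name__, "->", type(result).__name__, result)

# A str pattern cannot be applied to a bytes subject at all.
try:
    re.sub("a", "x", b"banana")
except TypeError as exc:
    print("TypeError:", exc)
```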
Martin v. Löwis
6819210b9e PEP 3123: Provide forward compatibility with Python 3.0, while keeping
backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and
PyVarObject_HEAD_INIT.
2007-07-21 06:55:02 +00:00
Andrew M. Kuchling
36126c424a Cause a PyObject_Malloc() failure to trigger a MemoryError, and then
add 'if (PyErr_Occurred())' checks to various places so that NULL is
returned properly.

2.4 backport candidate.
2006-10-04 13:42:43 +00:00
Neal Norwitz
ef0de023db Try to handle a malloc failure. I'm not entirely sure this is correct.
There might be something else we need to do to handle the exception.

Klocwork # 212-213
2006-08-12 01:53:28 +00:00
Neal Norwitz
a6d80faf6c Impl ssize_t 2006-06-12 03:05:40 +00:00
Georg Brandl
96a8c3954c Make use of METH_O and METH_NOARGS where possible.
Use Py_UnpackTuple instead of PyArg_ParseTuple where possible.
2006-05-29 21:04:52 +00:00
Georg Brandl
964f5978dc METH_NOARGS functions do get called with two args. 2006-05-28 22:38:57 +00:00
Georg Brandl
fbef5888e7 Fix C function calling conventions in _sre module. 2006-05-28 22:14:04 +00:00
Jack Diederich
2d40077b4f needforspeed: use PyObject_MALLOC instead of system malloc for small
allocations.  Use PyMem_MALLOC for larger (1k+) chunks.  1%-2% speedup.
2006-05-27 15:44:34 +00:00
Skip Montanaro
816a162265 C++ compiler cleanup: proper casts 2006-04-18 11:53:09 +00:00
Anthony Baxter
aefd8ca701 Move constructors, add some casts to make C++ compiler happy. Still a problem
with the getstring() results in pattern_subx. Will come back to that.
2006-04-12 04:26:11 +00:00
Neal Norwitz
94a9c09e10 Rename sre.py -> re.py 2006-03-16 06:30:02 +00:00
Neal Norwitz
60da31660c Thanks to Coverity, these were all reported by their Prevent tool.
All of these (except _lsprof.c) should be backported.  Particularly
the hotshot change which validates sys.path.  Can someone backport?
2006-03-07 04:48:24 +00:00
Martin v. Löwis
15e62742fa Revert backwards-incompatible const changes. 2006-02-27 16:46:16 +00:00
Tim Peters
3d56350910 _compile(): raise an exception if downcasting to SRE_CODE
loses information:

    OverflowError: regular expression code size limit exceeded

Otherwise the compiled code is gibberish, possibly leading at
least to wrong results or (as reported on c.l.py) internal
sre errors at match time.

I'm not sure how to test this.  SRE_CODE is a 2-byte type on
my box, and it's easy to create a regexp that causes the new
exception to trigger here.  But it may be a 4-byte type on
other boxes, and creating a regexp large enough to trigger
problems there would be pretty crazy.

Bugfix candidate.
2006-01-21 02:47:53 +00:00
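The check described above can be sketched in Python. `downcast_codes` is a hypothetical helper, not part of `re` or `_sre`: it emulates the C cast to an unsigned code word of `code_size` bytes with a bit mask, then raises if the value did not survive the round trip, which is the idea behind refusing to emit "gibberish" code.

```python
def downcast_codes(codes, code_size=2):
    """Narrow integer opcodes to unsigned code_size-byte words, refusing lossy casts.

    Hypothetical helper: it mimics a C cast to an unsigned SRE_CODE-like type
    followed by a check that the value survived the round trip.
    """
    mask = (1 << (8 * code_size)) - 1
    narrowed = []
    for value in codes:
        code = value & mask  # what an unsigned C cast would keep
        if code != value:    # information was lost: refuse instead of emitting gibberish
            raise OverflowError("regular expression code size limit exceeded")
        narrowed.append(code)
    return narrowed


print(downcast_codes([1, 17, 65535]))        # fits in 2 bytes
print(downcast_codes([70000], code_size=4))  # fits in 4 bytes
try:
    downcast_codes([70000])                  # does not fit in 2 bytes
except OverflowError as exc:
    print("OverflowError:", exc)
```

This also shows why the failure is hard to test portably: the same value passes with a 4-byte code word and fails with a 2-byte one.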
Neal Norwitz
1ac754fa10 Check return result from Py_InitModule*(). This API can fail.
Probably should be backported.
2006-01-19 06:09:39 +00:00
Jeremy Hylton
af68c874a6 Add const to several API functions that take char *.
In C++, it's an error to pass a string literal to a char* function
without a const_cast().  Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.

I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc.  Predictably, there was a large set of functions that
needed to be fixed as a result of these changes.  The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKeywords() a const char *kwlist[].

One cast was required as a result of the changes:  A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
2005-12-10 18:50:16 +00:00
Gustavo Niemeyer
166878f544 Fixing bug #1072259 in SRE. 2004-12-02 16:15:39 +00:00
Raymond Hettinger
9447874131 Add docstrings for regular expression objects and methods. 2004-09-24 04:31:19 +00:00
Gustavo Niemeyer
0506c64086 Fixing bug #817234, which made SRE get into an infinite loop on
empty final matches with finditer(). New test cases included
for this bug and for #581080.
2004-09-03 18:11:59 +00:00
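A quick check of the fixed behavior, assuming a modern Python 3 interpreter: `finditer()` over patterns that produce an empty final match terminates and reports that last empty match once.

```python
import re

# 'x*' can only match the empty string in "ab", so every match is empty,
# including the final one at the very end of the subject.
spans = [m.span() for m in re.finditer(r"x*", "ab")]
print(spans)

# An anchor that matches only at the end yields exactly one empty final match.
end_spans = [m.span() for m in re.finditer(r"$", "abc")]
print(end_spans)
```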
Nicholas Bastin
9ba301e589 Moved SunPro warning suppression into pyport.h and out of individual
modules and objects.
2004-07-15 15:54:05 +00:00
Nicholas Bastin
1ce9e4cfc1 Fixed end-of-loop code not reached warning when using SunPro C 2004-06-17 18:27:18 +00:00
Raymond Hettinger
027bb633b6 Add weakref support to sockets and re pattern objects. 2004-05-31 03:09:25 +00:00
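With weakref support, compiled patterns can be targets of `weakref.ref` and `weakref.proxy`. A small sketch, assuming Python 3; it only checks that the reference resolves, because the `re` module keeps its own cache of compiled patterns, so dropping the local name need not free the object.

```python
import re
import weakref

pattern = re.compile(r"\d+")

# A plain weak reference resolves to the pattern while it is alive.
ref = weakref.ref(pattern)
assert ref() is pattern

# A proxy forwards method calls to the underlying pattern object.
proxy = weakref.proxy(pattern)
print(proxy.match("123abc").group())  # 123
```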
Gustavo Niemeyer
601b963be0 - Fixing annoying warnings. 2004-02-14 00:31:13 +00:00
Gustavo Niemeyer
2cbdc2a461 Cleaning up recursive pieces left in the reorganization. 2003-12-13 20:32:08 +00:00
Gustavo Niemeyer
0f0c06a5c2 Removing dead code. 2003-10-18 20:54:44 +00:00
Gustavo Niemeyer
ad3fc44ccb Implemented non-recursive SRE matching. 2003-10-17 22:13:16 +00:00
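A rough way to exercise a non-recursive matcher is a repeated group over a long subject: a matcher that recursed once per repetition would need very deep C recursion here. A sketch, assuming a modern CPython whose SRE engine keeps backtracking state in its own data structures rather than on the C stack.

```python
import re

# One repetition per "ab" pair: 100,000 iterations of a group.
subject = "ab" * 100_000 + "c"

m = re.match(r"(?:ab)*c", subject)
print(m.end())  # 200001

# The same with a capturing group; the group keeps its last iteration.
m2 = re.match(r"(ab)*c", subject)
print(m2.group(1), m2.span(1))  # ab (199998, 200000)
```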
Raymond Hettinger
8ae4689657 Simplify and speedup uses of Py_BuildValue():
* Py_BuildValue("(OOO)",a,b,c)  -->  PyTuple_Pack(3,a,b,c)
* Py_BuildValue("()")           -->  PyTuple_New(0)
* Py_BuildValue("O", a)         -->  Py_INCREF(a)
2003-10-12 19:09:37 +00:00
Gustavo Niemeyer
28b5bb33ea Fixing bug described in patch #756032, where SRE reads invalid data
due to a corrupted end pointer.
2003-06-26 14:41:08 +00:00
Andrew MacIntyre
1a44448b24 Changes to sre.c after the application of patch #726869 have increased
stack usage on FreeBSD, requiring the recursion limit to be lowered
further.  Building with gcc 2.95 (the standard compiler on FreeBSD 4.x)
is now also affected.

The underlying issue is that FreeBSD's pthreads implementation has a
hard-coded 1MB stack size for the initial (or "primary") thread, which
cannot be changed without rebuilding libc_r.  Exhausting this stack
results in a bus error.

Building without pthreads (configure --without-threads), or linking
with the port of the Linux pthreads library (aka Linuxthreads) instead
of libc_r, avoids this limitation.

On OS/2, only gcc 3.2 is affected and the stack size is controllable,
so the special handling has been removed.
2003-06-09 08:22:11 +00:00
Andrew M. Kuchling
c24fe36c57 Allow _sre.c to compile with Python 2.2 2003-04-30 13:09:08 +00:00
Gustavo Niemeyer
caf1c9dfe7 - Included detailed documentation in _sre.c explaining how, when, and why
  to use LASTMARK_SAVE()/LASTMARK_RESTORE(), based on the discussion
  in patch #712900.

- Cleaned up LASTMARK_SAVE()/LASTMARK_RESTORE() usage, based on the
  established rules.

- Moved the upper part of the just committed patch (relative to bug #725106)
  to outside the for() loop of BRANCH OP. There's no need to mark_save()
  in every loop iteration.
2003-04-27 14:42:54 +00:00
Gustavo Niemeyer
3646ab98af Fix for part of the problem mentioned in #725149 by Greg Chapman.
This problem is related to incorrect behavior in mark_save/restore(),
which don't restore the mark_stack_base before restoring the marks.
Greg's suggestion was to change the asserts, which happen to be
the only recursive ops that can continue the loop, but the problem would
happen to any operation with the same behavior. So, rather than
hardcoding this into asserts, I have changed mark_save/restore() to
always restore the stackbase before restoring the marks.

Both solutions should fix these two cases, presented by Greg:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

The rest of the bug and patch in #725149 must be discussed further.
2003-04-27 13:25:21 +00:00
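The outputs quoted above are the buggy ones: group 1 can only ever capture "a". Assuming a modern Python 3 interpreter, the two cases now give the expected results, because a repeat that ends up matching zero times must discard any marks set inside its failed iteration.

```python
import re

# The lookahead captures "b" into group 2, but "c" then fails, so the
# iteration is abandoned and group 2 must be restored to None.
first = re.match(r"(a)(?:(?=(b)*)c)*", "abb").groups()
print(first)   # ('a', None)

# (?!(b)*) can never succeed, because (b)* always matches, so the outer
# repeat matches zero times and groups 2 and 3 stay unset.
second = re.match(r"(a)((?!(b)*))*", "abb").groups()
print(second)  # ('a', None, None)
```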
Gustavo Niemeyer
c34f2555bd Applied patch #725106, by Greg Chapman, fixing capturing groups
within repeats of alternatives. The only change to the original
patch was to convert the tests to the new test_re.py file.

This patch fixes cases like:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Which is wrong (it's impossible to match the empty string),
and incompatible with other regex systems, like the following
examples show:

% perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
b a

% echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"
b a|c
2003-04-27 12:34:14 +00:00
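For comparison with the Perl and sed output above, this is the result, assuming a modern Python 3 interpreter, of the same pattern and of a variant where the inner group never participates.

```python
import re

# The last iteration of the outer group matched "b", but group 2 keeps the "a"
# captured in the earlier iteration, as Perl and sed do.
kept = re.match(r"^((a)|b)*", "abc").groups()
print(kept)   # ('b', 'a')

# If the inner group never participates, it stays None.
never = re.match(r"^((d)|[ab])*", "abc").groups()
print(never)  # ('b', None)
```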
Gustavo Niemeyer
c23fb77477 Applying patch #726869 by Andrew I MacIntyre, reducing the _sre.c
recursion limit for certain FreeBSD and OS/2 setups.
2003-04-27 06:58:54 +00:00
Gustavo Niemeyer
3c9068bbec Made MAX_UNTIL/MIN_UNTIL code more coherent about mark protection,
according to further discussions with Greg Chapman in patch #712900.
2003-04-22 15:39:09 +00:00
Gustavo Niemeyer
be733ee7fb More work on bug #672491 and patch #712900.
I've applied a modified version of Greg Chapman's patch. I've included
the fixes without introducing the reorganization mentioned, for the sake
of stability. Also, the second fix mentioned in the patch doesn't fix the
mentioned problem anymore, because of the change introduced by patch
#720991 (by Greg as well). The new fix wasn't complicated though, and is
included as well.

As a note, it seems that there are other places that require the
"protection" of LASTMARK_SAVE()/LASTMARK_RESTORE(), and are just waiting
for someone to find how to break them. Particularly, I believe that every
recursion of SRE_MATCH() should be protected by these macros. I won't
do that right now since I'm not completely sure about this, and we don't
have much time for testing until the next release.
2003-04-20 07:35:44 +00:00
Gustavo Niemeyer
1aca359e89 - Fixed bug #672491. This change restores the behavior of lastindex/lastgroup
  to be compliant with previous Python versions, by backing out the changes
  made in revision 2.84 which affected this. The bugfix for backtracking is
  still maintained.
2003-04-20 00:45:13 +00:00
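The lastindex behavior that this change preserves is the one the `re` documentation describes: lastindex is the last group to be closed, so an enclosing group that closes after its inner groups wins. A sketch assuming a modern Python 3 interpreter.

```python
import re

patterns = (r"(a)b", r"((a)(b))", r"((ab))", r"(a)(b)")

# Only (a)(b) ends with a group that is not enclosed by an earlier one.
indexes = [re.match(p, "ab").lastindex for p in patterns]
print(indexes)  # [1, 1, 1, 2]

# lastgroup reports the name of that same group, when it has one.
name = re.match(r"(?P<outer>(?P<inner>a)b)", "ab").lastgroup
print(name)  # outer
```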
Martin v. Löwis
78e2f06cc6 Fully support 32-bit codes. Enable BIGCHARSET in UCS-4 builds. 2003-04-19 12:56:08 +00:00
Guido van Rossum
41c99e7f96 SF patch #720991 by Gary Herron:
A small fix for bug #545855 and Greg Chapman's
addition of op code SRE_OP_MIN_REPEAT_ONE for
eliminating recursion on simple uses of pattern '*?' on a
long string.
2003-04-14 17:59:34 +00:00
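A lazy single-item repeat such as `a*?` followed by more pattern is the case this opcode targets. A sketch, assuming a modern Python 3 interpreter, that runs such patterns over a long subject; it checks only the results, not the engine's internal opcode choice.

```python
import re

# A lazy single-character repeat over a long run of 'a's.
subject = "a" * 100_000 + "b"

m = re.match(r"a*?b", subject)
print(m.end())  # 100001

# The same idea with '.' instead of a literal.
m2 = re.search(r".*?b", subject)
print(m2.span())  # (0, 100001)
```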
Fredrik Lundh
09705f0b89 fix for SF #635398 (don't "downcast" return strings from unicode to ascii) 2002-11-22 12:46:35 +00:00
Neal Norwitz
addfe0c09c Make private functions static so we don't pollute the namespace 2002-11-10 14:33:26 +00:00
Gustavo Niemeyer
c523b04b0f Fixed sre bug "[#581080] Provoking infinite scanner loops".
This bug happened because: 1) the scanner_search and scanner_match methods
were not checking the buffer limits before increasing the current pointer;
and 2) SRE_SEARCH was using "if (ptr == end)" as a loop break, instead of
"if (ptr >= end)".

* Modules/_sre.c
  (SRE_SEARCH): Check for "ptr >= end" to break loops, so that we don't
  hang forever if a pointer passing the buffer limit is used.
  (scanner_search,scanner_match): Don't increment the current pointer
  if we're going to pass the buffer limit.

* Misc/NEWS
  Mention the fix.
2002-11-07 03:28:56 +00:00
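The two guards described above (advance past an empty match, but never walk past the end of the buffer, and stop on an inequality rather than exact equality) can be sketched as a Python loop. `scan_all` is a hypothetical, simplified scanner, not the `_sre` scanner API, and it does not replicate every empty-match rule of modern `finditer()`.

```python
import re


def scan_all(pattern, text):
    """Hypothetical simplified scanner: repeatedly search, never stepping past the end."""
    regex = re.compile(pattern)
    pos, end = 0, len(text)
    spans = []
    # Stop as soon as pos passes the end of the buffer. Testing an inequality
    # (pos > end) rather than pos == end + 1 keeps the loop finite even if pos
    # ever overshoots by more than one.
    while pos <= end:
        m = regex.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.end() == m.start():
            # An empty match must advance by one to make progress.
            pos = m.end() + 1
        else:
            pos = m.end()
    return spans


print(scan_all(r"x*", "ab"))  # [(0, 0), (1, 1), (2, 2)]
print(scan_all(r"b", "abab"))  # [(1, 2), (3, 4)]
```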
Gustavo Niemeyer
4e7be06a65 Fixed bug #470582, using a modified version of patch #527371,
from Greg Chapman.

* Modules/_sre.c
  (lastmark_restore): New function, implementing algorithm to restore
  a state to a given lastmark. In addition to the similar algorithm used
  in a few places of SRE_MATCH, restore lastindex when restoring lastmark.
  (SRE_MATCH): Replace inline lastmark restoring with the lastmark_restore()
  function. Also include it where missing. In SRE_OP_MARK, set lastindex
  only if i > lastmark.

* Lib/test/re_tests.py
* Lib/test/test_sre.py
  Included regression tests for the fixed bugs.

* Misc/NEWS
  Mention fixes.
2002-11-06 14:06:53 +00:00
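Restoring lastindex together with lastmark matters when a group first captures and is then backtracked out of the match. A sketch, assuming a modern Python 3 interpreter, of cases in that spirit: the optional group grabs a character that a later literal needs, so the group must be undone along with lastindex and lastgroup.

```python
import re

# (a)? first captures the 'a', then gives it back so the trailing 'a' can match.
m1 = re.match(r"(a)?a", "a")
print(m1.groups(), m1.lastindex)  # (None,) None

# (b)? likewise gives back the 'b', so the last closed group is group 1.
m2 = re.match(r"(a)(b)?b", "ab")
print(m2.groups(), m2.lastindex)  # ('a', None) 1

# The named-group version reports the surviving group's name.
m3 = re.match(r"(?P<a>a)(?P<b>b)?b", "ab")
print(m3.lastgroup)  # a
```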
Michael W. Hudson
b6a4505123 Cray fixup as seen in bug #558153. 2002-07-31 09:54:24 +00:00
Mark Hammond
8235ea1c3a Land Patch [ 566100 ] Rationalize DL_IMPORT and DL_EXPORT. 2002-07-19 06:55:41 +00:00
Jeremy Hylton
938ace69a0 staticforward bites the dust.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) that botched the
static keyword when it was used with a forward declaration of a static
initialized structure.  Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers.  (In
fact, we expect that the compilers are all fixed eight years later.)

I'm leaving staticforward and statichere defined in object.h as
static.  This is only for backwards compatibility with C extensions
that might still use it.

XXX I haven't updated the documentation.
2002-07-17 16:30:39 +00:00
Neal Norwitz
35fc7606f0 SF #561244 Micro optimizations
Convert loops to memset()s.
2002-06-13 21:11:11 +00:00
Neal Norwitz
bb2769f580 Revert use of METH_OLDARGS (use 0) to support 1.5.2 2002-03-31 15:46:00 +00:00