359 commits

Author SHA1 Message Date
Neal Norwitz
a7edb11122 Whitespace normalization 2006-07-30 06:59:13 +00:00
Neal Norwitz
f71ec5a0ac Bug #1515471: string.replace() accepts character buffers again.
Pass the char* and size around rather than PyObject's.
2006-07-30 06:57:04 +00:00
Neal Norwitz
8e6675a7dc Update doc to make it agree with code.
Bottom factor out some common code.
2006-06-11 05:47:14 +00:00
Georg Brandl
90e27d38f5 Apply perky's fix for #1503157: "/".join([u"", u""]) raising OverflowError.
Also improve error message on overflow.
2006-06-10 06:40:50 +00:00
Georg Brandl
242508160e RFE #1491485: str/unicode.endswith()/startswith() now accept a tuple as first argument. 2006-06-09 18:45:48 +00:00
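
A minimal sketch of the tuple form from RFE #1491485, written for present-day Python. The file names are made up for the example.

```python
# Hypothetical file names, used only for illustration.
names = ["setup.py", "README.txt", "stringobject.c", "stringobject.h"]

# With a tuple as the first argument, a match on any one item is enough.
c_sources = [n for n in names if n.endswith((".c", ".h"))]
top_level = [n for n in names if n.startswith(("setup", "README"))]

print(c_sources)
print(top_level)
```
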
Neal Norwitz
b16e4e7860 Remove ; at end of macro. There was a compiler recently that warned
about extra semi-colons.  It may have been the HP C compiler.
This file will trigger a bunch of those warnings now.
2006-06-01 05:32:49 +00:00
Fredrik Lundh
80f8e80c15 needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations.  this gives a 20% speedup on some
string benchmarks.
2006-05-28 12:06:46 +00:00
Fredrik Lundh
0b7ef46950 needforspeed: stringlib refactoring: use find_slice for stringobject 2006-05-27 15:26:19 +00:00
Fredrik Lundh
c2d29c5a6d needforspeed: replace improvements, changed to Py_LOCAL_INLINE
where appropriate
2006-05-27 14:58:20 +00:00
Andrew Dalke
d49d5c49ba cleanup - removed trailing whitespace 2006-05-27 14:16:40 +00:00
Fredrik Lundh
2d23d5bf2e needforspeed: more stringlib refactoring 2006-05-27 10:05:10 +00:00
Andrew Dalke
7e0a62ea90 Added description of why splitlines doesn't use the prealloc strategy 2006-05-26 22:49:03 +00:00
Andrew Dalke
5132407868 Added limits to the replace code so it does not count all of the matching
patterns in a string, only the number needed by the max limit.
2006-05-26 20:25:22 +00:00
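
The idea of counting only as many matches as the limit requires can be sketched in Python. `count_up_to` is a hypothetical helper written for this illustration; it is not the C code from the commit.

```python
def count_up_to(s, sub, limit):
    """Count non-overlapping occurrences of sub in s, stopping at limit.

    Hypothetical helper: it models the idea only, not the C implementation.
    """
    count = 0
    pos = 0
    while count < limit:
        pos = s.find(sub, pos)
        if pos < 0:
            break  # no more matches
        count += 1
        pos += len(sub)  # continue after the match (non-overlapping)
    return count


# The built-in replace takes the same kind of limit as its third argument.
limited = "a-b-c-d".replace("-", "+", 2)
needed = count_up_to("a-b-c-d", "-", 2)
```

With a limit of 2, the loop stops after the second match and never looks at the rest of the string.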
Fredrik Lundh
e6e43c867d needforspeed: stringlib refactoring: use stringlib/find for string find 2006-05-26 19:48:07 +00:00
Fredrik Lundh
58b5e84d52 needforspeed: stringlib refactoring, continued. added count and
find helpers; updated unicodeobject to use stringlib_count
2006-05-26 19:24:53 +00:00
Andrew Dalke
c5da53ba78 substring split now uses /F's fast string matching algorithm.
(If compiled without FAST search support, changed the pre-memcmp test
   to check the last character as well as the first.  This gave a 25%
   speedup for my test case.)

Rewrote the split algorithms so they stop when maxsplit gets to 0.
Previously they did a string match first then checked if the maxsplit
was reached.  The new way prevents a needless string search.
2006-05-26 19:02:09 +00:00
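
A simplified Python model of the reordered loop described above: the maxsplit budget is tested before each search, so no search runs once the budget is used up. `split_sketch` is a hypothetical name, and the real code is C working on raw buffers.

```python
def split_sketch(s, sep, maxsplit=-1):
    """Simplified model of substring split; not the C implementation."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    # Test the budget first. A negative maxsplit never reaches 0,
    # which models "no limit".
    while maxsplit != 0:
        pos = s.find(sep, start)
        if pos < 0:
            break
        parts.append(s[start:pos])
        start = pos + len(sep)
        maxsplit -= 1
    parts.append(s[start:])  # the remainder is always the last item
    return parts


result = split_sketch("a,b,c,d", ",", 2)
```
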
Fredrik Lundh
b3167cbcd7 needforspeed: added rpartition implementation 2006-05-26 18:15:38 +00:00
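
A short illustration of rpartition as it behaves in present-day Python, splitting at the last occurrence of the separator. The sample strings are made up.

```python
# Split at the last "." only.
found = "Objects.stringobject.c".rpartition(".")

# When the separator is missing, the whole string ends up in the last slot.
missing = "stringobject".rpartition(".")
```
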
Fredrik Lundh
3a65d87e8c needforspeed: remove remaining USE_FAST macros; if fastsearch was
broken, someone would have noticed by now ;-)
2006-05-26 17:31:41 +00:00
Fredrik Lundh
c2032fb86a needforspeed: cleanup 2006-05-26 17:26:39 +00:00
Fredrik Lundh
b947948c61 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:22:38 +00:00
Fredrik Lundh
a50d201bd9 needforspeed: stringlib refactoring (in progress) 2006-05-26 17:04:58 +00:00
Fredrik Lundh
7c940d1d68 needforspeed: use Py_LOCAL on a few more locals in stringobject.c 2006-05-26 16:32:42 +00:00
Andrew Dalke
02758d66ce Eked out another 3% or so performance in split whitespace by cleaning up the algorithm. 2006-05-26 15:21:01 +00:00
Andrew Dalke
525eab3712 Changes to string.split/rsplit on whitespace to preallocate space in the
results list.

Originally it allocated 0 items and used the list growth during append.  Now
it preallocates 12 items so the first few appends don't need list reallocs.

("Here are some words ."*2).split(None, 1) is 7% faster
("Here are some words ."*2).split() is 15% faster

  (Your mileage may vary, see dealership for details.)

File parsing like this

    for line in f:
        count += len(line.split())

is also about 15% faster.  There is a slowdown of about 3% for large
strings because of the additional overhead of checking if the append is
to a preallocated region of the list or not.  This will be the rare case.
It could be improved with special case code but we decided it was not
useful enough.

There is a cost of 12*sizeof(PyObject *) bytes per list.  For the normal
case of file parsing this is not a problem because the lists have
a short lifetime.  We have not come up with cases where this is a problem
in real life.

I chose 12 because human text averages about 11 words per line in books,
one of my data sets averages 6.2 words with a final peak at 11 words per
line, and I work with a tab delimited data set with 8 tabs per line (or
9 words per line).  12 encompasses all of these.

Also changed the last rsplit code to append then reverse, rather than
doing insert(0).  The split() and rsplit() times are now comparable.
2006-05-26 14:00:45 +00:00
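
The append-then-reverse approach from the last paragraph of this commit message can be sketched for a right-to-left whitespace split. `rsplit_whitespace_sketch` is a hypothetical Python model: it omits maxsplit and the 12-item preallocation, which a Python list does not expose in the same way.

```python
def rsplit_whitespace_sketch(s):
    """Scan from the right, append each word, then reverse once at the end.

    Hypothetical model of the idea; the real code is C in stringobject.c.
    """
    parts = []
    end = len(s)
    while True:
        # Skip whitespace to the left of the current position.
        while end > 0 and s[end - 1].isspace():
            end -= 1
        if end == 0:
            break
        # Walk left to the start of the word.
        start = end
        while start > 0 and not s[start - 1].isspace():
            start -= 1
        parts.append(s[start:end])  # cheap append instead of insert(0)
        end = start
    parts.reverse()  # one pass restores left-to-right order
    return parts


words = rsplit_whitespace_sketch("  Here are some words . ")
```

Each insert(0) has to shift every item already in the list, while append plus a single final reverse touches each item only a constant number of times.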
Fredrik Lundh
95e2a91615 use Py_LOCAL also for string and unicode objects 2006-05-26 11:38:15 +00:00
Fredrik Lundh
f2c0dfdb13 needforspeed: use Py_ssize_t for the fastsearch counter and skip
length (thanks, neal!).  and yes, I've verified that this doesn't
slow things down ;-)
2006-05-26 10:27:17 +00:00
Fredrik Lundh
450277fef5 needforspeed: use METH_O for argument handling, which made partition some
~15% faster for the current tests (which is noticeably faster than a
corresponding find call).  thanks to neal-who-never-sleeps for the tip.
2006-05-26 09:46:59 +00:00
Fredrik Lundh
06a69dd8ff needforspeed: partition implementation, part two.
feel free to improve the documentation and the docstrings.
2006-05-26 08:54:28 +00:00
Fredrik Lundh
fe5bb7e6d9 needforspeed: partition for 8-bit strings. for some simple tests,
this is on par with a corresponding find, and nearly twice as fast
as split(sep, 1)

full tests, a unicode version, and documentation will follow tomorrow.
2006-05-25 23:27:53 +00:00
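
For comparison, the three ways of locating a separator that this message mentions look like this in present-day Python. The sample string is made up, and the example makes no claim about timings.

```python
line = "key=value=more"

# partition always returns a 3-tuple: (head, separator, tail).
head, sep, tail = line.partition("=")

# The split(sep, 1) form gives a list of at most two items.
parts = line.split("=", 1)

# find only reports the index; slicing is left to the caller.
index = line.find("=")

# Without a separator, partition still returns three items.
missing = "no separator here".partition("=")
```
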
Bob Ippolito
955b64c031 squelch gcc4 darwin/x86 compiler warnings 2006-05-25 20:52:38 +00:00
Fredrik Lundh
554da412a8 needforspeed: use insert+reverse instead of append 2006-05-25 19:19:05 +00:00
Jack Diederich
60cbb3fe49 * eliminate warning by reverting tmp_s type to 'const char*' 2006-05-25 18:47:15 +00:00
Fredrik Lundh
c3434b3834 needforspeed: use fastsearch also for find/index and contains. the
related tests are now about 10x faster.
2006-05-25 18:44:29 +00:00
Andrew Dalke
598710c727 Added overflow test for adding two (very) large strings where the
new string is over max Py_ssize_t.  I have no way to test it on my
box or any box I have access to.  At least it doesn't break anything.
2006-05-25 18:18:39 +00:00
Andrew M. Kuchling
f344c94c85 Comment typo 2006-05-25 18:11:16 +00:00
Fredrik Lundh
af72237abc needforspeed: use "fastsearch" for count. this results in a 3x speedup
for the related stringbench tests.
2006-05-25 17:55:31 +00:00
Andrew Dalke
8c9091074b Fixed problem identified by Georg. The special-case in-place code for replace
made a copy of the string using PyString_FromStringAndSize(s, n) and modified
the copied string in-place.  However, 1 (and 0) character strings are shared
from a cache.  This caused "A".replace("A", "a") to change the cached version
of "A" -- used by everyone.

Now make the copy with NULL as the string and do the memcpy manually.  I've
added regression tests to check if this happens in the future.  Perhaps
there should be a PyString_Copy for this case?
2006-05-25 17:53:00 +00:00
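
A check in the spirit of the regression tests mentioned above. This is a sketch for present-day Python, not the test that was added in the commit.

```python
original = "A"
replaced = original.replace("A", "a")

# Build the same one-character string another way; if the shared cached
# object had been modified in place, this would no longer be "A".
fresh = chr(65)
```
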
Fredrik Lundh
e68955cf32 needforspeed: new replace implementation by Andrew Dalke. replace is
now about 3x faster on my machine, for the replace tests from stringbench.
2006-05-25 17:08:14 +00:00
Fredrik Lundh
0c71f88fc9 needforspeed: check for overflow in replace (from Andrew Dalke) 2006-05-25 16:46:54 +00:00
Fredrik Lundh
dfe503d3f0 needforspeed: _toupper/_tolower is a SUSv2 thing; fall back on ISO C
versions if they're not defined.
2006-05-25 16:10:12 +00:00
Fredrik Lundh
4b4e33ef14 needforspeed: make new upper/lower work properly for single-character
strings too... (thanks to georg brandl for spotting the exact problem
faster than anyone else)
2006-05-25 15:49:45 +00:00
Fredrik Lundh
39ccef607e needforspeed: speed up upper and lower for 8-bit string objects.
(the unicode versions of these are still 2x faster on windows,
though...)

based on work by Andrew Dalke, with tweaks by yours truly.
2006-05-25 15:22:03 +00:00
Fredrik Lundh
763b50f9d9 docstring tweaks: count counts non-overlapping substrings, not
total number of occurrences
2006-05-22 15:35:12 +00:00
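
The distinction in this docstring tweak shows up on a small example. The overlapping count below is computed by hand for contrast; it is not something count offers.

```python
text = "aaaa"

# count skips past each match, so "aa" is found at index 0 and 2 only.
non_overlapping = text.count("aa")

# Counting every start position where "aa" occurs gives a larger number.
overlapping = sum(1 for i in range(len(text)) if text.startswith("aa", i))
```
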
Tim Peters
8931ff1f67 Teach PyString_FromFormat, PyErr_Format, and PyString_FromFormatV
about "%u", "%lu" and "%zu" formats.

Since PyString_FromFormat and PyErr_Format have exactly the same rules
(both inherited from PyString_FromFormatV), it would be good if someone
with more LaTeX Fu changed one of them to just point to the other.
Their docs were way out of synch before this patch, and I just did a
mass copy+paste to repair that.

Not a backport candidate (this is a new feature).
2006-05-13 23:28:20 +00:00
Martin v. Löwis
822f34a848 Revert 43315: Printing of %zd must be signed. 2006-05-13 13:34:04 +00:00
Thomas Wouters
568f1d0eed Py_ssize_t issue; repr()'ing a very large string would result in a teensy
string, because of a cast to int.
2006-04-21 13:54:43 +00:00
Thomas Wouters
dc5f808cbc Make s.replace() work with explicit counts exceeding 2Gb. 2006-04-19 15:38:01 +00:00
Thomas Wouters
4abb3660ca Use Py_ssize_t to hold the 'width' argument to the ljust, rjust, center and
zfill stringmethods, so they can create strings larger than 2Gb on 64bit
systems (even win64.) The unicode versions of these methods already did this
right.
2006-04-19 14:50:15 +00:00
Skip Montanaro
429433b30b C++ compiler cleanup: bunch-o-casts, plus use of unsigned loop index var in a couple places 2006-04-18 00:35:43 +00:00
Neal Norwitz
0e2cbabb8d No need to cast a Py_ssize_t, use %z in PyErr_Format 2006-04-17 05:56:32 +00:00