Commit graph

302 commits

Author SHA1 Message Date
Victor Stinner
ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Eric V. Smith
ab2aa6dc91 Fixed an incorrect comment. 2015-08-26 14:10:32 -04:00
Serhiy Storchaka
9ce71a6475 Fixed typos in comments. 2015-05-18 22:20:18 +03:00
Serhiy Storchaka
7e29eea926 Fixed typos in comments. 2015-05-18 22:19:42 +03:00
Serhiy Storchaka
0d4df752ac Issue #15027: The UTF-32 encoder is now 3x to 7x faster. 2015-05-12 23:12:45 +03:00
Serhiy Storchaka
d9d769fcdd Issue #23573: Increased performance of string search operations (str.find,
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
Serhiy Storchaka
009b811d67 Removed unintentional trailing spaces in non-external and non-generated C files. 2015-03-18 21:53:15 +02:00
Serhiy Storchaka
4fdb68491e Issue #22896: Avoid to use PyObject_AsCharBuffer(), PyObject_AsReadBuffer()
and PyObject_AsWriteBuffer().
2015-02-03 01:21:08 +02:00
Serhiy Storchaka
b757c83ec6 Issue #22581: Use more "bytes-like object" throughout the docs and comments. 2014-12-05 22:25:22 +02:00
Benjamin Peterson
1cc9520327 s/stringobject/bytesobject/ (closes #22036)
Patch by Martin Matusiak.
2014-07-23 21:39:37 -07:00
Benjamin Peterson
d455ce4fd4 merge 3.3 2014-03-30 19:52:39 -04:00
Benjamin Peterson
0ad6098b67 merge 3.2 2014-03-30 19:52:22 -04:00
Benjamin Peterson
23cf403ca1 fix expandtabs overflow detection to be consistent and not rely on signed overflow 2014-03-30 19:47:57 -04:00
Serhiy Storchaka
3079328d29 Reverted changeset b72c5573c5e7 (issue #15027). 2014-01-04 22:44:01 +02:00
Serhiy Storchaka
583a93943c Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. 2014-01-04 19:25:37 +02:00
Benjamin Peterson
0ee22bf774 fix format spec recursive expansion (closes #19729) 2013-11-26 19:22:36 -06:00
Serhiy Storchaka
dc2fd5101a Remove dead code committed in issue #12892. 2013-11-19 15:56:05 +02:00
Serhiy Storchaka
58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Ezio Melotti
745d54d2fa #17806: Added keyword-argument support for "tabsize" to str/bytes.expandtabs(). 2013-11-16 19:10:57 +02:00
Victor Stinner
cc64eb5b9f Issue #18408: Fix bytearrayiter.partition()/rpartition(), handle
PyByteArray_FromStringAndSize() failure (ex: on memory allocation failure)
2013-10-29 03:15:37 +01:00
Serhiy Storchaka
8fa8ee3970 Issue #18701: Remove support of old CPython versions (<3.0) from C code. 2013-08-17 00:48:02 +03:00
Raymond Hettinger
d06eeb4a24 merge 2013-08-13 18:20:55 -07:00
Raymond Hettinger
b1b915c796 Issue 18719: Remove a false optimization
Remove an unused early-out test from the critical path for
dict and set lookups.

When the strings already have matching lengths, kinds, and hashes,
there is no additional information gained by checking the first
characters (the probability of a mismatch is already known to
be less than 1 in 2**64).
2013-08-13 18:16:34 -07:00
Antoine Pitrou
9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Benjamin Peterson
d2b58a9880 only recursively expand in the format spec (closes #17644) 2013-05-17 17:34:30 -05:00
Benjamin Peterson
4d94474ba3 rewrite the parsing of field names to be more consistent wrt recursive expansion 2013-05-17 18:22:31 -05:00
Benjamin Peterson
48953632df merge 3.3 2013-05-17 17:35:28 -05:00
Ezio Melotti
5263c13801 Merge removal of trailing whitespace from 3.3. 2013-04-21 04:08:18 +03:00
Ezio Melotti
6b02772c13 Remove trailing whitespace. 2013-04-21 04:07:51 +03:00
Victor Stinner
8f674ccd64 Close #17694: Add minimum length to _PyUnicodeWriter
* Add also min_char attribute to _PyUnicodeWriter structure (currently unused)
 * _PyUnicodeWriter_Init() has no more argument (except the writer itself):
   min_length and overallocate must be set explicitly
 * In error handlers, only enable overallocation if the replacement string
   is longer than 1 character
 * CJK decoders don't use overallocation anymore
 * Set min_length, instead of preallocating memory using
   _PyUnicodeWriter_Prepare(), in many decoders
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
76b3b2726c stringlib: remove unused STRINGLIB_RESIZE macro 2013-04-14 16:29:09 +02:00
Serhiy Storchaka
e2cef885a2 Issue #16061: Speed up str.replace() for replacing 1-character strings. 2013-04-13 22:45:04 +03:00
Victor Stinner
7efa3b8242 Close #13126: "Simplify" FASTSEARCH() code to help the compiler to emit more
efficient machine code. Patch written by Antoine Pitrou.

Without this change, str.find() was 10% slower than str.rfind() in the worst
case.
2013-04-08 00:26:43 +02:00
Victor Stinner
cfc4c13b04 Add _PyUnicodeWriter_WriteSubstring() function
Write a function to enable more optimizations:

 * If the substring is the whole string and overallocation is disabled, just
   keep a reference to the string, don't copy characters
 * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
   possible
2013-04-03 01:48:39 +02:00
Serhiy Storchaka
06b16f879f Remove unused defines. 2013-02-23 14:49:09 +02:00
Serhiy Storchaka
18809fa94e Remove unused defines. 2013-02-23 14:48:16 +02:00
Antoine Pitrou
4de7457009 Issue #17173: Remove uses of locale-dependent C functions (isalpha() etc.) in the interpreter.
I've left a couple of them in: zlib (third-party lib), getaddrinfo.c
(doesn't include Python.h, and probably obsolete), _sre.c (legitimate
use for the re.LOCALE flag).
2013-02-09 23:11:27 +01:00
Serhiy Storchaka
b946af5897 Check for NULL before the pointer aligning in fastsearch_memchr_1char.
There is no guarantee that NULL is aligned.
2013-01-15 13:32:41 +02:00
Serhiy Storchaka
18ba40b945 Check for NULL before the pointer aligning in fastsearch_memchr_1char.
There is no guarantee that NULL is aligned.
2013-01-15 13:27:28 +02:00
Christian Heimes
5f7e8dab11 Issue #16592: stringlib_bytes_join doesn't raise MemoryError on allocation failure 2012-12-02 07:56:42 +01:00
Victor Stinner
6caa6fb535 (Merge 3.3) Issue #8271: Fix compilation on Windows 2012-11-05 00:00:50 +01:00
Victor Stinner
ab60de478d Issue #8271: Fix compilation on Windows 2012-11-04 23:59:15 +01:00
Ezio Melotti
cfa9636404 #8271: merge with 3.3. 2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
Antoine Pitrou
6f7b0da6bc Issue #12805: Make bytes.join and bytearray.join faster when the separator is empty.
Patch by Serhiy Storchaka.
2012-10-20 23:08:34 +02:00
Christian Heimes
743e0cd6b5 Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianess detection and handling.
2012-10-17 23:52:17 +02:00