Victor Stinner
ad7715891e
_PyBytesWriter: simplify code to avoid "prealloc" parameters
...
Subtract preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
3fa36ff5e4
Issue #25318 : Fix backslashreplace()
...
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101
Issue #25318 : Avoid sprintf() in backslashreplace()
...
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().
Also add unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
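For reference, a minimal Python sketch of the behavior that backslashreplace() implements at the C level. It only shows what the handler produces for BMP and non-BMP characters and is not taken from the commit's unit tests.

```python
# Encode characters that ASCII cannot represent with the backslashreplace handler.
latin = "\xe9".encode("ascii", "backslashreplace")            # code point below U+0100
bmp = "\u20ac".encode("ascii", "backslashreplace")            # BMP character (euro sign)
non_bmp = "\U0001f600".encode("ascii", "backslashreplace")    # non-BMP character

print(latin)    # b'\\xe9'
print(bmp)      # b'\\u20ac'
print(non_bmp)  # b'\\U0001f600'
```

The escape width grows with the code point: \xNN, \uNNNN, then \UNNNNNNNN, which is why the handler has to estimate the space it needs.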
Victor Stinner
0016507c16
Issue #25318 : Move _PyBytesWriter to bytesobject.c
...
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d
Optimize backslashreplace error handler
...
Issue #25318 : Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation
of the error handler.
2015-10-09 01:39:28 +02:00
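As an illustration (not part of the commit), the UTF-8 encoder only invokes an error handler for lone surrogates, so a small Python sketch of the two handlers named above could look like this:

```python
# A lone surrogate cannot be encoded to UTF-8, so the error handler is invoked.
text = "a\udc80b"

escaped = text.encode("utf-8", "backslashreplace")
xml_ref = text.encode("utf-8", "xmlcharrefreplace")

print(escaped)  # b'a\\udc80b'
print(xml_ref)  # b'a&#56448;b'
```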
Victor Stinner
fdfbf78114
Issue #25318 : Add _PyBytesWriter API
...
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 so that the current
character is not checked twice: we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
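The idea of a small initial buffer plus overallocation can be modeled in Python. This is a toy sketch only: the class name, the 512-byte initial size and the 25% growth factor are assumptions made for illustration, and the real _PyBytesWriter is a private C API with a different interface.

```python
class SketchBytesWriter:
    """Toy model of a writer with a small initial buffer and optional overallocation."""

    SMALL_BUFFER_SIZE = 512  # assumed size of the initial buffer

    def __init__(self, overallocate=False):
        self.buffer = bytearray(self.SMALL_BUFFER_SIZE)
        self.used = 0
        self.overallocate = overallocate
        self.resizes = 0  # number of times the buffer had to grow

    def prepare(self, size):
        """Ensure that `size` more bytes fit, growing the buffer if needed."""
        needed = self.used + size
        if needed <= len(self.buffer):
            return
        if self.overallocate:
            needed += needed // 4  # assumed growth factor: 25% extra
        self.buffer.extend(bytes(needed - len(self.buffer)))
        self.resizes += 1

    def write(self, data):
        self.prepare(len(data))
        self.buffer[self.used:self.used + len(data)] = data
        self.used += len(data)

    def finish(self):
        """Return only the bytes actually written."""
        return bytes(self.buffer[:self.used])


exact = SketchBytesWriter(overallocate=False)
padded = SketchBytesWriter(overallocate=True)
for _ in range(1000):
    exact.write(b"ab")
    padded.write(b"ab")

print(exact.resizes, padded.resizes)
```

In this model the overallocating writer grows far less often for the same output, which is the trade-off that matters when an error handler makes the final size hard to predict.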
Victor Stinner
74e8fac3c8
Issue #25301 : Fix compatibility with ISO C90
2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d
Issue #25301 : The UTF-8 decoder is now up to 15 times as fast for error
...
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
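A short Python sketch of what the three optimized handlers return for an invalid byte, shown only to make the behavior concrete. It says nothing about the speedup itself.

```python
# 0xFF can never appear in valid UTF-8, so each call goes through the error handler.
bad = b"a\xffb"

ignored = bad.decode("utf-8", "ignore")
replaced = bad.decode("utf-8", "replace")
escaped = bad.decode("utf-8", "surrogateescape")

print(ascii(ignored))   # 'ab'
print(ascii(replaced))  # 'a\ufffdb'
print(ascii(escaped))   # 'a\udcffb'
```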
Victor Stinner
eb36fdaad8
Fix _PyUnicodeWriter_PrepareKind()
...
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() correctly handles a read-only buffer by copying
the buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8
Issue #24848 : Fixed bugs in UTF-7 decoding of malformed data:
...
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner
3222da26fe
Make _PyUnicode_TranslateCharmap() symbol private
...
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b
Issue #25267 : The UTF-8 encoder is now up to 75 times as fast for error
...
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
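For illustration, the four handlers listed above behave as follows when the UTF-8 encoder meets a lone surrogate. This is a sketch of the visible behavior, not of the optimized C code.

```python
# U+DC80 is a lone low surrogate, which strict UTF-8 encoding rejects.
text = "a\udc80b"

ignored = text.encode("utf-8", "ignore")
replaced = text.encode("utf-8", "replace")
escaped = text.encode("utf-8", "surrogateescape")
passed = text.encode("utf-8", "surrogatepass")

print(ignored)   # b'ab'
print(replaced)  # b'a?b'
print(escaped)   # b'a\x80b'
print(passed)    # b'a\xed\xb2\x80b'
```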
Victor Stinner
c3713e9706
Optimize ascii/latin1+surrogateescape encoders
...
Issue #25227 : Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.
Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
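A minimal round-trip sketch of the ``surrogateescape`` handler with the ASCII and Latin1 codecs. The sample bytes are arbitrary.

```python
# Undecodable bytes are smuggled through str as lone surrogates U+DC80..U+DCFF.
raw = b"caf\xe9"

text = raw.decode("ascii", "surrogateescape")
back_ascii = text.encode("ascii", "surrogateescape")
back_latin1 = text.encode("latin-1", "surrogateescape")

print(ascii(text))   # 'caf\udce9'
print(back_ascii)    # b'caf\xe9'
print(back_latin1)   # b'caf\xe9'
```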
Victor Stinner
0030cd52da
Issue #25227 : Cleanup unicode_encode_ucs1() error handler
...
* Change limit type from unsigned int to Py_UCS4, to use the same type as the
"ch" variable (a Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d
Issue #24870 : revert unwanted change
...
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628
Issue #25207 , #14626 : Fix my commit.
...
It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea
_PyUnicodeWriter_PrepareInternal(): make the assertion more strict
2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01
Issue #24870 : Add _PyUnicodeWriter_PrepareKind() macro
...
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7
Issue #24870 : Reuse the new _Py_error_handler enum
...
Factorize code with the new get_error_handler() function.
Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05
Issue #24870 : Optimize the ASCII decoder for error handlers: surrogateescape,
...
ignore and replace. Initial patch written by Naoki Inada.
The decoder is now up to 60 times as fast for these error handlers.
Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
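The same three handlers, applied to the ASCII decoder, can be exercised from Python as in the sketch below. The dictionary name is made up for the example.

```python
# Any byte >= 0x80 is invalid for the ASCII codec.
bad = b"a\x80b"

# Map each handler name to the text it produces for the invalid input.
ascii_results = {
    handler: bad.decode("ascii", handler)
    for handler in ("ignore", "replace", "surrogateescape")
}

for handler, result in ascii_results.items():
    print(handler, ascii(result))
```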
Zachary Ware
070bd62cfa
Closes #21279 : Merge with 3.5
2015-08-06 00:05:13 -05:00
Zachary Ware
d987a81d29
Issue #21279 : Merge with 3.4
2015-08-06 00:04:23 -05:00
Zachary Ware
79b98df023
Issue #21279 : Flesh out str.translate docs
...
Initial patch by Kinga Farkas, Martin Panter, and John Posner.
2015-08-05 23:54:15 -05:00
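A brief sketch of the str.translate behavior that these docs describe: a table entry may map a character to a replacement string or to None to delete it. The sample table is arbitrary.

```python
# str.maketrans() turns single-character keys into the ordinal keys translate() expects.
table = str.maketrans({"a": "4", "e": None, "o": "00"})

translated = "one apple".translate(table)
print(translated)  # 00n 4ppl

# A plain dict keyed by ordinals works as well; unmapped characters are kept.
by_ordinal = "xyz".translate({ord("x"): "y"})
print(by_ordinal)  # yyz
```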
Raymond Hettinger
ac2ef65c32
Make the unicode equality test an external function rather than in-lining it.
...
The real benefit of the unicode specialized function comes from
bypassing the overhead of PyObject_RichCompareBool() and not
from being in-lined (especially since there was almost no shared
data between the caller and callee). Also, the in-lining was
having a negative effect on code generation for the callee.
2015-07-04 16:04:44 -07:00
Serhiy Storchaka
d4ea03c785
Issue #24284 : The startswith and endswith methods of the str class no longer
...
return True when finding the empty string and the indexes are completely out
of range.
2015-05-31 09:15:51 +03:00
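The changed edge case can be sketched as follows, assuming an interpreter that includes this fix. A start index equal to the string length is still in range, while a larger one is not.

```python
s = "abc"

at_end = s.startswith("", 3)         # start == len(s): still in range
past_end = s.startswith("", 4)       # start > len(s): completely out of range
past_end_suffix = s.endswith("", 4)  # same rule for endswith

print(at_end, past_end, past_end_suffix)  # True False False
```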
Antoine Pitrou
873e0df946
Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).
2015-05-19 21:06:04 +02:00
Antoine Pitrou
f6d1f1fa8a
Fix some compilation warnings when using gcc (-Wmaybe-uninitialized).
2015-05-19 21:04:33 +02:00
Serhiy Storchaka
0d4df752ac
Issue #15027 : The UTF-32 encoder is now 3x to 7x faster.
2015-05-12 23:12:45 +03:00
Serhiy Storchaka
7e9d1d1a1b
Issue #23908 : os functions now reject paths with embedded null character
...
on Windows instead of silently truncating them.
Removed no longer used _PyUnicode_HasNULChars().
2015-04-20 10:12:28 +03:00
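A small sketch of the resulting behavior, using os.stat as a representative os function. The helper name is hypothetical, and the example assumes that the rejection surfaces as ValueError, as it does in current Python 3 releases.

```python
import os


def rejects_embedded_null(path):
    """Return True if os.stat() refuses the path because of an embedded null."""
    try:
        os.stat(path)
    except ValueError:
        # Raised before any system call is made.
        return True
    except OSError:
        # The path reached the operating system, so it was not rejected up front.
        return False
    return False


null_rejected = rejects_embedded_null("spam\0eggs")
print(null_rejected)
```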
Serhiy Storchaka
1009bf18b3
Issue #23501 : Argument Clinic now generates code into separate files by default.
2015-04-03 23:53:51 +03:00
Victor Stinner
1912b39def
_PyUnicodeWriter_WriteStr() now checks that the input string is consistent
...
in debug mode to detect bugs earlier.
_PyUnicodeWriter_Finish() doesn't check if the read only string is consistent,
whereas it does check consistency for strings built by itself.
2015-03-26 09:37:23 +01:00
Serhiy Storchaka
d9d769fcdd
Issue #23573 : Increased performance of string search operations (str.find,
...
str.index, str.count, the in operator, str.split, str.partition) with
arguments of different kinds (UCS1, UCS2, UCS4).
2015-03-24 21:55:47 +02:00
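"Arguments of different kinds" means that the haystack and the needle use different internal character widths. A sketch of such mixed-kind searches from Python, with arbitrary sample strings:

```python
# UCS2 haystack (contains U+20AC), UCS1 needle.
found_at = "price: 10\u20ac".find("10")

# UCS1 haystack, UCS2 needle: can never match.
euro_in_ascii = "\u20ac" in "ascii only"

# UCS4 haystack and needle (non-BMP character).
emoji_count = "a\U0001f600b\U0001f600".count("\U0001f600")

# UCS2 haystack, UCS1 separator.
parts = "k=v\u20ac".partition("=")

print(found_at, euro_in_ascii, emoji_count, parts)
```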
Victor Stinner
f50e187724
Fix compiler warnings: comparison between signed and unsigned numbers
2015-03-20 11:32:24 +01:00
Victor Stinner
0c39b1b970
Initialize variables to prevent GCC warnings
2015-03-18 15:02:06 +01:00
Benjamin Peterson
e5a853c390
use PyMem_NEW to detect overflow ( closes #23362 )
2015-03-02 13:23:25 -05:00
Steve Dower
3e96f324dc
Issue #23451 : Update pyconfig.h for Windows to require Vista headers and remove unnecessary version checks.
2015-03-02 08:01:10 -08:00
Serhiy Storchaka
78a8249127
Issue #23490 : Fixed possible crashes related to interoperability between
...
old-style and new API for strings with 2**30-1 characters.
2015-02-20 21:34:39 +02:00
Serhiy Storchaka
e55181f517
Issue #23490 : Fixed possible crashes related to interoperability between
...
old-style and new API for strings with 2**30-1 characters.
2015-02-20 21:34:06 +02:00
Serhiy Storchaka
4d0d982985
Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer
...
overflows. Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:33:32 +02:00
Serhiy Storchaka
1a1ff29659
Issue #23446 : Use PyMem_New instead of PyMem_Malloc to avoid possible integer
...
overflows. Added a few missing PyErr_NoMemory() calls.
2015-02-16 13:28:22 +02:00
Serhiy Storchaka
4dbc305002
Issue #23055 : Fixed a buffer overflow in PyUnicode_FromFormatV. Analysis
...
and fix by Guido Vranken.
2015-01-27 22:18:46 +02:00
Victor Stinner
29dacf2e97
Issue #15859 : PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and
...
PyUnicode_EncodeCodePage() now raise an exception if the object is not a
Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on
platforms other than Windows. Patch written by Campbell Barton.
2015-01-26 16:41:32 +01:00
Serhiy Storchaka
bbd3aa8ece
Issue #23321 : Fixed a crash in str.decode() when error handler returned
...
replacement string longer than the malformed input data.
2015-01-26 01:24:31 +02:00
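The situation this fix covers can be sketched with a custom error handler whose replacement is longer than the bytes it replaces. The handler name and replacement text are made up for the example, which shows Python 3 bytes decoding and not the original crash.

```python
import codecs


def sketch_long_replacement(exc):
    """Replace the undecodable bytes with a longer string and resume after them."""
    return ("[undecodable]", exc.end)


# Hypothetical handler name, registered only for this example.
codecs.register_error("sketch_long_replacement", sketch_long_replacement)

decoded = b"a\xffb".decode("utf-8", "sketch_long_replacement")
print(decoded)  # a[undecodable]b
```

One invalid byte becomes thirteen characters, so the decoder cannot size its output from the input length alone.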
Serhiy Storchaka
7e4b9057b3
Issue #23321 : Fixed a crash in str.decode() when error handler returned
...
replacement string longer than the malformed input data.
2015-01-26 01:22:54 +02:00
Ethan Furman
b95b56150f
Issue #20284 : Implement PEP 461
2015-01-23 20:05:18 -08:00
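PEP 461 adds %-formatting to bytes and bytearray. A minimal sketch, assuming Python 3.5 or later, with arbitrary sample values:

```python
# %s / %b insert bytes-like objects; numeric codes work as they do for str.
pair = b"%s=%d" % (b"x", 5)
hexed = b"%x" % 255
padded = b"%05.1f" % 3.14159

# %a inserts the ascii() representation of any object.
quoted = b"%a" % "\xe9"

# bytearray supports the same operator and returns a bytearray.
mutable = bytearray(b"<%b>") % (b"z",)

print(pair, hexed, padded, quoted, mutable)
```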
Serhiy Storchaka
82e07b92b3
Issue #23181 : More "codepoint" -> "code point".
2015-01-18 11:33:31 +02:00
Serhiy Storchaka
d3faf43f9b
Issue #23181 : More "codepoint" -> "code point".
2015-01-18 11:28:37 +02:00
Serhiy Storchaka
b757c83ec6
Issue #22581 : Use more "bytes-like object" throughout the docs and comments.
2014-12-05 22:25:22 +02:00