Commit graph

1465 commits

Author SHA1 Message Date
Victor Stinner
337986740f Issue #26464: Fix unicode_fast_translate() again
Initialize i variable if the string is non-ASCII.
2016-03-01 21:59:58 +01:00
Victor Stinner
3d9d77a3dc Merge 3.5 2016-03-01 21:30:50 +01:00
Victor Stinner
6c9aa8f2bf Fix str.translate()
Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
2016-03-01 21:30:30 +01:00
Victor Stinner
5b96f17b1c Merge 3.5 2016-01-27 17:01:13 +01:00
Victor Stinner
5bc03a6d4d Fix resize_compact()
Issue #26217: resize_compact() must set wstr_length to 0 after freeing the wstr
string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().
2016-01-27 16:56:53 +01:00
Serhiy Storchaka
726fc139a5 Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:44:33 +02:00
Serhiy Storchaka
191321d11b Issue #20440: More use of Py_SETREF.
This patch is manually crafted and contains changes that couldn't be handled
automatically.
2015-12-27 15:41:34 +02:00
Serhiy Storchaka
ef1585eb9a Issue #25923: Added more const qualifiers to signatures of static and private functions. 2015-12-25 20:01:53 +02:00
Serhiy Storchaka
2d06e84455 Issue #25923: Added the const qualifier to static constant arrays. 2015-12-25 19:53:18 +02:00
Serhiy Storchaka
f006940351 Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:39:57 +02:00
Serhiy Storchaka
5a57ade58e Issue #20440: Massive replacing unsafe attribute setting code with special
macro Py_SETREF.
2015-12-24 10:35:59 +02:00
Serhiy Storchaka
9b3a2eec1c Issues #25890, #25891, #25892: Removed unused variables in Windows code.
Reported by Alexander Riccio.
2015-12-18 10:03:13 +02:00
Serhiy Storchaka
7c088a9b5c Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:05:52 +02:00
Serhiy Storchaka
6648bf5661 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:04:37 +02:00
Serhiy Storchaka
31b9410654 Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Serhiy Storchaka
7aa690860e Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache. 2015-12-03 01:02:03 +02:00
Benjamin Peterson
d798dc1034 merge 3.5 (#25630) 2015-11-15 21:57:50 -08:00
Benjamin Peterson
a4d33b3428 make the PyUnicode_FSConverter cleanup set the decrefed argument to NULL (closes #25630) 2015-11-15 21:57:39 -08:00
Serhiy Storchaka
413fdcea21 Issue #24821: Refactor STRINGLIB(fastsearch_memchr_1char) and split it on
STRINGLIB(find_char) and STRINGLIB(rfind_char) that can be used independedly
without special preconditions.
2015-11-14 15:42:17 +02:00
Serhiy Storchaka
4a7c03aab4 Issue #25523: Merge a-to-an corrections from 3.5. 2015-11-02 14:44:29 +02:00
Serhiy Storchaka
a84f6c3dd3 Issue #25523: Merge a-to-an corrections from 3.4. 2015-11-02 14:39:05 +02:00
Serhiy Storchaka
d65c9496da Issue #25523: Further a-to-an corrections. 2015-11-02 14:10:23 +02:00
Victor Stinner
358af13526 Issue #25353: Optimize unicode escape and raw unicode escape encoders to use
the new _PyBytesWriter API.
2015-10-12 22:36:57 +02:00
Victor Stinner
6c2cdae9e6 Writer APIs: use empty string singletons
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
2015-10-12 13:29:43 +02:00
Victor Stinner
6bd525b656 Optimize error handlers of ASCII and Latin1 encoders when the replacement
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.

Cleanup unicode_encode_ucs1():

* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
2015-10-09 13:10:05 +02:00
Victor Stinner
ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner
0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Victor Stinner
74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner
eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Victor Stinner
3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner
c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner
0030cd52da Issue #25227: Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type than the
  "ch" variable (an Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d Issue #24870: revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628 Issue #25207, #14626: Fix my commit.
It doesn't work to use #define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea _PyUnicodeWriter_PrepareInternal(): make the assertion more strict 2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01 Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7 Issue #24870: Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.

Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Zachary Ware
070bd62cfa Closes #21279: Merge with 3.5 2015-08-06 00:05:13 -05:00
Zachary Ware
d987a81d29 Issue #21279: Merge with 3.4 2015-08-06 00:04:23 -05:00