Commit graph

5516 commits

Author SHA1 Message Date
Victor Stinner
ce179bf6ba Add _PyBytesWriter_WriteBytes() to factorize the code 2015-10-09 12:57:22 +02:00
Victor Stinner
ad7715891e _PyBytesWriter: simplify code to avoid "prealloc" parameters
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
2015-10-09 12:38:53 +02:00
Victor Stinner
53926a1ce2 _PyBytesWriter: rename size attribute to min_size 2015-10-09 12:37:03 +02:00
Victor Stinner
fa7762ec06 Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API
* Thanks to the _PyBytesWriter API, output smaller than 512 bytes are allocated
  on the stack and so avoid calling _PyBytes_Resize(). Because of that, change
  the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on _PyBytesWriter algorithm to overallocate the buffer instead of using
  a custom code. For example, _PyBytesWriter uses a different overallocation
  factor (25% or 50%) depending on the platform to get best performances.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Add also many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
  point number (when no custom formatting option is used)
* Fix also reference leaks on error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
2015-10-09 11:48:06 +02:00
Victor Stinner
b3653a3458 Issue #25318: cleanup code _PyBytesWriter
Rename "stack buffer" to "small buffer".

Add also an assertion in _PyBytesWriter_GetPos().
2015-10-09 03:38:24 +02:00
Victor Stinner
3fa36ff5e4 Issue #25318: Fix backslashreplace()
Fix code to estimate the needed space.
2015-10-09 03:37:11 +02:00
Victor Stinner
797485e101 Issue #25318: Avoid sprintf() in backslashreplace()
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().

Add also unit tests for non-BMP characters.
2015-10-09 03:17:30 +02:00
Victor Stinner
b13b97d3b8 Issue #25318: Fix compilation error
Replace "#if Py_DEBUG" with "#ifdef Py_DEBUG".
2015-10-09 02:52:16 +02:00
Victor Stinner
0016507c16 Issue #25318: Move _PyBytesWriter to bytesobject.c
Declare also the private API in bytesobject.h.
2015-10-09 01:53:21 +02:00
Victor Stinner
e7bf86cd7d Optimize backslashreplace error handler
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.

Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
2015-10-09 01:39:28 +02:00
Victor Stinner
fdfbf78114 Issue #25318: Add _PyBytesWriter API
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.

Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.

unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
2015-10-09 00:33:49 +02:00
Martin Panter
585a6acfef Merge typo fixes from 3.5 2015-10-07 11:13:55 +00:00
Martin Panter
ec1aa5c2a1 More typos in 3.5 documentation and comments 2015-10-07 11:03:53 +00:00
Martin Panter
3f930dcd87 Merge typo fixes from 3.4 into 3.5 2015-10-07 11:01:47 +00:00
Martin Panter
9955a373a8 Various minor typos in documentation and comments 2015-10-07 10:26:23 +00:00
Benjamin Peterson
cdae2cb88a merge 3.5 (closes #24806) 2015-10-06 19:42:46 -07:00
Benjamin Peterson
59dc696821 merge 3.4 (#24806) 2015-10-06 19:42:02 -07:00
Benjamin Peterson
bd6c41a185 prevent unacceptable bases from becoming bases through multiple inheritance (#24806) 2015-10-06 19:36:54 -07:00
Victor Stinner
74e8fac3c8 Issue #25301: Fix compatibility with ISO C90 2015-10-05 13:49:26 +02:00
Victor Stinner
1d65d9192d Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
2015-10-05 13:43:50 +02:00
Victor Stinner
eb36fdaad8 Fix _PyUnicodeWriter_PrepareKind()
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() handles correctly read-only buffer: copy the
buffer.
2015-10-03 01:55:51 +02:00
Serhiy Storchaka
29e68edbf4 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:14:03 +03:00
Serhiy Storchaka
58c8f2bb6d Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
2015-10-02 13:13:14 +03:00
Serhiy Storchaka
28b21e50c8 Issue #24848: Fixed bugs in UTF-7 decoding of misformed data:
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
2015-10-02 13:07:28 +03:00
Serhiy Storchaka
5dbe245ef2 Issue #24483: C implementation of functools.lru_cache() now calculates key's
hash only once.
2015-10-02 12:47:59 +03:00
Serhiy Storchaka
b9d98d532c Issue #24483: C implementation of functools.lru_cache() now calculates key's
hash only once.
2015-10-02 12:47:11 +03:00
Victor Stinner
3222da26fe Make _PyUnicode_TranslateCharmap() symbol private
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
2015-10-01 22:07:32 +02:00
Victor Stinner
01ada3996b Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
2015-10-01 21:54:51 +02:00
Victor Stinner
d69dd8bd5e (Merge 3.5) Issue #25182: Fix compilation on Windows 2015-09-30 15:03:50 +02:00
Victor Stinner
ae86da9b20 (Merge 3.4) Issue #25182: Fix compilation on Windows 2015-09-30 15:03:31 +02:00
Victor Stinner
89719e1daf Issue #25182: Fix compilation on Windows
Restore also errno value before calling PyErr_SetFromErrno().
2015-09-30 15:01:34 +02:00
Serhiy Storchaka
85c386dee4 Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:51:01 +03:00
Serhiy Storchaka
008fc77e1e Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:50:32 +03:00
Serhiy Storchaka
a59018c7ab Issue #25182: The stdprinter (used as sys.stderr before the io module is
imported at startup) now uses the backslashreplace error handler.
2015-09-30 15:46:53 +03:00
Victor Stinner
c3713e9706 Optimize ascii/latin1+surrogateescape encoders
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.

Initial patch written by Serhiy Storchaka.
2015-09-29 12:32:13 +02:00
Victor Stinner
0030cd52da Issue #25227: Cleanup unicode_encode_ucs1() error handler
* Change limit type from unsigned int to Py_UCS4, to use the same type than the
  "ch" variable (an Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
2015-09-24 14:45:00 +02:00
Victor Stinner
54385b206d Issue #24870: revert unwanted change
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
2015-09-22 10:46:52 +02:00
Victor Stinner
5ebae87628 Issue #25207, #14626: Fix my commit.
It doesn't work to use #define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
2015-09-22 01:29:33 +02:00
Victor Stinner
6174474bea _PyUnicodeWriter_PrepareInternal(): make the assertion more strict 2015-09-22 01:01:17 +02:00
Victor Stinner
ca9381ea01 Issue #24870: Add _PyUnicodeWriter_PrepareKind() macro
Add a macro which ensures that the writer has at least the requested kind.
2015-09-22 00:58:32 +02:00
Victor Stinner
5014920cb7 Issue #24870: Reuse the new _Py_error_handler enum
Factorize code with the new get_error_handler() function.

Add some empty lines for readability.
2015-09-22 00:26:54 +02:00
Victor Stinner
f96418de05 Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape,
ignore and replace. Initial patch written by Naoki Inada.

The decoder is now up to 60 times as fast for these error handlers.

Add also unit tests for the ASCII decoder.
2015-09-21 23:06:27 +02:00
Victor Stinner
026977717e Merge 3.5 2015-09-19 13:39:16 +02:00
Victor Stinner
5783fd2c58 Issue #24999: In longobject.c, use two shifts instead of ">> 2*PyLong_SHIFT" to
avoid undefined behaviour when LONG_MAX type is smaller than 60 bits.

This change should fix a warning with the ICC compiler.
2015-09-19 13:39:03 +02:00
Victor Stinner
058258652a Merge 3.5 (pytime, odict) 2015-09-18 13:55:15 +02:00
Victor Stinner
4a0d1e7c36 odictobject.c: fix compiler warning
PyObject_Length() returns a P_ssize_t, not an int. Use a Py_ssize_t to avoid
overflow.
2015-09-18 13:44:11 +02:00
Serhiy Storchaka
56f6e76c68 Issue #15989: Fixed some scarcely probable integer overflows.
It is very unlikely that they can occur in real code for now.
2015-09-06 21:25:30 +03:00
Guido van Rossum
ba5f59089a Issue #24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5 -> 3.6) 2015-09-05 15:20:57 -07:00
Guido van Rossum
37fdcbc4c3 Issue #24912: Prevent __class__ assignment to immutable built-in objects. (Merge 3.5.0 -> 3.5) 2015-09-05 15:20:08 -07:00
Guido van Rossum
7d293ee97d Issue #24912: Prevent __class__ assignment to immutable built-in objects. 2015-09-04 20:54:07 -07:00