1397 commits

Author SHA1 Message Date
Gregory P. Smith
c2176e46d7 Fix the internals of our hash functions to use unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done.  I added the comment that my change in 3.2 added so that the
code would match up.  Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as two's complement but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:32:53 -08:00
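
The concern in this commit is specific to C, where signed overflow is undefined and unsigned arithmetic wraps modulo a power of two. Python integers never overflow, so the following is only a rough sketch that emulates unsigned wraparound with a mask. The function names, the 64-bit width, and the constants are assumptions chosen for illustration; the sketch is not claimed to reproduce CPython's hash values.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit unsigned C integer (assumed width)


def toy_hash_unsigned(data: bytes) -> int:
    """Multiply-and-xor hash with every step reduced modulo 2**64.

    Hypothetical helper: masking after each operation mirrors what
    unsigned arithmetic does implicitly in C.
    """
    if not data:
        return 0
    x = (data[0] << 7) & MASK64
    for byte in data:
        # The product may exceed 64 bits; the mask wraps it, which is
        # well defined for unsigned values.
        x = ((1000003 * x) & MASK64) ^ byte
    x ^= len(data)
    return x


def as_signed64(x: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return x - (1 << 64) if x >= (1 << 63) else x
```

Doing the whole computation in the unsigned domain and converting to a signed value only at the end keeps every intermediate step well defined, which is the property the commit message describes.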
Gregory P. Smith
27cbcd6241 Fix the internals of our hash functions to use unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as two's complement but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:15:46 -08:00
Victor Stinner
8dbd421b4d Cleanup unicodeobject.c
 * Remove micro-optimization:
   (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0).
   Only use strcmp()
 * Initialize 'arg' members in unicode_format_arg() to help the compiler
   diagnose real bugs and also make the code simpler to read
2012-12-04 09:30:24 +01:00
Victor Stinner
d45c7f8d74 Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announce an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2012-12-04 01:34:47 +01:00
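
As a small illustration of what the ASCII/surrogateescape combination does, the sketch below decodes bytes that are not valid ASCII and then recovers them unchanged. The sample bytes are an assumption chosen for the example; they are not taken from the issue.

```python
raw = b"caf\xe9"  # ISO-8859-1 bytes for "café"; 0xE9 is not valid ASCII

# With the surrogateescape handler, the undecodable byte 0xE9 is mapped
# to the lone surrogate U+DCE9 instead of raising an error.
decoded = raw.decode("ascii", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = decoded.encode("ascii", "surrogateescape")
```

Because the mapping is reversible, arguments that contain bytes outside ASCII survive a decode/encode round trip.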
Victor Stinner
2660e427d1 (Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:48:53 +01:00
Victor Stinner
27b1ca29cc Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:47:59 +01:00
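
The consistency with os.fsencode() and os.fsdecode() mentioned above can be checked with a round trip. This is a sketch that assumes a POSIX platform where the filesystem error handler is surrogateescape; `roundtrip` is a hypothetical helper name.

```python
import os
import sys


def roundtrip(raw: bytes) -> bytes:
    """Decode a byte filename to str and encode it back (hypothetical helper)."""
    return os.fsencode(os.fsdecode(raw))


# Which encoding and error handler the interpreter uses for OS data.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()
```

When `fs_errors` is "surrogateescape", `roundtrip()` is expected to return arbitrary input bytes unchanged, including bytes that are invalid in `fs_encoding`.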
Antoine Pitrou
5439458a2a Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:29:28 +01:00
Antoine Pitrou
6d5ad227a5 Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:28:17 +01:00
Victor Stinner
0d92c4f667 Issue #16416: Fix error handling in _Py_wchar2char() and _Py_char2wchar() functions 2012-11-12 23:32:21 +01:00
Victor Stinner
fc009eff9e Close #16311: Use the _PyUnicodeWriter API in text decoders
 * Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
 * Remove unicode_putchar(): replaced with
   _PyUnicodeWriter_Prepare() + PyUnicode_WRITE()
 * When handling a decoding error, only overallocate the buffer by +25%
   instead of +100%
2012-11-07 00:36:38 +01:00
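
The trade-off behind the overallocation percentage can be sketched with a toy growable buffer. `ToyWriter` is a hypothetical class written for this illustration; it is not the real _PyUnicodeWriter, and its growth rule is an assumption.

```python
class ToyWriter:
    """Hypothetical growable buffer with a configurable overallocation."""

    def __init__(self, overallocate_percent: int) -> None:
        self.percent = overallocate_percent
        self.buffer = bytearray()
        self.size = 0       # number of bytes actually written
        self.reallocs = 0   # how many times the buffer had to grow

    def prepare(self, extra: int) -> None:
        """Make sure `extra` more bytes fit, growing with overallocation."""
        needed = self.size + extra
        if needed <= len(self.buffer):
            return
        capacity = needed + needed * self.percent // 100
        self.buffer.extend(bytes(capacity - len(self.buffer)))
        self.reallocs += 1

    def write(self, data: bytes) -> None:
        self.prepare(len(data))
        self.buffer[self.size:self.size + len(data)] = data
        self.size += len(data)

    def finish(self) -> bytes:
        """Return only the bytes that were written, dropping the slack."""
        return bytes(self.buffer[:self.size])


def fill(writer: ToyWriter, count: int) -> ToyWriter:
    for _ in range(count):
        writer.write(b"x")
    return writer
```

In this toy model a smaller percentage leaves less unused slack at the end but grows the buffer more often, while a larger percentage does the opposite.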
Ezio Melotti
cfa9636404 #8271: merge with 3.3. 2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
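
A short sketch of the behaviour described above, as observable from Python 3.3 onwards. The byte sequences are examples chosen for this illustration, not the test cases from the patch.

```python
# Each key is an invalid UTF-8 byte sequence; each value is the text the
# "replace" error handler is expected to produce for it.
samples = {
    b"a\xffb": "a\ufffdb",         # one invalid byte, one U+FFFD
    b"\xff\xfe": "\ufffd\ufffd",   # two invalid bytes, two U+FFFD
    b"\xe2\x82A": "\ufffdA",       # truncated 3-byte sequence, one U+FFFD
}

decoded = {raw: raw.decode("utf-8", "replace") for raw in samples}
```

In the last sample the two bytes that start a valid but incomplete sequence are replaced together by a single U+FFFD, and the following "A" is decoded normally.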
Benjamin Peterson
7ff2094bc7 merge 3.3 (#16369) 2012-10-30 23:31:12 -04:00
Benjamin Peterson
e8ea97fffb merge 3.2 (#16369) 2012-10-30 23:27:52 -04:00
Benjamin Peterson
c43112823b initialize more global type objects (closes #16369) 2012-10-30 23:21:10 -04:00
Victor Stinner
e64322e034 Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner
76df43de30 Issue #16330: Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson
fb90c0934c Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. 2012-10-28 10:18:03 +00:00
Victor Stinner
c6cf1ba29e Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy() 2012-10-23 02:54:47 +02:00
Victor Stinner
fe75fb4b3e Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains() 2012-10-23 02:52:18 +02:00
Victor Stinner
6fa627578a Inline raise_translate_exception(): it is only used once 2012-10-23 02:51:50 +02:00
Victor Stinner
e5567ad236 Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp() 2012-10-23 02:48:49 +02:00
Christian Heimes
743e0cd6b5 Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianness detection and handling.
2012-10-17 23:52:17 +02:00
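
The macros themselves are C preprocessor definitions. From Python, the same property can be observed at run time; the sketch below is one way to do it, and `native_is_little_endian` is a hypothetical helper name.

```python
import struct
import sys


def native_is_little_endian() -> bool:
    """Detect the native byte order by packing a known value."""
    # "=H" packs a 16-bit unsigned integer in native byte order.
    # On a little-endian machine the low byte comes first.
    return struct.pack("=H", 1) == b"\x01\x00"


# Explicit byte orders for comparison, independent of the host.
little = struct.pack("<H", 1)
big = struct.pack(">H", 1)
```

`sys.byteorder` reports the same information as a string, so the two checks should always agree.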
Chris Jerdonek
4a7df9aba9 Issue #14783: Merge changes from 3.3. 2012-10-07 15:02:16 -07:00
Chris Jerdonek
042fa653ab Issue #14783: Merge changes from 3.2. 2012-10-07 14:56:27 -07:00
Chris Jerdonek
83fe2e1c22 Issue #14783: Improve int() docstring and also str(), range(), and slice().
This commit rewrites the docstring for int() to incorporate the documentation
changes made in issue #16036.  It also switches the docstrings for int(),
str(), range(), and slice() to use multi-line signatures.
2012-10-07 14:48:36 -07:00
Victor Stinner
4c63a972d1 Cleanup PyUnicode_FromFormatV() for zero padding
Skip the "0" instead of parsing it twice (once to detect zero padding and again
as a digit of the width).
2012-10-06 23:55:33 +02:00
Victor Stinner
15a1136547 Issue #16147: PyUnicode_FromFormatV() no longer needs to allocate a buffer
on the heap to format numbers.
2012-10-06 23:48:20 +02:00
Victor Stinner
ff5a848db5 Issue #16147: PyUnicode_FromFormatV() now raises an error if the argument of
'%c' is not in the range(0x110000).
2012-10-06 23:05:45 +02:00
Victor Stinner
3921e90c5a Issue #16147: PyUnicode_FromFormatV() now detects integer overflow when parsing
width and precision
2012-10-06 23:05:00 +02:00
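
One common way to detect overflow while parsing a decimal width is to test the bound before multiplying, since in C the check cannot be done after the value has already overflowed. The sketch below shows that pattern in Python; `parse_width` and `SSIZE_MAX` are hypothetical names and the limit is an assumed value, not taken from the patch.

```python
SSIZE_MAX = 2**63 - 1  # assumed limit, standing in for a C signed maximum


def parse_width(text: str, limit: int = SSIZE_MAX) -> int:
    """Parse leading decimal digits, refusing values above `limit`."""
    value = 0
    for ch in text:
        if not ("0" <= ch <= "9"):
            break
        digit = ord(ch) - ord("0")
        # Equivalent to "value * 10 + digit > limit", but written so that
        # no intermediate result can exceed the limit.
        if value > (limit - digit) // 10:
            raise OverflowError("width too big")
        value = value * 10 + digit
    return value
```

For example, `parse_width("42d")` stops at the first non-digit and returns 42, while a digit string one larger than the limit raises OverflowError.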
Victor Stinner
e215d960be Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API
 * Simplify the code: replace 4 steps with a single step using the
   _PyUnicodeWriter API. PyUnicode_Format() has the same design. This avoids
   storing intermediate results, which required allocating an array of
   pointers on the heap.
 * Use the _PyUnicodeWriter API for speed (and its convenient API):
   overallocate the buffer to reduce the number of "realloc()" calls
 * Implement "width" and "precision" in Python, don't rely on sprintf(). This
   avoids the need for a temporary buffer allocated on the heap: only a small
   buffer allocated on the stack is used.
 * Add _PyUnicodeWriter_WriteCstr() function
 * Split PyUnicode_FromFormatV() into two functions: add
   unicode_fromformat_arg().
 * Inline parse_format_flags(): the format of an argument is now only parsed
   once, so a subfunction is no longer needed.
 * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
   search for the next "%" and copy the substring in one chunk, instead of
   copying character by character.
2012-10-06 23:03:36 +02:00
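
The last optimization in the list above (copying the literal text between two "%" arguments in one chunk) can be sketched as a tiny single-pass formatter. `toy_format` is a hypothetical helper that supports only %s, %d and %%; it illustrates the idea and is not the C implementation.

```python
def toy_format(fmt: str, *args) -> str:
    """Single-pass formatter supporting only %s, %d and %% (hypothetical)."""
    parts = []
    values = iter(args)
    pos = 0
    while pos < len(fmt):
        nxt = fmt.find("%", pos)      # search for the next "%"
        if nxt == -1:
            parts.append(fmt[pos:])   # the rest is literal text
            break
        if nxt > pos:
            parts.append(fmt[pos:nxt])  # copy the literal run in one chunk
        if nxt + 1 >= len(fmt):
            raise ValueError("incomplete format")
        spec = fmt[nxt + 1]
        if spec == "%":
            parts.append("%")
        elif spec == "s":
            parts.append(str(next(values)))
        elif spec == "d":
            parts.append(str(int(next(values))))
        else:
            raise ValueError("unsupported format character: " + spec)
        pos = nxt + 2
    return "".join(parts)
```

Each argument is parsed and written in the same step, so no intermediate list of parsed specifications is needed.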
Mark Dickinson
ff9c54aca2 Issue #16096: Merge fixes from 3.3. 2012-10-06 18:05:14 +01:00
Mark Dickinson
c04ddff290 Issue #16096: Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka. 2012-10-06 18:04:49 +01:00
Victor Stinner
8c6db45d3e In debug mode, unicode_write_cstr() now checks that non-ASCII characters are
not written into an ASCII string
2012-10-06 00:40:45 +02:00
Ezio Melotti
080a2c087e #16127: merge with 3.3. 2012-10-05 03:34:02 +03:00
Ezio Melotti
e7f90375b1 #16127: remove outdated references to narrow builds. Patch by Serhiy Storchaka. 2012-10-05 03:33:31 +03:00
Victor Stinner
1929407406 Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed
This error cannot occur in practice: PyUnicode_FromObject() always returns
a "ready" string.
2012-10-05 00:09:33 +02:00
Victor Stinner
770e19e0cc Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings 2012-10-04 22:59:45 +02:00
Victor Stinner
90db9c47dc Also enable the ptr==ptr optimization in PyUnicode_Compare()
It was already implemented in PyUnicode_RichCompare()
2012-10-04 21:53:50 +02:00
Victor Stinner
aa7712711d unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block 2012-10-04 02:32:58 +02:00
Victor Stinner
a4708231e6 Split the huge PyUnicode_Format() function (+540 lines) into subfunctions 2012-10-04 02:19:54 +02:00
Victor Stinner
a049443fab PyUnicode_Format(): disable overallocation when we are writing the last part
of the output string
2012-10-03 23:03:46 +02:00
Victor Stinner
afffce489b Unicode: resize_compact() and resize_inplace() now also fill the Unicode strings
with invalid bytes in debug mode, as done by PyUnicode_New()
2012-10-03 23:03:17 +02:00
Victor Stinner
c89d28fdfc Issue #15609: Fix refleak introduced by my last optimization 2012-10-02 12:54:07 +02:00
Victor Stinner
621ef3d84f Issue #15609: Optimize str%args for integer argument
 - Use _PyLong_FormatWriter() instead of formatlong() when possible, to avoid
   a temporary buffer
 - Enable the fast path when the width is smaller than or equal to the length,
   and when the precision is bigger than or equal to the length
 - Add unit tests!
 - formatlong() uses PyUnicode_Resize() instead of _PyUnicode_FromASCII()
   to resize the output string
2012-10-02 00:33:47 +02:00
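
For reference, the Python-level behaviour of str % args with integer arguments looks like this. The pairs below are examples written for this illustration, not the unit tests added by the commit.

```python
# (actual result, expected text) pairs for integer formatting with "%".
examples = [
    ("%d" % 42, "42"),         # no width or precision
    ("%5d" % 42, "   42"),     # width pads on the left with spaces
    ("%-5d|" % 42, "42   |"),  # "-" flag pads on the right
    ("%.5d" % 42, "00042"),    # precision pads with leading zeros
    ("%05d" % -42, "-0042"),   # "0" flag pads after the sign
    ("%x" % 255, "ff"),        # hexadecimal conversion
]

all_match = all(actual == expected for actual, expected in examples)
```

The optimization changes how the result is produced internally; the formatted output shown here is expected to stay the same.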
Antoine Pitrou
a1f7655fa7 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou
6f80f5d444 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
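
A minimal sketch of the case this fix covers: a charmap decoding table whose values are given either as integers or as strings, including code points outside the BMP. The mapping is an assumption made up for the example.

```python
import codecs

# Map byte 0x41 to a non-BMP code point given as an integer, and byte
# 0x42 to a non-BMP character given as a string.
mapping = {0x41: 0x1F600, 0x42: "\U0001F601"}

text, consumed = codecs.charmap_decode(b"AB", "strict", mapping)
```

Both forms are expected to decode to the same kind of result: one character per input byte, with `consumed` reporting the number of bytes read.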
Antoine Pitrou
ca8aa4acf6 Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
2012-09-20 20:56:47 +02:00
Christian Heimes
5f520f4fed Issue #15900: Fixed reference leak in PyUnicode_TranslateCharmap() 2012-09-11 14:03:25 +02:00
Christian Heimes
f4f9939a96 Fixed memory leak in error branch of formatfloat(). CID 719687 2012-09-10 11:48:41 +02:00