Commit graph

1722 commits

Author SHA1 Message Date
Serhiy Storchaka
059972535f Issue #10156: In the interpreter's initialization phase, unicode globals
are now initialized dynamically as needed.
2013-01-26 12:14:02 +02:00
Serhiy Storchaka
570c5b2354 Issue #16980: Fix processing of escaped non-ascii bytes in the
unicode-escape-decode decoder.
2013-01-25 23:53:29 +02:00
Serhiy Storchaka
73e38809e0 Issue #16980: Fix processing of escaped non-ascii bytes in the
unicode-escape-decode decoder.
2013-01-25 23:52:21 +02:00
Serhiy Storchaka
6481bfb2b5 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:44:40 +02:00
Serhiy Storchaka
c35f3a9f61 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:42:57 +02:00
Serhiy Storchaka
4f5f0e54e0 Issue #16335: Fix integer overflow in unicode-escape decoder. 2013-01-21 11:38:00 +02:00
Serhiy Storchaka
441d30fac7 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.

This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:26:26 +02:00
Serhiy Storchaka
9101e23ff6 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.

This is a backport of changesets 13e2e44db99d and 525407d89277.
2013-01-19 12:41:45 +02:00
Serhiy Storchaka
55e2cb497b Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 15:30:04 +02:00
Serhiy Storchaka
45d16d9924 Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 15:01:20 +02:00
Serhiy Storchaka
4fb8caee87 Issue #14850: Now a chamap decoder treates U+FFFE as "undefined mapping"
in any mapping, not only in an unicode string.
2013-01-15 14:43:21 +02:00
Serhiy Storchaka
7898043868 Issue #15989: Fix several occurrences of integer overflow
when result of PyLong_AsLong() narrowed to int without checks.
2013-01-15 01:12:17 +02:00
Benjamin Peterson
0b32a480bd merge 3.3 (#16906) 2013-01-09 09:52:22 -06:00
Benjamin Peterson
0c270a8bb7 correct static string clearing loop (closes #16906) 2013-01-09 09:52:01 -06:00
Serhiy Storchaka
24a3ef6999 Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:41:55 +02:00
Serhiy Storchaka
ae3b32ad6b Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:40:52 +02:00
Serhiy Storchaka
48e188e573 Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
Serhiy Storchaka
dec798eb46 Fix out of bound read in UTF-32 decoder on "narrow Unicode" builds. 2013-01-08 22:45:42 +02:00
Serhiy Storchaka
4e02538bf3 Issue #16856: Fix a segmentation fault from calling repr() on a dict with
a key whose repr raise an exception.
2013-01-04 12:40:35 +02:00
Serhiy Storchaka
6c83e739d7 Issue #16856: Fix a segmentation fault from calling repr() on a dict with
a key whose repr raise an exception.
2013-01-04 12:39:34 +02:00
Victor Stinner
18aa4477d3 Close #16281: handle tailmatch() failure and remove useless comment
"honor direction and do a forward or backwards search": the runtime speed may
be different, but I consider that it doesn't really matter in practice. The
direction was never honored before: Python 2.7 uses memcmp() for the str type
for example.
2013-01-03 03:18:09 +01:00
Victor Stinner
7ae320d667 (Merge 3.2) Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:21:07 +01:00
Victor Stinner
20b654acb5 Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2013-01-03 01:08:58 +01:00
Andrew Svetlov
2606a6f197 Issue #16719: Get rid of WindowsError. Use OSError instead
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Gregory P. Smith
27dc02e8c5 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done.  I added the comment that my change in 3.2 added so that the
code would match up.  Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.  We could work to get rid of the -fwrapv requirement
in 3.4 but that requires more planning.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 19:51:29 -08:00
Gregory P. Smith
c2176e46d7 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

NOTE: This change is smaller compared to 3.2 as much of this cleanup had
already been done.  I added the comment that my change in 3.2 added so that the
code would match up.  Otherwise this just adds or synchronizes appropriate UL
designations on some constants to be pedantic.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:32:53 -08:00
Gregory P. Smith
27cbcd6241 Fix the internals of our hash functions to used unsigned values during hash
computation as the overflow behavior of signed integers is undefined.

In practice we require compiling everything with -fwrapv which forces overflow
to be defined as twos compliment but this keeps the code cleaner for checkers
or in the case where someone has compiled it without -fwrapv or their
compiler's equivalent.

Found by Clang trunk's Undefined Behavior Sanitizer (UBSan).

Cleanup only - no functionality or hash values change.
2012-12-10 18:15:46 -08:00
Victor Stinner
8dbd421b4d Cleanup unicodeobject.c
* Remove micro-optization:
   (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0).
   Only use strcmp()
 * Initialize 'arg' members in unicode_format_arg() to help the compiler to
   diagnose real bugs and also make the code simpler to read
2012-12-04 09:30:24 +01:00
Victor Stinner
d45c7f8d74 Issue #16455: On FreeBSD and Solaris, if the locale is C, the
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announces an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
2012-12-04 01:34:47 +01:00
Victor Stinner
2660e427d1 (Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:48:53 +01:00
Victor Stinner
27b1ca29cc Issue #16416: On Mac OS X, operating system data are now always
encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding
(which may be ASCII if no locale environment variable is set), to avoid
inconsistencies with os.fsencode() and os.fsdecode() functions which are
already using UTF-8/surrogateescape.
2012-12-03 12:47:59 +01:00
Antoine Pitrou
5439458a2a Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:29:28 +01:00
Antoine Pitrou
6d5ad227a5 Issue #16215: Fix potential double memory free in str.replace().
Patch by Serhiy Storchaka.
2012-11-17 23:28:17 +01:00
Victor Stinner
0d92c4f667 Issue #16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions 2012-11-12 23:32:21 +01:00
Victor Stinner
fc009eff9e Close #16311: Use the _PyUnicodeWriter API in text decoders
* Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
 * Remove unicode_putchar(): replaced with
   PyUnicodeWriter_Prepare() + PyUnicode_WRITER()
 * When handling an decoding error, only overallocate the buffer by +25%
   instead of +100%
2012-11-07 00:36:38 +01:00
Ezio Melotti
cfa9636404 #8271: merge with 3.3. 2012-11-04 23:23:09 +02:00
Ezio Melotti
f7ed5d111b #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. 2012-11-04 23:21:38 +02:00
Benjamin Peterson
7ff2094bc7 merge 3.3 (#16369) 2012-10-30 23:31:12 -04:00
Benjamin Peterson
e8ea97fffb merge 3.2 (#16369) 2012-10-30 23:27:52 -04:00
Benjamin Peterson
c43112823b initialize more global type objects (closes #16369) 2012-10-30 23:21:10 -04:00
Victor Stinner
e64322e034 Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster
Patch written by Serhiy Storchaka.
2012-10-30 23:12:47 +01:00
Victor Stinner
76df43de30 Issue #16330: Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Mark Dickinson
fb90c0934c Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. 2012-10-28 10:18:03 +00:00
Victor Stinner
c6cf1ba29e Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy() 2012-10-23 02:54:47 +02:00
Victor Stinner
fe75fb4b3e Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains() 2012-10-23 02:52:18 +02:00
Victor Stinner
6fa627578a Inline raise_translate_exception(): it is only used once 2012-10-23 02:51:50 +02:00
Victor Stinner
e5567ad236 Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp() 2012-10-23 02:48:49 +02:00
Christian Heimes
743e0cd6b5 Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified
endianess detection and handling.
2012-10-17 23:52:17 +02:00
Chris Jerdonek
4a7df9aba9 Issue #14783: Merge changes from 3.3. 2012-10-07 15:02:16 -07:00
Chris Jerdonek
042fa653ab Issue #14783: Merge changes from 3.2. 2012-10-07 14:56:27 -07:00