Commit graph

277 commits

Author SHA1 Message Date
Serhiy Storchaka
48e188e573 Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:14:24 +02:00
Andrew Svetlov
2606a6f197 Issue #16719: Get rid of WindowsError. Use OSError instead
Patch by Serhiy Storchaka.
2012-12-19 14:33:35 +02:00
Ezio Melotti
a0b5c46fa2 #16336: merge with 3.2. 2012-11-03 23:04:41 +02:00
Ezio Melotti
540da76115 #16336: fix input checking in the surrogatepass error handler. Patch by Serhiy Storchaka. 2012-11-03 23:03:39 +02:00
Philip Jenvey
5f9459fbed merge with 3.2 2012-10-26 17:05:09 -07:00
Philip Jenvey
45c41494bf bounds check for bad data (thanks amaury) 2012-10-26 17:01:53 -07:00
Antoine Pitrou
a1f7655fa7 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 20:00:04 +02:00
Antoine Pitrou
6f80f5d444 Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-09-23 19:55:21 +02:00
Antoine Pitrou
5e36edbaba Port additional tests from #14579 (the issue is already fixed). 2012-07-21 00:47:48 +02:00
Antoine Pitrou
b4bbee25b1 Issue #14579: Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
Patch by Serhiy Storchaka.
2012-07-21 00:45:14 +02:00
Victor Stinner
e3b47152a4 Write tests for invalid characters (U+00110000)
Test the following functions:

 * codecs.raw_unicode_escape_decode()
 * PyUnicode_FromWideChar()
 * PyUnicode_FromUnicode()
 * "unicode_internal" and "unicode_escape" decoders
2011-12-09 20:49:49 +01:00
Ezio Melotti
adc417ce36 #13406: fix more deprecation warnings and move the deprecation of unicode-internal earlier in the code. 2011-11-17 12:23:34 +02:00
Ezio Melotti
345379a7f8 #13406: correct the error message in check_warnings too. 2011-11-16 09:54:19 +02:00
Ezio Melotti
11060a4a48 #13406: silence deprecation warnings in test_codecs. 2011-11-16 09:39:10 +02:00
Victor Stinner
040e16e3e8 "unicode_internal" codec has been deprecated: fix related tests 2011-11-15 22:44:05 +01:00
Victor Stinner
76a31a6bff Cleanup decode_code_page_stateful() and encode_code_page()
* Fix decode_code_page_errors() result
 * Inline decode_code_page() and encode_code_page_chunk()
 * Replace the PyUnicodeObject type by PyObject
2011-11-04 00:05:13 +01:00
Victor Stinner
2f3ca9f20e Close #13247: Add cp65001 codec, the Windows UTF-8 (CP_UTF8) 2011-10-27 01:38:56 +02:00
Victor Stinner
9e92188f53 Issue #12281: Fix test_codecs.test_cp932() on Windows XP
Cool! Decoding b'\x81\x00abc' from cp932 with replace error handler is now
giving the same result on all Windows versions.
2011-10-18 21:55:25 +02:00
Victor Stinner
62be4fb21f Issue #12281: Skip code page tests on non-Windows platforms 2011-10-18 21:46:37 +02:00
Victor Stinner
3a50e7056e Issue #12281: Rewrite the MBCS codec to handle correctly replace and ignore
error handlers on all Windows versions. The MBCS codec is now supporting all
error handlers, instead of only replace to encode and ignore to decode.
2011-10-18 21:21:00 +02:00
Antoine Pitrou
00b2c86d09 Fix text failures when ctypes is not available
(followup to Victor's 85d11cf67aa8 and 7a50e549bd11)
2011-10-05 13:01:41 +02:00
Victor Stinner
182d90d9ee Fix test_codecs for Windows: check size of wchar_t, not sys.maxunicode 2011-09-29 19:53:55 +02:00
Martin v. Löwis
d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Antoine Pitrou
2a20f9be70 Backport 0398f07d4827 (fix for weird buildbot failures) 2011-07-27 01:06:07 +02:00
Antoine Pitrou
d05066d1ee Try to fix weird buildbot failures 2011-07-26 23:55:33 +02:00
Antoine Pitrou
5a24d82941 Add a test for issue #1813: getlocale() failing under a Turkish locale
(not a problem under 3.x)
2011-07-24 02:41:54 +02:00
Antoine Pitrou
cf9d3c08c8 Issue #1813: Fix codec lookup under Turkish locales. 2011-07-24 02:27:04 +02:00
Victor Stinner
0501070669 Revert my commit 3555cf6f9c98: "Issue #8796: codecs.open() calls the builtin
open() function instead of using StreamReaderWriter. Deprecate StreamReader,
StreamWriter, StreamReaderWriter, StreamRecoder and EncodedFile() of the codec
module. Use the builtin open() function or io.TextIOWrapper instead."

"It has not been approved !" wrote Marc-Andre Lemburg.
2011-05-27 16:50:40 +02:00
Victor Stinner
98fe1a0c3b Issue #8796: codecs.open() calls the builtin open() function instead of using
StreamReaderWriter. Deprecate StreamReader, StreamWriter, StreamReaderWriter,
StreamRecoder and EncodedFile() of the codec module. Use the builtin open()
function or io.TextIOWrapper instead.
2011-05-27 01:51:18 +02:00
Victor Stinner
d6881701fb Merge 3.2 2011-05-23 14:58:07 +02:00
Victor Stinner
b43dd4b8ca Merge 3.1 2011-05-23 14:57:05 +02:00
Victor Stinner
2cca057284 test_codecs now removes the temporay file (created by the test) 2011-05-23 14:51:42 +02:00
Marc-André Lemburg
8f36af7a4c Normalize the encoding names for Latin-1 and UTF-8 to
'latin-1' and 'utf-8'.

These are optimized in the Python Unicode implementation
to result in more direct processing, bypassing the codec
registry.

Also see issue11303.
2011-02-25 15:42:01 +00:00
Benjamin Peterson
28a4dce6a8 remove (un)transform methods 2010-12-12 01:33:04 +00:00
Victor Stinner
53a9dd776e Issue #10546: UTF-16-LE and UTF-16-BE *do* support non-BMP characters
Fix the doc and add tests.
2010-12-08 22:25:45 +00:00
Georg Brandl
02524629f3 #7475: add (un)transform method to bytes/bytearray and str, add back codecs that can be used with them from Python 2. 2010-12-02 18:06:51 +00:00
Ezio Melotti
19f2aeba67 Merged revisions 86596 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r86596 | ezio.melotti | 2010-11-20 21:04:17 +0200 (Sat, 20 Nov 2010) | 1 line

  #9424: Replace deprecated assert* methods in the Python test suite.
........
2010-11-21 01:30:29 +00:00
Ezio Melotti
b3aedd4862 #9424: Replace deprecated assert* methods in the Python test suite. 2010-11-20 19:04:17 +00:00
Benjamin Peterson
5a6214afe2 Merged revisions 81499,81506 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81499 | georg.brandl | 2010-05-24 16:29:07 -0500 (Mon, 24 May 2010) | 1 line

  #8016: add the CP858 codec (approved by Benjamin).  (Also add CP720 to the tests, it was missing there.)
........
  r81506 | benjamin.peterson | 2010-05-24 17:04:53 -0500 (Mon, 24 May 2010) | 1 line

  set svn:eol-style
........
2010-06-27 22:41:29 +00:00
Victor Stinner
554f3f0081 Issue #850997: mbcs encoding (Windows only) handles errors argument: strict
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
2010-06-16 23:33:54 +00:00
Antoine Pitrou
6107a688ee Merged revisions 81908 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81908 | antoine.pitrou | 2010-06-11 23:46:32 +0200 (ven., 11 juin 2010) | 11 lines

  Merged revisions 81907 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines

    Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
    the interpreter with characters outside the Basic Multilingual Plane
    (higher than 0x10000).
  ........
................
2010-06-11 21:48:34 +00:00
Antoine Pitrou
cc0cfd3576 Merged revisions 81907 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines

  Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
  the interpreter with characters outside the Basic Multilingual Plane
  (higher than 0x10000).
........
2010-06-11 21:46:32 +00:00
Philip Jenvey
ddf0d0383c Merged revisions 79780 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r79780 | philip.jenvey | 2010-04-04 20:05:24 -0700 (Sun, 04 Apr 2010) | 9 lines

  Merged revisions 79779 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r79779 | philip.jenvey | 2010-04-04 19:51:51 -0700 (Sun, 04 Apr 2010) | 2 lines

    fix escape_encode to return the correct consumed size
  ........
................
2010-06-09 17:56:11 +00:00
Victor Stinner
3dcb5acdb0 Issue #8838, #8339: Remove codecs.charbuffer_encode() and "t#" parsing format
Remove last references to the "char buffer" of the buffer protocol from
Python3.
2010-06-08 22:54:19 +00:00
Victor Stinner
b64d0eba50 Merged revisions 81474 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81474 | victor.stinner | 2010-05-22 18:59:09 +0200 (sam., 22 mai 2010) | 20 lines

  Merged revisions 81471-81472 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81471 | victor.stinner | 2010-05-22 15:37:56 +0200 (sam., 22 mai 2010) | 7 lines

    Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32

     * Fix seek() method of codecs.open(), don't write the BOM twice after seek(0)
     * Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
     * test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
       Solaris or Windows, but does it really exist? I found it the in the issue.
  ........
    r81472 | victor.stinner | 2010-05-22 15:44:25 +0200 (sam., 22 mai 2010) | 4 lines

    Fix my last commit (r81471) about codecs

    Rememder: don't touch the code just before a commit
  ........
................
2010-05-22 17:01:13 +00:00
Victor Stinner
a92ad7ee2c Merged revisions 81471-81472 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81471 | victor.stinner | 2010-05-22 15:37:56 +0200 (sam., 22 mai 2010) | 7 lines

  Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32

   * Fix seek() method of codecs.open(), don't write the BOM twice after seek(0)
   * Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
   * test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
     Solaris or Windows, but does it really exist? I found it the in the issue.
........
  r81472 | victor.stinner | 2010-05-22 15:44:25 +0200 (sam., 22 mai 2010) | 4 lines

  Fix my last commit (r81471) about codecs

  Rememder: don't touch the code just before a commit
........
2010-05-22 16:59:09 +00:00
Victor Stinner
37b8200608 Merged revisions 81461 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81461 | victor.stinner | 2010-05-22 04:16:27 +0200 (sam., 22 mai 2010) | 10 lines

  Merged revisions 81459 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81459 | victor.stinner | 2010-05-22 04:11:07 +0200 (sam., 22 mai 2010) | 3 lines

    Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice
    after seek(0)
  ........
................
2010-05-22 02:17:42 +00:00
Victor Stinner
3fed0870a6 Merged revisions 81459 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81459 | victor.stinner | 2010-05-22 04:11:07 +0200 (sam., 22 mai 2010) | 3 lines

  Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice
  after seek(0)
........
2010-05-22 02:16:27 +00:00
Victor Stinner
158701d886 Merged revisions 80382 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r80382 | victor.stinner | 2010-04-22 21:38:16 +0200 (jeu., 22 avril 2010) | 3 lines

  Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handler producing
  unicode string (eg. backslashreplace)
........
2010-04-22 19:41:01 +00:00
Victor Stinner
31be90b0c7 Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handler producing
unicode string (eg. backslashreplace)
2010-04-22 19:38:16 +00:00