1520 commits

Author SHA1 Message Date
Victor Stinner
4c7db315df Issue #9738, #9836: Fix refleak introduced by r84704 2010-09-12 07:51:18 +00:00
Benjamin Peterson
9be0b2e312 detect non-ascii characters much earlier (plugs ref leak) 2010-09-12 03:40:54 +00:00
Victor Stinner
1205f2774e Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on
a non-ASCII byte in the format string.

Also document the encoding.
2010-09-11 00:54:47 +00:00
Victor Stinner
46408606d8 Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy() 2010-09-03 16:18:00 +00:00
Victor Stinner
71133ff368 Create PyUnicode_strdup() function 2010-09-01 23:43:53 +00:00
Victor Stinner
c4eb765fc1 Create Py_UNICODE_strcat() function 2010-09-01 23:43:50 +00:00
Victor Stinner
42cb462682 Remove unicode_default_encoding constant
Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated
(we will not change its value anymore).
2010-09-01 19:39:01 +00:00
Antoine Pitrou
fce7fd6426 Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding()
are now removed, since they had no effect in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
2010-09-01 18:54:56 +00:00
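As a rough illustration of the removal described in the entry above (a sketch meant for a current CPython 3 interpreter, not code taken from this commit; the helper name default_encoding_info is made up for the example), the state of the default encoding can be checked from Python:

```python
import sys


def default_encoding_info():
    """Return the default encoding and whether the removed setter is still present."""
    # sys.getdefaultencoding() still exists; the setter was the part removed.
    return sys.getdefaultencoding(), hasattr(sys, "setdefaultencoding")


encoding, has_setter = default_encoding_info()
print(encoding, has_setter)
```

The check uses hasattr() rather than calling the setter, so the snippet does not raise on interpreters where the function is gone.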
Antoine Pitrou
a2983c6734 Merged revisions 84394 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r84394 | antoine.pitrou | 2010-09-01 17:10:12 +0200 (Wed, 01 Sep 2010) | 4 lines

  Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API
  properly.  Patch by Stefan Behnel.
........
2010-09-01 15:16:41 +00:00
Antoine Pitrou
b0fa831d1e Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API
properly.  Patch by Stefan Behnel.
2010-09-01 15:10:12 +00:00
Daniel Stutzbach
8515eaefda Issue 8781: On systems with a signed 4-byte wchar_t and a 4-byte Py_UNICODE, use memcpy to convert between the two (as already done when wchar_t is unsigned) 2010-08-24 21:57:33 +00:00
Victor Stinner
3119ed73aa Fix PyUnicode_EncodeFSDefault() indentation 2010-08-18 22:26:50 +00:00
Victor Stinner
ef8d95c498 Issue #9425: Create Py_UNICODE_strncmp() function
The code is based on strncmp() from the libiberty library,
a function in the public domain.
2010-08-16 22:03:11 +00:00
Victor Stinner
47fcb5b4c3 Issue #9542: Create PyUnicode_FSDecoder() function
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.

 * Don't mention the surrogateescape error handler in the comments or the
   documentation; refer to PyUnicode_DecodeFSDefaultAndSize() and
   PyUnicode_EncodeFSDefault() instead, because these functions use the strict
   error handler for the mbcs encoding (on Windows).
 * Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
   inconsistency with unicodeobject.h.
2010-08-13 23:59:58 +00:00
Victor Stinner
4a2b7a1b14 Issue #9425: Create PyErr_WarnFormat() function
Similar to PyErr_WarnEx() but uses PyUnicode_FromFormatV() to format the warning
message.

Also strip some trailing spaces.
2010-08-13 14:03:48 +00:00
Alexander Belopolsky
f0f45142d5 Issue #2443: Added a new macro, Py_VA_COPY, which is equivalent to C99
va_copy, but available on all python platforms.  Untabified a few
unrelated files.
2010-08-11 17:31:17 +00:00
Victor Stinner
331ea92ade Issue #9425: create Py_UNICODE_strrchr() function 2010-08-10 16:37:20 +00:00
Georg Brandl
1fa11af7aa Merged revisions 83226-83227,83229-83232 via svnmerge from
svn+ssh://svn.python.org/python/branches/py3k

........
  r83226 | georg.brandl | 2010-07-29 16:17:12 +0200 (Thu, 29 Jul 2010) | 1 line

  #1090076: explain the behavior of *vars* in get() better.
........
  r83227 | georg.brandl | 2010-07-29 16:23:06 +0200 (Thu, 29 Jul 2010) | 1 line

  Use Py_CLEAR().
........
  r83229 | georg.brandl | 2010-07-29 16:32:22 +0200 (Thu, 29 Jul 2010) | 1 line

  #9407: document configparser.Error.
........
  r83230 | georg.brandl | 2010-07-29 16:36:11 +0200 (Thu, 29 Jul 2010) | 1 line

  Use correct directive and name.
........
  r83231 | georg.brandl | 2010-07-29 16:46:07 +0200 (Thu, 29 Jul 2010) | 1 line

  #9397: remove mention of dbm.bsd which does not exist anymore.
........
  r83232 | georg.brandl | 2010-07-29 16:49:08 +0200 (Thu, 29 Jul 2010) | 1 line

  #9388: remove ERA_YEAR which is never defined in the source code.
........
2010-08-01 21:03:01 +00:00
Georg Brandl
0f1470960c Recorded merge of revisions 83444 via svnmerge from
svn+ssh://svn.python.org/python/branches/py3k

........
  r83444 | georg.brandl | 2010-08-01 22:51:02 +0200 (Sun, 01 Aug 2010) | 1 line

  Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway.
........
2010-08-01 20:54:22 +00:00
Georg Brandl
78eef3de88 Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway. 2010-08-01 20:51:02 +00:00
Georg Brandl
a70070c9e5 Merged revisions 83395,83417 via svnmerge from
svn+ssh://svn.python.org/python/branches/py3k

........
  r83395 | georg.brandl | 2010-08-01 10:49:18 +0200 (Sun, 01 Aug 2010) | 1 line

  #8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character.
........
  r83417 | georg.brandl | 2010-08-01 20:38:26 +0200 (Sun, 01 Aug 2010) | 1 line

  #5776: fix mistakes in python specfile.  (Nobody probably uses it anyway.)
........
2010-08-01 18:59:44 +00:00
Georg Brandl
bd534f0349 #8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character. 2010-08-01 08:49:18 +00:00
Georg Brandl
8ee604b989 Use Py_CLEAR(). 2010-07-29 14:23:06 +00:00
Stefan Krah
aebd6f4c29 Merged revisions 82978 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r82978 | stefan.krah | 2010-07-19 19:58:26 +0200 (Mon, 19 Jul 2010) | 3 lines

  Sub-issue of #9036: Fix incorrect use of Py_CHARMASK.
........
2010-07-19 18:01:13 +00:00
Stefan Krah
99212f61db Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. 2010-07-19 17:58:26 +00:00
Senthil Kumaran
74ceac2306 Merged revisions 82573 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r82573 | senthil.kumaran | 2010-07-05 17:30:56 +0530 (Mon, 05 Jul 2010) | 3 lines

  Fix the docstrings of the capitalize method.
........
2010-07-05 12:04:23 +00:00
Senthil Kumaran
e51ee8a5bc Fix the docstrings of the capitalize method. 2010-07-05 12:00:56 +00:00
Ezio Melotti
25bc019d46 Merged revisions 82413,82468 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r82413 | ezio.melotti | 2010-07-01 10:32:02 +0300 (Thu, 01 Jul 2010) | 13 lines

  Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.

  1) #8271: when a byte sequence is invalid, only the start byte and all the
     valid continuation bytes are now replaced by U+FFFD, instead of replacing
     the number of bytes specified by the start byte.
     See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
  2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
     in behavior);
  3) Change the error messages "unexpected code byte" to "invalid start byte"
     and "invalid data" to "invalid continuation byte";
  4) Add an extensive set of tests in test_unicode;
  5) Fix test_codeccallbacks because it was failing after this change.
........
  r82468 | ezio.melotti | 2010-07-03 07:52:19 +0300 (Sat, 03 Jul 2010) | 1 line

  Update comment about surrogates.
........
2010-07-03 05:18:50 +00:00
Ezio Melotti
9bf2b3ae6a Update comment about surrogates. 2010-07-03 04:52:19 +00:00
Ezio Melotti
57221d02ba Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
2010-07-01 07:32:02 +00:00
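The decoder behaviour listed in points 1) to 3) of the entry above can be observed from Python. The following is a minimal sketch, assuming a current CPython 3 interpreter; the helper names decode_replace and strict_error_reason are invented for the illustration and are not part of the commit.

```python
def decode_replace(data):
    """Decode UTF-8, substituting U+FFFD for each invalid subsequence."""
    return data.decode("utf-8", errors="replace")


def strict_error_reason(data):
    """Return the error message produced by strict UTF-8 decoding, or '' if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return ""


# 0xC3 announces a two-byte sequence, but "A" is not a continuation byte.
# Only the start byte is replaced; "A" survives instead of being swallowed.
truncated = decode_replace(b"\xc3A")
reason_continuation = strict_error_reason(b"\xc3A")

# 0xF8 would open a 5-byte sequence, which RFC 3629 does not allow.
reason_start = strict_error_reason(b"\xf8\x88\x80\x80\x80")

print(repr(truncated), reason_continuation, reason_start)
```

The two strict-mode calls show the renamed error messages, while the "replace" call shows that the byte following an incomplete sequence is preserved.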
Georg Brandl
952867aa30 #9078: fix some Unicode C API descriptions, in comments and docs. 2010-06-27 10:17:12 +00:00
Ezio Melotti
415f340a0c Merged revisions 82252 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r82252 | ezio.melotti | 2010-06-26 21:50:39 +0300 (Sat, 26 Jun 2010) | 9 lines

  Merged revisions 82248 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line

    Fix extra space.
  ........
................
2010-06-26 18:52:26 +00:00
Ezio Melotti
c1897e716d Merged revisions 82248 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line

  Fix extra space.
........
2010-06-26 18:50:39 +00:00
Victor Stinner
554f3f0081 Issue #850997: mbcs encoding (Windows only) handles the errors argument: strict
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
2010-06-16 23:33:54 +00:00
Mark Dickinson
7db923cc99 Silence 'unused variable' gcc warning. Patch by Éric Araujo. 2010-06-12 09:10:14 +00:00
Victor Stinner
313a120ab6 Issue #8969: On Windows, use mbcs codec in strict mode to encode and decode
filenames and enable os.fsencode().
2010-06-11 23:56:51 +00:00
Antoine Pitrou
6107a688ee Merged revisions 81908 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

................
  r81908 | antoine.pitrou | 2010-06-11 23:46:32 +0200 (Fri, 11 Jun 2010) | 11 lines

  Merged revisions 81907 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk

  ........
    r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (Fri, 11 Jun 2010) | 5 lines

    Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
    the interpreter with characters outside the Basic Multilingual Plane
    (higher than 0x10000).
  ........
................
2010-06-11 21:48:34 +00:00
Antoine Pitrou
cc0cfd3576 Merged revisions 81907 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (Fri, 11 Jun 2010) | 5 lines

  Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
  the interpreter with characters outside the Basic Multilingual Plane
  (higher than 0x10000).
........
2010-06-11 21:46:32 +00:00
Victor Stinner
37296e89a5 Fix r81869: ISO-8859-15 was seen as an alias to ISO-8859-1
Don't use normalize_encoding() result if it is truncated.
2010-06-10 13:36:23 +00:00
Victor Stinner
600d3bed6c Issue #8922: Normalize the encoding name in PyUnicode_AsEncodedString() to
enable shortcuts for upper case encoding names. Also add a shortcut for
"iso-8859-1" in PyUnicode_AsEncodedString() and PyUnicode_Decode().
2010-06-10 12:00:55 +00:00
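From the Python side, the effect of encoding-name normalization in the entry above, together with the ISO-8859-15 follow-up fix listed just before it, can be sketched as follows. This is an illustrative check against a current CPython 3 interpreter, not the C code from these commits, and all variable names are made up for the example.

```python
import codecs

text = "caf\u00e9"

# Upper-case and lower-case spellings should select the same codec.
same_utf8 = text.encode("UTF-8") == text.encode("utf-8")
same_latin1 = text.encode("ISO-8859-1") == text.encode("iso-8859-1")
same_codec = codecs.lookup("ISO-8859-1").name == codecs.lookup("iso-8859-1").name

# ISO-8859-15 must remain a distinct codec: it maps the euro sign to 0xA4,
# whereas ISO-8859-1 cannot encode the euro sign at all.
euro_latin9 = "\u20ac".encode("ISO-8859-15")
try:
    "\u20ac".encode("ISO-8859-1")
    euro_latin1_ok = True
except UnicodeEncodeError:
    euro_latin1_ok = False

print(same_utf8, same_latin1, same_codec, euro_latin9, euro_latin1_ok)
```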
Victor Stinner
ae6265f8d0 Issue #8715: Create PyUnicode_EncodeFSDefault() function: Encode a Unicode
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
2010-05-15 16:27:27 +00:00
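The "surrogateescape" error handler named in the entry above lets undecodable bytes survive a decode and encode round trip. A minimal sketch, assuming UTF-8 as the encoding (the fallback the commit mentions) rather than the real Py_FileSystemDefaultEncoding, and using a hypothetical helper named roundtrip:

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes with surrogateescape, then encode the result back."""
    # Bytes that are invalid in `encoding` become lone surrogates U+DC80..U+DCFF.
    text = raw.decode(encoding, errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into the bytes.
    return text, text.encode(encoding, errors="surrogateescape")


# 0xE9 is not valid UTF-8 here, so it is carried through as U+DCE9.
text, raw_again = roundtrip(b"caf\xe9.txt")
print(repr(text), raw_again)
```

In application code, os.fsencode() and os.fsdecode() apply the same idea using the interpreter's actual filesystem encoding.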
Victor Stinner
59e62db0a3 Enable shortcuts for common encodings in PyUnicode_AsEncodedString() for any
error handler, not only the default error handler (strict)
2010-05-15 13:14:32 +00:00
Victor Stinner
b9a20ad036 PyUnicode_DecodeFSDefaultAndSize() uses surrogateescape error handler
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in module filenames yet. So nobody noticed this
minor bug.
2010-04-30 16:37:52 +00:00
Victor Stinner
0ea2a468e3 Simplify PyUnicode_FSConverter(): remove reference to PyByteArray
PyByteArray is no longer supported
2010-04-30 00:22:08 +00:00
Benjamin Peterson
a23831ff44 condense condition 2010-04-25 21:54:00 +00:00
Victor Stinner
0b79b76c2b Merged revisions 80384 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r80384 | victor.stinner | 2010-04-22 22:01:57 +0200 (Thu, 22 Apr 2010) | 2 lines

  Fix my previous commit (r80382) for wide build (unicodeobject.c)
........
2010-04-22 20:07:28 +00:00
Victor Stinner
445a623226 Fix my previous commit (r80382) for wide build (unicodeobject.c) 2010-04-22 20:01:57 +00:00
Victor Stinner
158701d886 Merged revisions 80382 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r80382 | victor.stinner | 2010-04-22 21:38:16 +0200 (Thu, 22 Apr 2010) | 3 lines

  Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handler producing
  unicode string (e.g. backslashreplace)
........
2010-04-22 19:41:01 +00:00
Victor Stinner
31be90b0c7 Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handler producing
unicode string (e.g. backslashreplace)
2010-04-22 19:38:16 +00:00
Victor Stinner
dcb2403022 Issue #8485: PyUnicode_FSConverter() no longer accepts bytearray objects;
you have to convert your bytearray filenames to bytes
2010-04-22 12:08:36 +00:00