Commit graph

314 commits

Author SHA1 Message Date
Victor Stinner
1d4b35f4e5 rephrase PyUnicode_1BYTE_KIND documentation 2011-10-06 01:51:19 +02:00
Victor Stinner
fb9ea8c57e Don't check for the maximum character when copying from unicodeobject.c
* Create copy_characters() function which doesn't check for the maximum
   character in release mode
 * _PyUnicode_CheckConsistency() is no more static to be able to use it
   in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
 * _PyUnicode_CheckConsistency() checks the string hash
2011-10-06 01:45:57 +02:00
Éric Araujo
80a348c0a0 Fix typo 2011-10-05 01:11:12 +02:00
Victor Stinner
30134f53fc Complete documentation of compact ASCII strings 2011-10-04 01:32:45 +02:00
Victor Stinner
a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner
7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner
8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner
c3cec7868b Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII
ucs1, ucs2 and ucs4 libraries have to scan created substring to find the
maximum character, whereas it is not need to ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner
4d0d54bcba Document requierements of Unicode kinds 2011-10-05 01:31:05 +02:00
Georg Brandl
07de325672 More fixes. 2011-10-05 16:47:38 +02:00
Georg Brandl
c6bc4c6897 Fix a few typos in the unicode header. 2011-10-05 16:23:09 +02:00
Georg Brandl
4975a9b44d Fix grammar. 2011-10-05 16:12:21 +02:00
Victor Stinner
b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
   is ascii only
 * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
   shorter character type. For example, _PyUnicode_FromUCS1() stops if we
   have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner
85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner
a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner
910337b42e Add _PyUnicode_CheckConsistency() macro to help debugging
* Document Unicode string states
 * Use _PyUnicode_CheckConsistency() to ensure that objects are always
   consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
37943769ef PyUnicode_READ_CHAR() ensures that the string is ready 2011-10-02 20:33:18 +02:00
Victor Stinner
7a48ff7e06 Use Py_UCS1 instead of unsigned char in unicodeobject.h 2011-10-02 00:55:25 +02:00
Victor Stinner
cd9950fd09 PyUnicode_WriteChar() raises IndexError on invalid index
PyUnicode_WriteChar() raises also a ValueError if the string has more than 1
reference.
2011-10-02 00:34:53 +02:00
Victor Stinner
9f789e7f63 _PyUnicode_AsKind() is *not* part of the stable ABI 2011-10-01 03:57:28 +02:00
Victor Stinner
4584a5ba1a PyUnicode_CHARACTER_SIZE(): add a reference to PyUnicode_KIND_SIZE() 2011-10-01 02:39:37 +02:00
Victor Stinner
034f6cf10c Add PyUnicode_Copy() function, include it to the public API 2011-09-30 02:26:44 +02:00
Victor Stinner
d8f6510acc _PyUnicode_Ready() cannot be used on ready strings anymore
* Change its prototype: PyObject* instead of PyUnicodeoObject*.
 * Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready)
   must be checked instead
2011-09-29 19:43:17 +02:00
Victor Stinner
bc8b81bc4e Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h
Move these macros to unicodeobject.c
2011-09-29 19:31:34 +02:00
Victor Stinner
a0702ab1fe Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character
Cleanup also the code (avoid the goto).
2011-09-29 14:14:38 +02:00
Victor Stinner
f5ca1a21a5 PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference 2011-09-28 23:54:59 +02:00
Victor Stinner
17222160e7 Mark _PyUnicode_FindMaxCharAndNumSurrogatePairs() as private 2011-09-28 22:15:37 +02:00
Victor Stinner
157f83fcfc Strip trailing spaces in unicodeobject.[ch] 2011-09-28 21:41:31 +02:00
Victor Stinner
be78eaf2de PyUnicode_CopyCharacters() checks for buffer and character overflow
It now returns the number of written characters on success.
2011-09-28 21:37:03 +02:00
Victor Stinner
fb5f5f2420 Mark PyUnicode_CONVERT_BYTES as private 2011-09-28 21:39:49 +02:00
Victor Stinner
5ce1b0dbc0 Set Py_UNICODE_REPLACEMENT_CHARACTER type to Py_UCS4, instead of Py_UNICODE 2011-09-28 20:29:27 +02:00
Martin v. Löwis
d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Victor Stinner
f955eb210f Merge 3.2: Fix PyUnicode_AsWideCharString() doc
- Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null
   character
 - Fix spelling of the null character
2011-09-06 02:01:29 +02:00
Victor Stinner
d88d9836c5 Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character
Fix also spelling of the null character.
2011-09-06 02:00:05 +02:00
Ezio Melotti
8c9375bb59 #10542: Add 4 macros to work with surrogates: Py_UNICODE_IS_SURROGATE, Py_UNICODE_IS_HIGH_SURROGATE, Py_UNICODE_IS_LOW_SURROGATE, Py_UNICODE_JOIN_SURROGATES. 2011-08-22 20:03:25 +03:00
Victor Stinner
99b9538636 Issue #9642: Uniformize the tests on the availability of the mbcs codec
Add a new HAVE_MBCS define.
2011-07-04 14:23:54 +02:00
Victor Stinner
f3fd733f92 Remove useless argument of _PyUnicode_AsDefaultEncodedString() 2011-03-02 01:03:11 +00:00
Victor Stinner
0d711169fa Issue #9738: Ooops, fix typos in my previous commit (r87506) 2010-12-27 02:39:20 +00:00
Victor Stinner
dc2081f72b Issue #9738: document encodings of unicode functions 2010-12-27 01:49:29 +00:00
Georg Brandl
b550308597 Take PyUnicode_TransformDecimalToASCII out of the limited API. 2010-12-05 11:40:48 +00:00
Alexander Belopolsky
942af5a9a4 Issue #10557: Fixed error messages from float() and other numeric
types.  Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
2010-12-04 03:38:46 +00:00
Martin v. Löwis
4d0d471a80 Merge branches/pep-0384. 2010-12-03 20:14:31 +00:00
Alexander Belopolsky
83283c270a Issue #10413: Updated comments to reflect code changes 2010-11-16 14:29:01 +00:00
Victor Stinner
09f24bb408 Issue #8761: Mangle PyUnicode_CompareWithASCIIString function name for
narrow/wide unicode build.
2010-10-24 20:38:25 +00:00
Benjamin Peterson
8f67d0893f make hashes always the size of pointers; introduce Py_hash_t #9778 2010-10-17 20:54:53 +00:00
Victor Stinner
f3170ccef8 Use locale encoding if Py_FileSystemDefaultEncoding is not set
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
   PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
   Py_FileSystemDefaultEncoding is NULL
 * redecode_filenames() functions and _Py_code_object_list (issue #9630)
   are no more needed: remove them
2010-10-15 12:04:23 +00:00
Victor Stinner
beb4135b8c PyUnicode_AsWideCharString() takes a PyObject*, not a PyUnicodeObject*
All unicode functions uses PyObject* except PyUnicode_AsWideChar(). Fix the
prototype for the new function PyUnicode_AsWideCharString().
2010-10-07 01:02:42 +00:00
Victor Stinner
137c34c027 Issue #9979: Create function PyUnicode_AsWideCharString(). 2010-09-29 10:25:54 +00:00
Amaury Forgeot d'Arc
feb7307db4 #9210: remove --with-wctype-functions configure option.
The internal unicode database is now always used.

(after 5 years: see
  http://mail.python.org/pipermail/python-dev/2004-December/050193.html
)
2010-09-12 22:42:57 +00:00
Victor Stinner
1205f2774e Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on
a non-ASCII byte in the format string.

Document also the encoding.
2010-09-11 00:54:47 +00:00