311 commits

Author SHA1 Message Date
Victor Stinner
d3f0882dfb Issue #14744: Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args)
 * Formatting of str, int, float and complex now uses the _PyUnicodeWriter API,
   which avoids a temporary buffer in most cases.
 * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
   keep a reference to the string if the output is only composed of one string
 * Disable overallocation when formatting the last argument of str%args and
   str.format(args)
 * Overallocation allocates at least 100 characters: add min_length attribute
   to the _PyUnicodeWriter structure
 * Add new private functions: _PyUnicode_FastCopyCharacters(),
   _PyUnicode_FastFill() and _PyUnicode_FromASCII()

The speedup is around 20% on average.
2012-05-29 12:57:52 +02:00
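
The commit above targets two user-facing formatting paths, str % args and str.format(args). A minimal Python sketch of those two paths follows. The sample values, function names and loop count are illustrative assumptions, not taken from the commit, and the timing loop is not expected to reproduce the 20% figure.

```python
import timeit

# Illustrative values only; not taken from the commit.
ARGS = ("spam", 42, 3.5, 1 + 2j)


def percent_style():
    # Exercises the str % args path.
    return "%s %d %.1f %s" % ARGS


def format_style():
    # Exercises the str.format(args) path.
    return "{} {:d} {:.1f} {}".format(*ARGS)


if __name__ == "__main__":
    for func in (percent_style, format_style):
        seconds = timeit.timeit(func, number=100_000)
        print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions build the same string from str, int, float and complex arguments, which are the types the commit message lists.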
Victor Stinner
ece58deb9f Close #14648: Compute correctly maxchar in str.format() for substrin 2012-04-23 23:36:38 +02:00
Victor Stinner
c9590ad745 Close #14085: remove assertions from PyUnicode_WRITE macro
Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a
test raising a Python exception.
2012-03-04 01:34:37 +01:00
Victor Stinner
41a863cb81 Issue #13706: Fix format(int, "n") for locale with non-ASCII thousands separator
 * Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
   (from the locale encoding), instead of decoding them implicitly from latin1
 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
 * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
   character if unicode is NULL
 * Replace MIN/MAX macros with Py_MIN/Py_MAX
 * stringlib/undef.h undefines STRINGLIB_IS_UNICODE
 * stringlib/localeutil.h only supports Unicode
2012-02-24 00:37:51 +01:00
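
The "n" presentation type fixed above takes its separators from the current locale. A hedged sketch of how it can be exercised from Python is shown below. The helper name is hypothetical, and only the "C" locale is used because other locale names (for example one with a non-ASCII thousands separator) depend on what is installed on the platform.

```python
import locale


def format_with_locale(value, locale_name):
    """Format value with the "n" presentation type under locale_name.

    Hypothetical helper: it switches LC_NUMERIC temporarily and
    restores the previous setting afterwards.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return format(value, "n")
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


if __name__ == "__main__":
    # The "C" locale defines no thousands separator.
    print(format_with_locale(1234567, "C"))
```

Passing a locale whose thousands separator is non-ASCII would exercise the code path this commit changed, provided that locale is available.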
Victor Stinner
ed27785b32 Issue #13706: Add assertions to detect bugs earlier 2012-02-01 00:22:23 +01:00
Antoine Pitrou
7ab4af0427 Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name.
Patch by Hynek Schlawack.
2012-01-29 18:43:36 +01:00
Antoine Pitrou
1334884ff2 Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name.
Patch by Hynek Schlawack.
2012-01-29 18:36:34 +01:00
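
The NUL check described in the two commits above can be observed from Python. The sketch below uses a hypothetical helper and file name. To my understanding the exception type has differed between Python versions, so both TypeError and ValueError are caught.

```python
def rejects_nul(path):
    """Return True if open() refuses the path before touching the file system."""
    try:
        with open(path):
            pass
    except (TypeError, ValueError):
        # Raised by the NUL character check (type assumed to vary by version).
        return True
    except OSError:
        # The name reached the operating system, so it was not rejected up front.
        return False
    return False


if __name__ == "__main__":
    print(rejects_nul("spam\0eggs.txt"))
```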
Benjamin Peterson
ce79852077 use the static identifier api for looking up special methods
I had to move the static identifier code from unicodeobject.h to object.h in
order for this to work.
2012-01-22 11:24:29 -05:00
Benjamin Peterson
d5890c8db5 add str.casefold() (closes #13752) 2012-01-14 13:23:30 -05:00
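
A short sketch of what str.casefold() is for: caseless comparison where str.lower() is not enough. The helper name and sample words are illustrative assumptions.

```python
def caseless_equal(a, b):
    """Compare two strings without regard to case, using str.casefold()."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # lower() keeps the sharp s, so the two spellings differ.
    print("Straße".lower() == "STRASSE".lower())
    # casefold() maps the sharp s to "ss", so they compare equal.
    print(caseless_equal("Straße", "STRASSE"))
```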
Amaury Forgeot d'Arc
77b1ecf0ad Silence compilation warnings on Windows 2012-01-13 22:12:37 +01:00
Benjamin Peterson
b2bf01d824 use full unicode mappings for upper/lower/title case (#12736)
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
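
With full case mappings, a case conversion can change the length of a string. The sketch below illustrates this with two characters chosen as examples; the helper name is hypothetical.

```python
def upper_changes_length(text):
    """Return True if upper-casing text produces a string of a different length."""
    return len(text.upper()) != len(text)


if __name__ == "__main__":
    for sample in ("ß", "\ufb01", "a"):
        # Print the code points so ligatures are unambiguous in the output.
        before = [hex(ord(c)) for c in sample]
        after = [hex(ord(c)) for c in sample.upper()]
        print(before, "->", after, upper_changes_length(sample))
```

Here "\ufb01" is the LATIN SMALL LIGATURE FI, which upper-cases to the two letters "FI".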
Victor Stinner
3fe553160c Add a new PyUnicode_Fill() function
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
2012-01-04 00:33:50 +01:00
Victor Stinner
80bc72d5a2 fix PyCompactUnicodeObject doc (test) 2011-12-22 03:23:10 +01:00
Victor Stinner
52e2cc8604 backout 7876cd49300d: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum 2011-12-19 22:14:45 +01:00
Victor Stinner
0ba5af20c0 Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum 2011-12-17 22:18:27 +01:00
Victor Stinner
1b57967b96 Issue #13560: Locale codec functions use the classic "errors" parameter,
instead of surrogateescape

This makes it possible to support more error handlers later.
2011-12-17 05:47:23 +01:00
Victor Stinner
f2ea71fcc8 Issue #13560: Add PyUnicode_EncodeLocale()
 * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
   available
 * Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
 * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
   from the current locale encoding
 * _Py_char2wchar() writes an "error code" in the size argument to indicate
   if the function failed because of memory allocation failure or because of a
   decoding error. The function doesn't write the error message directly to
   stderr.
 * Fix time.strftime() (if wcsftime() is missing): decode strftime() result
   from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923 PyUnicode_Resize(): warn about canonical representation
Also call unicode_resize() directly in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
Also fix the PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c Make PyUnicode_Copy() private => _PyUnicode_Copy()
Undocument the function.

Also make decode_utf8_errors() private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380 resize_copy() now supports legacy ready strings 2011-12-12 00:13:42 +01:00
Victor Stinner
24c74be9a3 PyUnicode_IS_ASCII() macro ensures that the string is ready
It makes no sense to check whether a non-ready string is ASCII.
2011-12-12 01:24:20 +01:00
Victor Stinner
551ac95733 Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
Also use the surrogate macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
f3ae6208c7 PyUnicode_GET_SIZE() checks that PyUnicode_AsUnicode() succeeds
using an assertion
2011-11-21 02:24:49 +01:00
Victor Stinner
77faf69ca1 _PyUnicode_CheckConsistency() also checks maxchar maximum value,
not only its minimum value
2011-11-20 18:56:05 +01:00
Victor Stinner
9343999597 Fix PyUnicode_CopyCharacters() doc 2011-11-20 18:29:14 +01:00
Victor Stinner
7c8bbbbb0c Ensure that Py_UCS4 is 32 bits and Py_UCS2 is 16 bits 2011-11-20 18:28:29 +01:00
Victor Stinner
6f9568bb1f Fix misuse of the "PyUnicodeObject" structure name in unicodeobject.h 2011-11-17 00:12:44 +01:00
Martin v. Löwis
1db7c13be1 Port encoders from Py_UNICODE API to unicode object API. 2011-11-10 18:24:32 +01:00
Martin v. Löwis
d10759f6ed Make _PyUnicode_FromId return borrowed references.
http://mail.python.org/pipermail/python-dev/2011-November/114347.html
2011-11-07 13:00:05 +01:00
Victor Stinner
e30c0a1014 Fix gdb/libpython.py for non-ready Unicode strings
_PyUnicode_CheckConsistency() also checks the hash and length values for
non-ready Unicode strings.
2011-11-04 20:54:05 +01:00
Victor Stinner
7931d9a951 Replace PyUnicodeObject type by PyObject
 * _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
 * Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Martin v. Löwis
23e275b3ad Port UCS1 and charmap codecs to new API. 2011-11-02 18:02:51 +01:00
Martin v. Löwis
0d3072e98d Drop Py_UCS4_ functions. Closes #13246. 2011-10-31 08:40:56 +01:00
Victor Stinner
9db1a8b69f Replace PyUnicodeObject* by PyObject* where it was irrelevant
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
2011-10-23 20:04:37 +02:00
Victor Stinner
55c7e00fc0 Simplify _PyUnicode_COMPACT_DATA() macro 2011-10-18 23:32:53 +02:00
Victor Stinner
3a50e7056e Issue #12281: Rewrite the MBCS codec to correctly handle the replace and ignore
error handlers on all Windows versions. The MBCS codec now supports all
error handlers, instead of only replace to encode and ignore to decode.
2011-10-18 21:21:00 +02:00
Martin v. Löwis
bd928fef42 Rename _Py_identifier to _Py_IDENTIFIER. 2011-10-14 10:20:37 +02:00
Victor Stinner
8813104e53 Simplify PyUnicode_MAX_CHAR_VALUE
Use PyUnicode_IS_ASCII instead of PyUnicode_IS_COMPACT_ASCII, so the following
test can be removed:

   PyUnicode_DATA(op) == (((PyCompactUnicodeObject *)(op))->utf8)
2011-10-13 01:12:01 +02:00
Martin v. Löwis
87da872c69 Drop extra semicolon. 2011-10-09 11:54:42 +02:00
Martin v. Löwis
afe55bba33 Add API for static strings, primarily good for identifiers.
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Martin v. Löwis
c47adb04b3 Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE. 2011-10-07 20:55:35 +02:00
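
The kind values 1, 2 and 4 correspond to the number of bytes stored per character. This can be observed indirectly from Python, as sketched below. The helper name is hypothetical, and the measurement relies on sys.getsizeof(), so it assumes CPython 3.3 or later and describes an implementation detail rather than a language guarantee.

```python
import sys


def bytes_per_char(ch):
    """Estimate per-character storage by comparing the sizes of two strings.

    Assumes CPython with the PEP 393 string representation.
    """
    return (sys.getsizeof(ch * 20) - sys.getsizeof(ch * 10)) // 10


if __name__ == "__main__":
    for ch in ("a", "\u20ac", "\U0001f600"):
        print(hex(ord(ch)), bytes_per_char(ch))
```

Subtracting two sizes cancels the fixed object header, leaving only the growth per added character.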
Georg Brandl
db6c7f5c33 Update C API docs for PEP 393. 2011-10-07 11:19:11 +02:00
Victor Stinner
b066cc6aba Fix PyUnicode_CHARACTER_SIZE and PyUnicode_KIND_SIZE 2011-10-06 15:54:53 +02:00
Antoine Pitrou
dbf697ae5c Fix compilation warnings under 64-bit Windows 2011-10-06 15:34:41 +02:00
Éric Araujo
0f4ee93b06 Branch merge 2011-10-06 13:22:21 +02:00
Victor Stinner
1d4b35f4e5 rephrase PyUnicode_1BYTE_KIND documentation 2011-10-06 01:51:19 +02:00
Victor Stinner
fb9ea8c57e Don't check for the maximum character when copying from unicodeobject.c
 * Create copy_characters() function which doesn't check for the maximum
   character in release mode
 * _PyUnicode_CheckConsistency() is no longer static so that it can be used
   in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
 * _PyUnicode_CheckConsistency() checks the string hash
2011-10-06 01:45:57 +02:00
Éric Araujo
80a348c0a0 Fix typo 2011-10-05 01:11:12 +02:00