Commit graph

1040 commits

Author SHA1 Message Date
Victor Stinner
ce5faf673e unicodeobject.c doesn't make output strings ready in debug mode
Try to only create non ready strings in debug mode to ensure that all functions
(not only in unicodeobject.c, everywhere) make input strings ready.
2011-10-05 00:42:43 +02:00
Georg Brandl
7597addbd4 More typoes. 2011-10-05 16:36:47 +02:00
Victor Stinner
c80d6d20d5 Speedup str[a:b:step] for step != 1
Try to stop the scan for the maximum character before the end of the string,
using a limit that depends on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner
ae86485517 Speedup find_maxchar_surrogates() for 32-bit wchar_t
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner
b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
  is ASCII only
* PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
  shorter character type. For example, _PyUnicode_FromUCS1() stops if we
  have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
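The three speedups above share one idea: the scan for the maximum character can stop as soon as one character rules out every narrower storage kind. A minimal pure-Python sketch of that idea, not the C code from unicodeobject.c: `scan_maxchar` and its `source_kind` parameter are hypothetical names, and the limits (0x80, 0x100, 0x10000) are taken from the ranges quoted in the commit messages.

```python
# Hypothetical pure-Python model of the early-exit scan; the real logic is
# implemented in C and may differ in detail.
def scan_maxchar(code_points, source_kind):
    """Scan code points, stopping once no narrower kind can hold them.

    source_kind is the width in bytes (1, 2 or 4) of the input buffer.
    Returns (maxchar_seen, number_of_code_points_examined).
    """
    # Reaching this limit proves the result needs the full source width.
    limit = {1: 0x80, 2: 0x100, 4: 0x10000}[source_kind]
    maxchar = 0
    examined = 0
    for cp in code_points:
        examined += 1
        if cp > maxchar:
            maxchar = cp
            if maxchar >= limit:
                break  # the kind is already decided; stop scanning
    return maxchar, examined


if __name__ == "__main__":
    # Latin-1 data: the scan stops at the second character (U+00E9).
    print(scan_maxchar([0x41, 0xE9, 0x42, 0x43], 1))
    # Pure ASCII data: every character has to be examined.
    print(scan_maxchar([0x41] * 5, 1))
```

After an early exit, the returned value is only a lower bound on the true maximum, which is enough to choose the storage kind.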
Victor Stinner
702c734395 Speedup the ASCII decoder
It is faster for long strings and a little bit faster for short strings,
benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:

./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'

length |   before   | after
-------+------------+-----------
     1 | 0.234 usec | 0.229 usec
    80 | 0.381 usec | 0.357 usec
12,288 |  11.2 usec |  3.01 usec
2011-10-05 13:50:52 +02:00
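The benchmark above can also be driven from the standard `timeit` module instead of the command line. This is a rough sketch under assumptions: `bench_ascii_decode` is a hypothetical helper name, the repeat and loop counts are arbitrary, and timings will differ from the 2011 figures on any other machine or Python version.

```python
import timeit


def bench_ascii_decode(payload, number=10000, repeat=3):
    """Return the best time per decode call, in microseconds."""
    # The payload is bound to the name x inside the timed statement.
    timer = timeit.Timer('x.decode("ascii")', globals={"x": payload})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    # Same three input lengths as the table: 1, 80 and 12,288 bytes.
    for payload in (b"a", b"x" * 80, b"abc" * 4096):
        usec = bench_ascii_decode(payload)
        print(f"{len(payload):>6} | {usec:.3f} usec")
```

Taking the minimum over several repeats reduces noise from other processes.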
Victor Stinner
e1335c711c Fix usage of PyUnicode_READY() 2011-10-04 20:53:03 +02:00
Victor Stinner
e06e145943 _PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new() 2011-10-04 20:52:31 +02:00
Victor Stinner
17efeed284 Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs
Use also _PyUnicode_READY_REPLACE() when it's applicable.
2011-10-04 20:05:46 +02:00
Victor Stinner
6b56a7fd3d Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails 2011-10-04 20:04:52 +02:00
Antoine Pitrou
875f29bb95 Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2) 2011-10-04 20:00:49 +02:00
Antoine Pitrou
2242522fde Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9) 2011-10-04 19:10:51 +02:00
Antoine Pitrou
7aec401966 Optimize string slicing to use the new API 2011-10-04 19:08:01 +02:00
Antoine Pitrou
e19aa388e8 When expandtabs() would be a no-op, don't create a duplicate string 2011-10-04 16:04:01 +02:00
Antoine Pitrou
e71d574a39 Migrate str.expandtabs to the new API 2011-10-04 15:55:09 +02:00
Benjamin Peterson
7f3140ef80 fix parens 2011-10-03 19:37:29 -04:00
Benjamin Peterson
4bfce8f81f fix formatting 2011-10-03 19:35:07 -04:00
Benjamin Peterson
ccc51c1fc6 fix compiler warnings 2011-10-03 19:34:12 -04:00
Victor Stinner
b092365cc6 Move in-place Unicode append to its own subfunction 2011-10-04 01:17:31 +02:00
Victor Stinner
a5f9163501 Reindent internal Unicode macros 2011-10-04 01:07:11 +02:00
Victor Stinner
a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner
9566311014 resize_inplace() sets utf8_length to zero if the utf8 is not shared
Also clean up the code.
2011-10-04 01:03:50 +02:00
Victor Stinner
9e9d689d85 PyUnicode_New() sets utf8_length to zero for latin1 2011-10-04 01:02:02 +02:00
Victor Stinner
016980454e Unicode: raise SystemError instead of ValueError or RuntimeError on invalid state
2011-10-04 00:04:26 +02:00
Victor Stinner
7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner
03490918b7 Add _PyUnicode_HAS_WSTR_MEMORY() macro 2011-10-03 23:45:12 +02:00
Victor Stinner
9ce5a835bb PyUnicode_Join() checks output length in debug mode
PyUnicode_CopyCharacters() may copy fewer characters than the requested size if
the input string is smaller than the argument. (This is very unlikely, but who
knows!?)

Also avoid calling PyUnicode_CopyCharacters() if the string is empty.
2011-10-03 23:36:02 +02:00
Victor Stinner
b803895355 Fix a compiler warning in PyUnicode_Append()
Don't check PyUnicode_CopyCharacters() in release mode. Rename also some
variables.
2011-10-03 23:27:56 +02:00
Victor Stinner
8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Also remove the FIXME for resize_copy(): as discussed with Martin, copying the
string on resize when the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner
77bb47b312 Simplify unicode_resizable(): a singleton's reference count is at least 2 2011-10-03 20:06:05 +02:00
Victor Stinner
85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner
3cf4637e4e unicode_subtype_new() copies also the ascii flag 2011-10-03 14:42:15 +02:00
Victor Stinner
42dfd71333 unicode_kind_name() doesn't check consistency anymore
It is called from _PyUnicode_Dump() and so must not fail.
2011-10-03 14:41:45 +02:00
Victor Stinner
a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no longer reserved for PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner
1b4f9ceca7 Create _PyUnicode_READY_REPLACE() to reuse singleton
Only use _PyUnicode_READY_REPLACE() on just created strings.
2011-10-03 13:28:14 +02:00
Victor Stinner
c379ead9af Fix resize_compact() and resize_inplace(); reenable full resize optimizations
* resize_compact() updates also wstr_len for non-ascii strings sharing wstr
* resize_inplace() updates also utf8_len/wstr_len for strings sharing
  utf8/wstr
2011-10-03 12:52:27 +02:00
Victor Stinner
34411e17b0 resize_inplace() has been fixed: reenable this optimization 2011-10-03 12:21:33 +02:00
Victor Stinner
a849a4b6b4 _PyUnicode_Dump() indicates if wstr and/or utf8 are shared 2011-10-03 12:12:11 +02:00
Victor Stinner
1c8d0c76a1 Fix resize_inplace(): update shared utf8 pointer 2011-10-03 12:11:00 +02:00
Victor Stinner
ca4f7a4298 Disable unicode_resize() optimization on Windows (16-bit wchar_t) 2011-10-03 04:18:04 +02:00
Victor Stinner
126c559d05 _PyUnicode_Ready() for 16-bit wchar_t 2011-10-03 04:17:10 +02:00
Victor Stinner
2fd82278cb Fix compilation error on Windows
Fix also a compiler warning.
2011-10-03 04:06:05 +02:00
Victor Stinner
a3be613a56 Use PyUnicode_WCHAR_KIND to check if a string is a wstr string
Simplify the test in wstr pointer in unicode_sizeof().
2011-10-03 02:16:37 +02:00
Victor Stinner
910337b42e Add _PyUnicode_CheckConsistency() macro to help debugging
* Document Unicode string states
* Use _PyUnicode_CheckConsistency() to ensure that objects are always
  consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
4fae54cb0e In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or
not a unicode, instead of failing with a fatal error.

Use assertions in debug mode (provide better error messages).
2011-10-03 02:01:52 +02:00
Victor Stinner
23e5668214 PyUnicode_Append() now works in-place when it's possible 2011-10-03 03:54:37 +02:00
Victor Stinner
fe226c0d37 Rewrite PyUnicode_Resize()
* Rename _PyUnicode_Resize() to unicode_resize()
* unicode_resize() creates a copy if the string cannot be resized instead
  of failing
* Optimize resize_copy() for wstr strings
* Temporarily disable resize_inplace()
2011-10-03 03:52:20 +02:00
Victor Stinner
829c0adca9 Add _PyUnicode_HAS_UTF8_MEMORY() macro 2011-10-03 01:08:02 +02:00
Victor Stinner
fe0c155c4f Write _PyUnicode_Dump() to help debugging 2011-10-03 02:59:31 +02:00
Victor Stinner
f42dc448e0 PyUnicode_CopyCharacters() fails when copying latin1 into ascii 2011-10-02 23:33:16 +02:00