Commit graph

1739 commits

Author SHA1 Message Date
Georg Brandl
7597addbd4 More typoes. 2011-10-05 16:36:47 +02:00
Victor Stinner
c80d6d20d5 Speedup str[a🅱️step] for step != 1
Try to stop the scanner of the maximum character before the end using a limit
depending on the kind (e.g. 256 for PyUnicode_2BYTE_KIND).
2011-10-05 14:13:28 +02:00
Victor Stinner
ae86485517 Speedup find_maxchar_surrogates() for 32-bit wchar_t
If we have at least one character in U+10000-U+10FFFF, we know that we must use
PyUnicode_4BYTE_KIND kind.
2011-10-05 14:02:44 +02:00
Victor Stinner
b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
   is ascii only
 * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
   shorter character type. For example, _PyUnicode_FromUCS1() stops if we
   have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner
702c734395 Speedup the ASCII decoder
It is faster for long string and a little bit faster for short strings,
benchmark on Linux 32 bits, Intel Core i5 @ 3.33GHz:

./python -m timeit 'x=b"a"' 'x.decode("ascii")'
./python -m timeit 'x=b"x"*80' 'x.decode("ascii")'
./python -m timeit 'x=b"abc"*4096' 'x.decode("ascii")'

length |   before   | after
-------+------------+-----------
     1 | 0.234 usec | 0.229 usec
    80 | 0.381 usec | 0.357 usec
12,288 |  11.2 usec |  3.01 usec
2011-10-05 13:50:52 +02:00
Victor Stinner
e1335c711c Fix usage og PyUnicode_READY() 2011-10-04 20:53:03 +02:00
Victor Stinner
e06e145943 _PyUnicode_READY_REPLACE() cannot be used in unicode_subtype_new() 2011-10-04 20:52:31 +02:00
Victor Stinner
17efeed284 Add DONT_MAKE_RESULT_READY to unicodeobject.c to help detecting bugs
Use also _PyUnicode_READY_REPLACE() when it's applicable.
2011-10-04 20:05:46 +02:00
Victor Stinner
6b56a7fd3d Add assertion to _Py_ReleaseInternedUnicodeStrings() if READY fails 2011-10-04 20:04:52 +02:00
Antoine Pitrou
875f29bb95 Fix naïve heuristic in unicode slicing (followup to 1b4f886dc9e2) 2011-10-04 20:00:49 +02:00
Antoine Pitrou
2242522fde Add a necessary call to PyUnicode_READY() (followup to ab5086539ab9) 2011-10-04 19:10:51 +02:00
Antoine Pitrou
7aec401966 Optimize string slicing to use the new API 2011-10-04 19:08:01 +02:00
Antoine Pitrou
e19aa388e8 When expandtabs() would be a no-op, don't create a duplicate string 2011-10-04 16:04:01 +02:00
Antoine Pitrou
e71d574a39 Migrate str.expandtabs to the new API 2011-10-04 15:55:09 +02:00
Benjamin Peterson
7f3140ef80 fix parens 2011-10-03 19:37:29 -04:00
Benjamin Peterson
4bfce8f81f fix formatting 2011-10-03 19:35:07 -04:00
Benjamin Peterson
ccc51c1fc6 fix compiler warnings 2011-10-03 19:34:12 -04:00
Victor Stinner
b092365cc6 Move in-place Unicode append to its own subfunction 2011-10-04 01:17:31 +02:00
Victor Stinner
a5f9163501 Reindent internal Unicode macros 2011-10-04 01:07:11 +02:00
Victor Stinner
a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner
9566311014 resize_inplace() sets utf8_length to zero if the utf8 is not shared8
Cleanup also the code.
2011-10-04 01:03:50 +02:00
Victor Stinner
9e9d689d85 PyUnicode_New() sets utf8_length to zero for latin1 2011-10-04 01:02:02 +02:00
Victor Stinner
016980454e Unicode: raise SystemError instead of ValueError or RuntimeError on invalid
state
2011-10-04 00:04:26 +02:00
Victor Stinner
7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner
03490918b7 Add _PyUnicode_HAS_WSTR_MEMORY() macro 2011-10-03 23:45:12 +02:00
Victor Stinner
9ce5a835bb PyUnicode_Join() checks output length in debug mode
PyUnicode_CopyCharacters() may copies less character than requested size, if
the input string is smaller than the argument. (This is very unlikely, but who
knows!?)

Avoid also calling PyUnicode_CopyCharacters() if the string is empty.
2011-10-03 23:36:02 +02:00
Victor Stinner
b803895355 Fix a compiler warning in PyUnicode_Append()
Don't check PyUnicode_CopyCharacters() in release mode. Rename also some
variables.
2011-10-03 23:27:56 +02:00
Victor Stinner
8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner
77bb47b312 Simplify unicode_resizable(): singletons reference count is at least 2 2011-10-03 20:06:05 +02:00
Victor Stinner
85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner
3cf4637e4e unicode_subtype_new() copies also the ascii flag 2011-10-03 14:42:15 +02:00
Victor Stinner
42dfd71333 unicode_kind_name() doesn't check consistency anymore
It is is called from _PyUnicode_Dump() and so must not fail.
2011-10-03 14:41:45 +02:00
Victor Stinner
a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner
1b4f9ceca7 Create _PyUnicode_READY_REPLACE() to reuse singleton
Only use _PyUnicode_READY_REPLACE() on just created strings.
2011-10-03 13:28:14 +02:00
Victor Stinner
c379ead9af Fix resize_compact() and resize_inplace(); reenable full resize optimizations
* resize_compact() updates also wstr_len for non-ascii strings sharing wstr
 * resize_inplace() updates also utf8_len/wstr_len for strings sharing
   utf8/wstr
2011-10-03 12:52:27 +02:00
Victor Stinner
34411e17b0 resize_inplace() has been fixed: reenable this optimization 2011-10-03 12:21:33 +02:00
Victor Stinner
a849a4b6b4 _PyUnicode_Dump() indicates if wstr and/or utf8 are shared 2011-10-03 12:12:11 +02:00
Victor Stinner
1c8d0c76a1 Fix resize_inplace(): update shared utf8 pointer 2011-10-03 12:11:00 +02:00
Victor Stinner
ca4f7a4298 Disable unicode_resize() optimization on Windows (16-bit wchar_t) 2011-10-03 04:18:04 +02:00
Victor Stinner
126c559d05 _PyUnicode_Ready() for 16-bit wchar_t 2011-10-03 04:17:10 +02:00
Victor Stinner
2fd82278cb Fix compilation error on Windows
Fix also a compiler warning.
2011-10-03 04:06:05 +02:00
Victor Stinner
a3be613a56 Use PyUnicode_WCHAR_KIND to check if a string is a wstr string
Simplify the test in wstr pointer in unicode_sizeof().
2011-10-03 02:16:37 +02:00
Victor Stinner
910337b42e Add _PyUnicode_CheckConsistency() macro to help debugging
* Document Unicode string states
 * Use _PyUnicode_CheckConsistency() to ensure that objects are always
   consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
4fae54cb0e In release mode, PyUnicode_InternInPlace() does nothing if the input is NULL or
not a unicode, instead of failing with a fatal error.

Use assertions in debug mode (provide better error messages).
2011-10-03 02:01:52 +02:00
Victor Stinner
23e5668214 PyUnicode_Append() now works in-place when it's possible 2011-10-03 03:54:37 +02:00
Victor Stinner
fe226c0d37 Rewrite PyUnicode_Resize()
* Rename _PyUnicode_Resize() to unicode_resize()
 * unicode_resize() creates a copy if the string cannot be resized instead
   of failing
 * Optimize resize_copy() for wstr strings
 * Disable temporary resize_inplace()
2011-10-03 03:52:20 +02:00
Victor Stinner
829c0adca9 Add _PyUnicode_HAS_UTF8_MEMORY() macro 2011-10-03 01:08:02 +02:00
Victor Stinner
fe0c155c4f Write _PyUnicode_Dump() to help debugging 2011-10-03 02:59:31 +02:00
Victor Stinner
f42dc448e0 PyUnicode_CopyCharacters() fails when copying latin1 into ascii 2011-10-02 23:33:16 +02:00
Victor Stinner
c53be96c54 unicode_convert_wchar_to_ucs4() cannot fail 2011-10-02 21:33:54 +02:00