16 commits

Author SHA1 Message Date
Hye-Shik Chang
e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    import unicodedata

    def width(u):
        # Count wide ('W') and fullwidth ('F') characters as two columns.
        w = 0
        for c in unicodedata.normalize('NFC', u):
            if unicodedata.east_asian_width(c) in ('W', 'F'):
                w += 2
            else:
                w += 1
        return w
2004-08-04 07:38:35 +00:00
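The commit message above covers a replacement for width() only. A comparable stand-in for the dropped iswide() might look like the sketch below. It assumes iswide() was meant to report whether every character in the string is wide or fullwidth; the exact semantics of the removed method are not stated in this log, so treat the function as illustrative rather than a faithful port.

```python
import unicodedata

def iswide(u):
    """Return True if u is non-empty and every character is wide or fullwidth.

    Assumption: 'wide' means an East Asian Width of 'W' or 'F', matching the
    test used by the width() helper in the commit message.
    """
    if not u:
        return False
    return all(unicodedata.east_asian_width(c) in ('W', 'F') for c in u)

if __name__ == '__main__':
    print(iswide('\ud55c\uae00'))  # two Hangul syllables
    print(iswide('abc'))           # narrow ASCII letters
```

Returning False for the empty string mirrors how str predicates such as isalpha() behave; that choice is also an assumption.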
Hye-Shik Chang
974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Hye-Shik Chang
7db07e6972 Fix gcc 3.3 warnings related to Py_UNICODE_WIDE.
2003-12-29 01:36:01 +00:00
Martin v. Löwis
edf368c351 Make lower/upper/title work for non-BMP characters.
2002-10-18 16:40:36 +00:00
Martin v. Löwis
9def6a3a77 Update to Unicode 3.2 database.
2002-10-18 16:11:54 +00:00
Fredrik Lundh
72b068566a removed "register const" from scalar arguments to the unicode
predicates
2001-06-27 22:08:26 +00:00
Fredrik Lundh
8f4558583f use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE
tests.
2001-06-27 18:59:43 +00:00
Martin v. Löwis
ce9b5a55e1 Encode surrogates in UTF-8 even for a wide Py_UNICODE.
Implement sys.maxunicode.
Explicitly wrap around upper/lower computations for wide Py_UNICODE.
When decoding large characters with UTF-8, represent expected test
results using the \U notation.
2001-06-27 06:28:56 +00:00
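For readers unfamiliar with the terms in this commit, the following sketch shows sys.maxunicode, the \U notation for characters outside the BMP, and how a lone surrogate can be forced through the UTF-8 codec. It assumes a current Python 3 interpreter (3.3 or later), where behavior differs from the 2001 code base: lone surrogates are rejected by default and need the surrogatepass error handler. The helper name utf8_bytes is hypothetical.

```python
import sys

def utf8_bytes(text, errors='strict'):
    """Encode text as UTF-8 (hypothetical helper for illustration)."""
    return text.encode('utf-8', errors)

if __name__ == '__main__':
    # Highest code point the interpreter can represent.
    print(hex(sys.maxunicode))
    # A non-BMP character written with the eight-digit \U escape.
    print(utf8_bytes('\U00010000'))
    # A lone surrogate only encodes when explicitly allowed.
    print(utf8_bytes('\ud800', 'surrogatepass'))
```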
Fredrik Lundh
ee13dba1aa more unicode tweaks: fix unicodectype for sizeof(Py_UNICODE) >
sizeof(int)
2001-06-26 20:36:12 +00:00
Fredrik Lundh
9e7dd4c185 unicode database compression, step 3:
- use unidb compression for the unicodectype module.  smaller, faster,
  and slightly more portable...
2000-09-25 21:48:13 +00:00
Trent Mick
8a74e5fc2c Add the current Win64 compiler to the list of those that need the
huge switch statement broken up. This will probably not be necessary when
the Win64 compiler matures.
2000-08-12 19:37:27 +00:00
Guido van Rossum
16b1ad9c7d Changing the CNRI copyright notice according to CNRI's instructions.
This is a notice without a date, which apparently is not a claim to
copyright but only advice to the reader.  IANAL. :-)
2000-08-03 16:24:25 +00:00
Jack Jansen
56cdce3070 Conditionally (currently on ifdef macintosh) break the large switch up
into 1000-case smaller ones.
2000-07-06 13:57:38 +00:00
Marc-André Lemburg
f3938f55c7 Added new lookup API which matches all alphabetic Unicode characters,
i.e. the ones with category 'Ll', 'Lu', 'Lt', 'Lo', 'Lm'.
2000-07-05 09:48:59 +00:00
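The category test described in this commit can be sketched at the Python level with unicodedata.category(). This is only an illustration of the rule as stated in the message, not the C lookup API that the commit added; the names ALPHA_CATEGORIES and is_alpha are hypothetical.

```python
import unicodedata

# General categories named in the commit message.
ALPHA_CATEGORIES = frozenset(('Ll', 'Lu', 'Lt', 'Lo', 'Lm'))

def is_alpha(ch):
    """Return True if the single character ch has a letter category."""
    return unicodedata.category(ch) in ALPHA_CATEGORIES

if __name__ == '__main__':
    for ch in ('a', 'Z', '\u01c5', '\u4e2d', '\u02b0', '1', ' '):
        print(repr(ch), unicodedata.category(ch), is_alpha(ch))
```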
Guido van Rossum
dc742b3184 Marc-Andre Lemburg:
Added a few missing whitespace Unicode char mappings.
Thanks to Brian Hooper.
2000-04-11 15:39:02 +00:00
Guido van Rossum
603484d759 Unicode character type helpers, written by Marc-Andre Lemburg.
2000-03-10 22:52:46 +00:00