mirror of
				https://github.com/python/cpython.git
				synced 2025-10-25 15:58:57 +00:00 
			
		
		
		
	 b044b2a701
			
		
	
	
		b044b2a701
		
	
	
	
	
		
			
			svn+ssh://svn.python.org/python/branches/py3k ................ r74821 | georg.brandl | 2009-09-16 11:42:19 +0200 (Mi, 16 Sep 2009) | 1 line #6885: run python 3 as python3. ................ r74828 | georg.brandl | 2009-09-16 16:23:20 +0200 (Mi, 16 Sep 2009) | 1 line Use true booleans. ................ r74829 | georg.brandl | 2009-09-16 16:24:29 +0200 (Mi, 16 Sep 2009) | 1 line Small PEP8 correction. ................ r74830 | georg.brandl | 2009-09-16 16:36:22 +0200 (Mi, 16 Sep 2009) | 1 line Use true booleans. ................ r74831 | georg.brandl | 2009-09-16 17:54:04 +0200 (Mi, 16 Sep 2009) | 1 line Use true booleans and PEP8 for argdefaults. ................ r74833 | georg.brandl | 2009-09-16 17:58:14 +0200 (Mi, 16 Sep 2009) | 1 line Last round of adapting style of documenting argument default values. ................ r74835 | georg.brandl | 2009-09-16 18:00:31 +0200 (Mi, 16 Sep 2009) | 33 lines Merged revisions 74817-74820,74822-74824 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r74817 | georg.brandl | 2009-09-16 11:05:11 +0200 (Mi, 16 Sep 2009) | 1 line Make deprecation notices as visible as warnings are right now. ........ r74818 | georg.brandl | 2009-09-16 11:23:04 +0200 (Mi, 16 Sep 2009) | 1 line #6880: add reference to classes section in exceptions section, which comes earlier. ........ r74819 | georg.brandl | 2009-09-16 11:24:57 +0200 (Mi, 16 Sep 2009) | 1 line #6876: fix base class constructor invocation in example. ........ r74820 | georg.brandl | 2009-09-16 11:30:48 +0200 (Mi, 16 Sep 2009) | 1 line #6891: comment out dead link to Unicode article. ........ r74822 | georg.brandl | 2009-09-16 12:12:06 +0200 (Mi, 16 Sep 2009) | 1 line #5621: refactor description of how class/instance attributes interact on a.x=a.x+1 or augassign. ........ r74823 | georg.brandl | 2009-09-16 15:06:22 +0200 (Mi, 16 Sep 2009) | 1 line Remove strange trailing commas. ........ r74824 | georg.brandl | 2009-09-16 15:11:06 +0200 (Mi, 16 Sep 2009) | 1 line #6892: fix optparse example involving help option. ........ ................
		
			
				
	
	
		
			161 lines
		
	
	
	
		
			5.3 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			161 lines
		
	
	
	
		
			5.3 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| :mod:`unicodedata` --- Unicode Database
 | |
| =======================================
 | |
| 
 | |
| .. module:: unicodedata
 | |
|    :synopsis: Access the Unicode Database.
 | |
| .. moduleauthor:: Marc-Andre Lemburg <mal@lemburg.com>
 | |
| .. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
 | |
| .. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>
 | |
| 
 | |
| 
 | |
| .. index::
 | |
|    single: Unicode
 | |
|    single: character
 | |
|    pair: Unicode; database
 | |
| 
 | |
| This module provides access to the Unicode Character Database which defines
 | |
| character properties for all Unicode characters. The data in this database is
 | |
| based on the :file:`UnicodeData.txt` file version 5.1.0 which is publicly
 | |
| available from ftp://ftp.unicode.org/.
 | |
| 
 | |
| The module uses the same names and symbols as defined by the UnicodeData File
 | |
| Format 5.1.0 (see http://www.unicode.org/Public/5.1.0/ucd/UCD.html).  It defines
 | |
| the following functions:
 | |
| 
 | |
| 
 | |
| .. function:: lookup(name)
 | |
| 
 | |
|    Look up character by name.  If a character with the given name is found, return
 | |
|    the corresponding character.  If not found, :exc:`KeyError` is raised.
 | |
| 
 | |
| 
 | |
| .. function:: name(chr[, default])
 | |
| 
 | |
|    Returns the name assigned to the character *chr* as a string. If no
 | |
|    name is defined, *default* is returned, or, if not given, :exc:`ValueError` is
 | |
|    raised.
 | |
| 
 | |
| 
 | |
| .. function:: decimal(chr[, default])
 | |
| 
 | |
|    Returns the decimal value assigned to the character *chr* as integer.
 | |
|    If no such value is defined, *default* is returned, or, if not given,
 | |
|    :exc:`ValueError` is raised.
 | |
| 
 | |
| 
 | |
| .. function:: digit(chr[, default])
 | |
| 
 | |
|    Returns the digit value assigned to the character *chr* as integer.
 | |
|    If no such value is defined, *default* is returned, or, if not given,
 | |
|    :exc:`ValueError` is raised.
 | |
| 
 | |
| 
 | |
| .. function:: numeric(chr[, default])
 | |
| 
 | |
|    Returns the numeric value assigned to the character *chr* as float.
 | |
|    If no such value is defined, *default* is returned, or, if not given,
 | |
|    :exc:`ValueError` is raised.
 | |
| 
 | |
| 
 | |
| .. function:: category(chr)
 | |
| 
 | |
|    Returns the general category assigned to the character *chr* as
 | |
|    string.
 | |
| 
 | |
| 
 | |
| .. function:: bidirectional(chr)
 | |
| 
 | |
|    Returns the bidirectional category assigned to the character *chr* as
 | |
|    string. If no such value is defined, an empty string is returned.
 | |
| 
 | |
| 
 | |
| .. function:: combining(chr)
 | |
| 
 | |
|    Returns the canonical combining class assigned to the character *chr*
 | |
|    as integer. Returns ``0`` if no combining class is defined.
 | |
| 
 | |
| 
 | |
| .. function:: east_asian_width(chr)
 | |
| 
 | |
|    Returns the east asian width assigned to the character *chr* as
 | |
|    string.
 | |
| 
 | |
| 
 | |
| .. function:: mirrored(chr)
 | |
| 
 | |
|    Returns the mirrored property assigned to the character *chr* as
 | |
|    integer. Returns ``1`` if the character has been identified as a "mirrored"
 | |
|    character in bidirectional text, ``0`` otherwise.
 | |
| 
 | |
| 
 | |
| .. function:: decomposition(chr)
 | |
| 
 | |
|    Returns the character decomposition mapping assigned to the character
 | |
|    *chr* as string. An empty string is returned in case no such mapping is
 | |
|    defined.
 | |
| 
 | |
| 
 | |
| .. function:: normalize(form, unistr)
 | |
| 
 | |
|    Return the normal form *form* for the Unicode string *unistr*. Valid values for
 | |
|    *form* are 'NFC', 'NFKC', 'NFD', and 'NFKD'.
 | |
| 
 | |
|    The Unicode standard defines various normalization forms of a Unicode string,
 | |
|    based on the definition of canonical equivalence and compatibility equivalence.
 | |
|    In Unicode, several characters can be expressed in various way. For example, the
 | |
|    character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as
 | |
|    the sequence U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C).
 | |
| 
 | |
|    For each character, there are two normal forms: normal form C and normal form D.
 | |
|    Normal form D (NFD) is also known as canonical decomposition, and translates
 | |
|    each character into its decomposed form. Normal form C (NFC) first applies a
 | |
|    canonical decomposition, then composes pre-combined characters again.
 | |
| 
 | |
|    In addition to these two forms, there are two additional normal forms based on
 | |
|    compatibility equivalence. In Unicode, certain characters are supported which
 | |
|    normally would be unified with other characters. For example, U+2160 (ROMAN
 | |
|    NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
 | |
|    However, it is supported in Unicode for compatibility with existing character
 | |
|    sets (e.g. gb2312).
 | |
| 
 | |
|    The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
 | |
|    replace all compatibility characters with their equivalents. The normal form KC
 | |
|    (NFKC) first applies the compatibility decomposition, followed by the canonical
 | |
|    composition.
 | |
| 
 | |
|    Even if two unicode strings are normalized and look the same to
 | |
|    a human reader, if one has combining characters and the other
 | |
|    doesn't, they may not compare equal.
 | |
| 
 | |
| 
 | |
| In addition, the module exposes the following constant:
 | |
| 
 | |
| .. data:: unidata_version
 | |
| 
 | |
|    The version of the Unicode database used in this module.
 | |
| 
 | |
| 
 | |
| .. data:: ucd_3_2_0
 | |
| 
 | |
|    This is an object that has the same methods as the entire module, but uses the
 | |
|    Unicode database version 3.2 instead, for applications that require this
 | |
|    specific version of the Unicode database (such as IDNA).
 | |
| 
 | |
| Examples:
 | |
| 
 | |
|    >>> import unicodedata
 | |
|    >>> unicodedata.lookup('LEFT CURLY BRACKET')
 | |
|    '{'
 | |
|    >>> unicodedata.name('/')
 | |
|    'SOLIDUS'
 | |
|    >>> unicodedata.decimal('9')
 | |
|    9
 | |
|    >>> unicodedata.decimal('a')
 | |
|    Traceback (most recent call last):
 | |
|      File "<stdin>", line 1, in ?
 | |
|    ValueError: not a decimal
 | |
|    >>> unicodedata.category('A')  # 'L'etter, 'u'ppercase
 | |
|    'Lu'
 | |
|    >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
 | |
|    'AN'
 | |
| 
 |