mirror of
https://github.com/python/cpython.git
synced 2025-08-04 08:59:19 +00:00
Update docs w.r.t. PEP 3100 changes -- patch for GHOP by Dan Finnie.
parent
f25ef50549
commit
f694518331
48 changed files with 395 additions and 534 deletions
|
@ -314,7 +314,7 @@ this::
|
|||
Sets can take their contents from an iterable and let you iterate over the set's
|
||||
elements::
|
||||
|
||||
S = set((2, 3, 5, 7, 11, 13))
|
||||
S = {2, 3, 5, 7, 11, 13}
|
||||
for i in S:
|
||||
print(i)
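The hunk above swaps the ``set(...)`` call for a set literal. A minimal sketch of how the two spellings relate is shown here; the variable names are illustrative only and do not come from the patched document.

```python
# Two equivalent ways to build the same set.
from_literal = {2, 3, 5, 7, 11, 13}
from_iterable = set((2, 3, 5, 7, 11, 13))
print(from_literal == from_iterable)

# Empty braces create a dict, so an empty set still needs set().
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)
```

The literal form only works for non-empty sets, which is why ``set()`` remains the way to spell an empty one.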
|
||||
|
||||
|
@ -616,29 +616,26 @@ Built-in functions
|
|||
|
||||
Let's look in more detail at built-in functions often used with iterators.
|
||||
|
||||
Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
|
||||
obsolete; they duplicate the features of list comprehensions but return actual
|
||||
lists instead of iterators.
|
||||
Two of Python's built-in functions, :func:`map` and :func:`filter` duplicate the
|
||||
features of generator expressions:
|
||||
|
||||
``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
|
||||
f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
|
||||
``map(f, iterA, iterB, ...)`` returns an iterator over the sequence
|
||||
``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
|
||||
|
||||
::
|
||||
|
||||
def upper(s):
|
||||
return s.upper()
|
||||
map(upper, ['sentence', 'fragment']) =>
|
||||
list(map(upper, ['sentence', 'fragment'])) =>
|
||||
['SENTENCE', 'FRAGMENT']
|
||||
|
||||
[upper(s) for s in ['sentence', 'fragment']] =>
|
||||
list(upper(s) for s in ['sentence', 'fragment']) =>
|
||||
['SENTENCE', 'FRAGMENT']
|
||||
|
||||
As shown above, you can achieve the same effect with a list comprehension. The
|
||||
:func:`itertools.imap` function does the same thing but can handle infinite
|
||||
iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
|
||||
You can of course achieve the same effect with a list comprehension.
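Since the new text says :func:`map` returns an iterator, a short sketch of what that means in practice may help. It assumes Python 3; ``squares`` and ``first_five`` are names made up for this illustration.

```python
import itertools

def upper(s):
    return s.upper()

mapped = map(upper, ['sentence', 'fragment'])
# An iterator returns itself from iter(); a list would not.
is_lazy = iter(mapped) is mapped
as_list = list(mapped)

# Nothing is computed up front, so an infinite input is fine
# as long as only a finite prefix is consumed.
squares = map(lambda n: n * n, itertools.count(1))
first_five = list(itertools.islice(squares, 5))

print(as_list, first_five)
```

This is the behaviour the removed text attributed to ``itertools.imap``; wrapping the call in ``list()`` recovers the old list result.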
|
||||
|
||||
``filter(predicate, iter)`` returns a list that contains all the sequence
|
||||
elements that meet a certain condition, and is similarly duplicated by list
|
||||
``filter(predicate, iter)`` returns an iterator over all the sequence elements
|
||||
that meet a certain condition, and is similarly duplicated by list
|
||||
comprehensions. A **predicate** is a function that returns the truth value of
|
||||
some condition; for use with :func:`filter`, the predicate must take a single
|
||||
value.
|
||||
|
@ -648,69 +645,61 @@ value.
|
|||
def is_even(x):
|
||||
return (x % 2) == 0
|
||||
|
||||
filter(is_even, range(10)) =>
|
||||
list(filter(is_even, range(10))) =>
|
||||
[0, 2, 4, 6, 8]
|
||||
|
||||
This can also be written as a list comprehension::
|
||||
This can also be written as a generator expression::
|
||||
|
||||
>>> [x for x in range(10) if is_even(x)]
|
||||
>>> list(x for x in range(10) if is_even(x))
|
||||
[0, 2, 4, 6, 8]
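As a rough illustration of the iterator-returning :func:`filter`, the sketch below filters an infinite counter and takes only the first few results. It assumes Python 3; ``first_evens`` and ``gen_evens`` are illustrative names.

```python
import itertools

def is_even(x):
    return (x % 2) == 0

# filter() yields matches on demand, so the infinite counter is safe here.
evens = filter(is_even, itertools.count())
first_evens = list(itertools.islice(evens, 8))

# The equivalent generator expression behaves the same way.
gen_evens = list(itertools.islice(
    (x for x in itertools.count() if is_even(x)), 8))

print(first_evens)
```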
|
||||
|
||||
:func:`filter` also has a counterpart in the :mod:`itertools` module,
|
||||
:func:`itertools.ifilter`, that returns an iterator and can therefore handle
|
||||
infinite sequences just as :func:`itertools.imap` can.
|
||||
``functools.reduce(func, iter, [initial_value])`` cumulatively performs an
|
||||
operation on all the iterable's elements and, therefore, can't be applied to
|
||||
infinite iterables. ``func`` must be a function that takes two elements and
|
||||
returns a single value. :func:`functools.reduce` takes the first two elements A
|
||||
and B returned by the iterator and calculates ``func(A, B)``. It then requests
|
||||
the third element, C, calculates ``func(func(A, B), C)``, combines this result
|
||||
with the fourth element returned, and continues until the iterable is exhausted.
|
||||
If the iterable returns no values at all, a :exc:`TypeError` exception is
|
||||
raised. If the initial value is supplied, it's used as a starting point and
|
||||
``func(initial_value, A)`` is the first calculation. ::
|
||||
|
||||
``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
|
||||
:mod:`itertools` module because it cumulatively performs an operation on all the
|
||||
iterable's elements and therefore can't be applied to infinite iterables.
|
||||
``func`` must be a function that takes two elements and returns a single value.
|
||||
:func:`reduce` takes the first two elements A and B returned by the iterator and
|
||||
calculates ``func(A, B)``. It then requests the third element, C, calculates
|
||||
``func(func(A, B), C)``, combines this result with the fourth element returned,
|
||||
and continues until the iterable is exhausted. If the iterable returns no
|
||||
values at all, a :exc:`TypeError` exception is raised. If the initial value is
|
||||
supplied, it's used as a starting point and ``func(initial_value, A)`` is the
|
||||
first calculation.
|
||||
|
||||
::
|
||||
|
||||
import operator
|
||||
reduce(operator.concat, ['A', 'BB', 'C']) =>
|
||||
'ABBC'
|
||||
reduce(operator.concat, []) =>
|
||||
TypeError: reduce() of empty sequence with no initial value
|
||||
reduce(operator.mul, [1,2,3], 1) =>
|
||||
6
|
||||
reduce(operator.mul, [], 1) =>
|
||||
1
|
||||
|
||||
If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
|
||||
elements of the iterable. This case is so common that there's a special
|
||||
import operator
|
||||
import functools
|
||||
functools.reduce(operator.concat, ['A', 'BB', 'C']) =>
|
||||
'ABBC'
|
||||
functools.reduce(operator.concat, []) =>
|
||||
TypeError: reduce() of empty sequence with no initial value
|
||||
functools.reduce(operator.mul, [1,2,3], 1) =>
|
||||
6
|
||||
functools.reduce(operator.mul, [], 1) =>
|
||||
1
|
||||
|
||||
If you use :func:`operator.add` with :func:`functools.reduce`, you'll add up all
|
||||
the elements of the iterable. This case is so common that there's a special
|
||||
built-in called :func:`sum` to compute it::
|
||||
|
||||
reduce(operator.add, [1,2,3,4], 0) =>
|
||||
10
|
||||
sum([1,2,3,4]) =>
|
||||
10
|
||||
sum([]) =>
|
||||
0
|
||||
functools.reduce(operator.add, [1,2,3,4], 0) =>
|
||||
10
|
||||
sum([1,2,3,4]) =>
|
||||
10
|
||||
sum([]) =>
|
||||
0
|
||||
|
||||
For many uses of :func:`reduce`, though, it can be clearer to just write the
|
||||
obvious :keyword:`for` loop::
|
||||
|
||||
# Instead of:
|
||||
product = reduce(operator.mul, [1,2,3], 1)
|
||||
# Instead of:
|
||||
product = functools.reduce(operator.mul, [1,2,3], 1)
|
||||
|
||||
# You can write:
|
||||
product = 1
|
||||
for i in [1,2,3]:
|
||||
product *= i
|
||||
# You can write:
|
||||
product = 1
|
||||
for i in [1,2,3]:
|
||||
product *= i
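To make the folding order described above concrete, here is a small sketch that records each call made by :func:`functools.reduce`. The ``traced_concat`` helper and the ``calls`` list are hypothetical names introduced only for this example.

```python
import functools
import operator

calls = []

def traced_concat(a, b):
    # Record each pair so the left-to-right folding order is visible.
    calls.append((a, b))
    return operator.concat(a, b)

joined = functools.reduce(traced_concat, ['A', 'BB', 'C'])

# With an initial value, an empty iterable simply returns that value.
empty_product = functools.reduce(operator.mul, [], 1)

# Without one, an empty iterable raises TypeError.
try:
    functools.reduce(operator.concat, [])
    raised = False
except TypeError:
    raised = True

print(joined, calls, empty_product, raised)
```

The recorded pairs show ``func(A, B)`` followed by ``func(func(A, B), C)``, matching the description in the paragraph.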
|
||||
|
||||
|
||||
``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
|
||||
containing the count and each element.
|
||||
|
||||
::
|
||||
containing the count and each element. ::
|
||||
|
||||
enumerate(['subject', 'verb', 'object']) =>
|
||||
(0, 'subject'), (1, 'verb'), (2, 'object')
|
||||
|
@ -723,12 +712,10 @@ indexes at which certain conditions are met::
|
|||
if line.strip() == '':
|
||||
print('Blank line at line #%i' % i)
|
||||
|
||||
``sorted(iterable, [cmp=None], [key=None], [reverse=False])`` collects all the
|
||||
elements of the iterable into a list, sorts the list, and returns the sorted
|
||||
result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
|
||||
the constructed list's ``.sort()`` method.
|
||||
|
||||
::
|
||||
``sorted(iterable, [key=None], [reverse=False])`` collects all the elements of
|
||||
the iterable into a list, sorts the list, and returns the sorted result. The
|
||||
``key`` and ``reverse`` arguments are passed through to the constructed list's
|
||||
``sort()`` method. ::
|
||||
|
||||
import random
|
||||
# Generate 8 random numbers between [0, 10000)
|
||||
|
@ -962,14 +949,7 @@ consumed more than the others.
|
|||
Calling functions on elements
|
||||
-----------------------------
|
||||
|
||||
Two functions are used for calling other functions on the contents of an
|
||||
iterable.
|
||||
|
||||
``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
|
||||
``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
|
||||
|
||||
itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
|
||||
6, 8, 8
|
||||
``itertools.imap(func, iter)`` is the same as built-in :func:`map`.
|
||||
|
||||
The ``operator`` module contains a set of functions corresponding to Python's
|
||||
operators. Some examples are ``operator.add(a, b)`` (adds two values),
|
||||
|
@ -992,14 +972,7 @@ Selecting elements
|
|||
Another group of functions chooses a subset of an iterator's elements based on a
|
||||
predicate.
|
||||
|
||||
``itertools.ifilter(predicate, iter)`` returns all the elements for which the
|
||||
predicate returns true::
|
||||
|
||||
def is_even(x):
|
||||
return (x % 2) == 0
|
||||
|
||||
itertools.ifilter(is_even, itertools.count()) =>
|
||||
0, 2, 4, 6, 8, 10, 12, 14, ...
|
||||
``itertools.ifilter(predicate, iter)`` is the same as built-in :func:`filter`.
|
||||
|
||||
``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
|
||||
elements for which the predicate returns false::
|
||||
|
@ -1117,8 +1090,7 @@ that perform a single operation.
|
|||
|
||||
Some of the functions in this module are:
|
||||
|
||||
* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
|
||||
``abs()``, ...
|
||||
* Math operations: ``add()``, ``sub()``, ``mul()``, ``floordiv()``, ``abs()``, ...
|
||||
* Logical operations: ``not_()``, ``truth()``.
|
||||
* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
|
||||
* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
|
||||
|
@ -1190,7 +1162,7 @@ is equivalent to::
|
|||
f(*g(5, 6))
|
||||
|
||||
Even though ``compose()`` only accepts two functions, it's trivial to build up a
|
||||
version that will compose any number of functions. We'll use ``reduce()``,
|
||||
version that will compose any number of functions. We'll use ``functools.reduce()``,
|
||||
``compose()`` and ``partial()`` (the last of which is provided by both
|
||||
``functional`` and ``functools``).
|
||||
|
||||
|
@ -1198,7 +1170,7 @@ version that will compose any number of functions. We'll use ``reduce()``,
|
|||
|
||||
from functional import compose, partial
|
||||
|
||||
multi_compose = partial(reduce, compose)
|
||||
multi_compose = partial(functools.reduce, compose)
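The ``functional`` package is a third-party module, so the sketch below substitutes a hypothetical single-argument ``compose()`` to show how ``partial(functools.reduce, compose)`` behaves. The stand-in is an assumption for illustration and may differ from the real ``functional.compose``.

```python
import functools

def compose(outer, inner):
    """Hypothetical stand-in for functional.compose (single-argument case)."""
    return lambda x: outer(inner(x))

multi_compose = functools.partial(functools.reduce, compose)

def add_one(x):
    return x + 1

def double(x):
    return x * 2

# reduce folds left to right: compose(compose(add_one, double), abs),
# so the last function in the list is applied first.
pipeline = multi_compose([add_one, double, abs])
result = pipeline(-3)
print(result)
```

With this stand-in, ``pipeline(-3)`` computes ``add_one(double(abs(-3)))``.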
|
||||
|
||||
|
||||
We can also use ``map()``, ``compose()`` and ``partial()`` to craft a version of
|
||||
|
|
|
@ -497,7 +497,7 @@ more convenient. If a program contains a lot of regular expressions, or re-uses
|
|||
the same ones in several locations, then it might be worthwhile to collect all
|
||||
the definitions in one place, in a section of code that compiles all the REs
|
||||
ahead of time. To take an example from the standard library, here's an extract
|
||||
from :file:`xmllib.py`::
|
||||
from the now deprecated :file:`xmllib.py`::
|
||||
|
||||
ref = re.compile( ... )
|
||||
entityref = re.compile( ... )
|
||||
|
|
|
@ -237,129 +237,83 @@ Python's Unicode Support
|
|||
Now that you've learned the rudiments of Unicode, we can look at Python's
|
||||
Unicode features.
|
||||
|
||||
The String Type
|
||||
---------------
|
||||
|
||||
The Unicode Type
|
||||
----------------
|
||||
Since Python 3.0, the language features a ``str`` type that contains Unicode
|
||||
characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
|
||||
rocks!'``, or the triple-quoted string syntax is stored as Unicode.
|
||||
|
||||
Unicode strings are expressed as instances of the :class:`unicode` type, one of
|
||||
Python's repertoire of built-in types. It derives from an abstract type called
|
||||
:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
|
||||
therefore check if a value is a string type with ``isinstance(value,
|
||||
basestring)``. Under the hood, Python represents Unicode strings as either 16-
|
||||
or 32-bit integers, depending on how the Python interpreter was compiled.
|
||||
To insert a Unicode character that is not part of ASCII, e.g., any letters with
|
||||
accents, you can use escape sequences in string literals, like this::
|
||||
|
||||
The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
|
||||
errors])``. All of its arguments should be 8-bit strings. The first argument
|
||||
is converted to Unicode using the specified encoding; if you leave off the
|
||||
``encoding`` argument, the ASCII encoding is used for the conversion, so
|
||||
characters greater than 127 will be treated as errors::
|
||||
>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
|
||||
'\u0394'
|
||||
>>> "\u0394" # Using a 16-bit hex value
|
||||
'\u0394'
|
||||
>>> "\U00000394" # Using a 32-bit hex value
|
||||
'\u0394'
|
||||
|
||||
>>> unicode('abcdef')
|
||||
u'abcdef'
|
||||
>>> s = unicode('abcdef')
|
||||
>>> type(s)
|
||||
<type 'unicode'>
|
||||
>>> unicode('abcdef' + chr(255))
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
|
||||
ordinal not in range(128)
|
||||
In addition, one can create a string using the :func:`decode` method of
|
||||
:class:`bytes`. This method takes an encoding, such as UTF-8, and, optionally,
|
||||
an *errors* argument.
|
||||
|
||||
The ``errors`` argument specifies the response when the input string can't be
|
||||
The *errors* argument specifies the response when the input string can't be
|
||||
converted according to the encoding's rules. Legal values for this argument are
|
||||
'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
|
||||
'strict' (raise a :exc:`UnicodeDecodeError` exception), 'replace' (add U+FFFD,
|
||||
'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
|
||||
Unicode result). The following examples show the differences::
|
||||
|
||||
>>> unicode('\x80abc', errors='strict')
|
||||
>>> b'\x80abc'.decode("utf-8", "strict")
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
|
||||
ordinal not in range(128)
|
||||
>>> unicode('\x80abc', errors='replace')
|
||||
u'\ufffdabc'
|
||||
>>> unicode('\x80abc', errors='ignore')
|
||||
u'abc'
|
||||
>>> b'\x80abc'.decode("utf-8", "replace")
|
||||
'\ufffdabc'
|
||||
>>> b'\x80abc'.decode("utf-8", "ignore")
|
||||
'abc'
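The three error handlers can be compared side by side with a short sketch like the following. It assumes Python 3 and uses illustrative variable names; the attributes read from the exception are part of :exc:`UnicodeDecodeError`.

```python
data = b'\x80abc'

strict_error = None
try:
    data.decode('utf-8', 'strict')
except UnicodeDecodeError as exc:
    # The exception records which codec failed and where.
    strict_error = (exc.encoding, exc.start)

replaced = data.decode('utf-8', 'replace')  # bad byte becomes U+FFFD
ignored = data.decode('utf-8', 'ignore')    # bad byte is dropped

print(strict_error, repr(replaced), repr(ignored))
```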
|
||||
|
||||
Encodings are specified as strings containing the encoding's name. Python 2.4
|
||||
Encodings are specified as strings containing the encoding's name. Python
|
||||
comes with roughly 100 different encodings; see the Python Library Reference at
|
||||
<http://docs.python.org/lib/standard-encodings.html> for a list. Some encodings
|
||||
have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
|
||||
synonyms for the same encoding.
|
||||
|
||||
One-character Unicode strings can also be created with the :func:`unichr`
|
||||
One-character Unicode strings can also be created with the :func:`chr`
|
||||
built-in function, which takes integers and returns a Unicode string of length 1
|
||||
that contains the corresponding code point. The reverse operation is the
|
||||
built-in :func:`ord` function that takes a one-character Unicode string and
|
||||
returns the code point value::
|
||||
|
||||
>>> unichr(40960)
|
||||
u'\ua000'
|
||||
>>> ord(u'\ua000')
|
||||
>>> chr(40960)
|
||||
'\ua000'
|
||||
>>> ord('\ua000')
|
||||
40960
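A quick sketch of the :func:`chr` and :func:`ord` round trip, including the upper bound of the code point range, might look like this. It assumes Python 3; the variable names are illustrative.

```python
code_point = 40960
ch = chr(code_point)   # one-character str
round_trip = ord(ch)   # back to the integer code point

# U+10FFFF is the highest valid code point; anything above raises ValueError.
highest = chr(0x10FFFF)
try:
    chr(0x110000)
    out_of_range = False
except ValueError:
    out_of_range = True

print(repr(ch), round_trip, out_of_range)
```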
|
||||
|
||||
Instances of the :class:`unicode` type have many of the same methods as the
|
||||
8-bit string type for operations such as searching and formatting::
|
||||
Converting to Bytes
|
||||
-------------------
|
||||
|
||||
>>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
|
||||
>>> s.count('e')
|
||||
5
|
||||
>>> s.find('feather')
|
||||
9
|
||||
>>> s.find('bird')
|
||||
-1
|
||||
>>> s.replace('feather', 'sand')
|
||||
u'Was ever sand so lightly blown to and fro as this multitude?'
|
||||
>>> s.upper()
|
||||
u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
|
||||
|
||||
Note that the arguments to these methods can be Unicode strings or 8-bit
|
||||
strings. 8-bit strings will be converted to Unicode before carrying out the
|
||||
operation; Python's default ASCII encoding will be used, so characters greater
|
||||
than 127 will cause an exception::
|
||||
|
||||
>>> s.find('Was\x9f')
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
|
||||
>>> s.find(u'Was\x9f')
|
||||
-1
|
||||
|
||||
Much Python code that operates on strings will therefore work with Unicode
|
||||
strings without requiring any changes to the code. (Input and output code needs
|
||||
more updating for Unicode; more on this later.)
|
||||
|
||||
Another important method is ``.encode([encoding], [errors='strict'])``, which
|
||||
returns an 8-bit string version of the Unicode string, encoded in the requested
|
||||
encoding. The ``errors`` parameter is the same as the parameter of the
|
||||
``unicode()`` constructor, with one additional possibility; as well as 'strict',
|
||||
Another important str method is ``.encode([encoding], [errors='strict'])``,
|
||||
which returns a ``bytes`` representation of the Unicode string, encoded in the
|
||||
requested encoding. The ``errors`` parameter is the same as the parameter of
|
||||
the :meth:`decode` method, with one additional possibility; as well as 'strict',
|
||||
'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
|
||||
character references. The following example shows the different results::
|
||||
|
||||
>>> u = unichr(40960) + u'abcd' + unichr(1972)
|
||||
>>> u = chr(40960) + 'abcd' + chr(1972)
|
||||
>>> u.encode('utf-8')
|
||||
'\xea\x80\x80abcd\xde\xb4'
|
||||
b'\xea\x80\x80abcd\xde\xb4'
|
||||
>>> u.encode('ascii')
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in ?
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
|
||||
>>> u.encode('ascii', 'ignore')
|
||||
'abcd'
|
||||
b'abcd'
|
||||
>>> u.encode('ascii', 'replace')
|
||||
'?abcd?'
|
||||
b'?abcd?'
|
||||
>>> u.encode('ascii', 'xmlcharrefreplace')
|
||||
'&#40960;abcd&#1972;'
|
||||
|
||||
Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
|
||||
interprets the string using the given encoding::
|
||||
|
||||
>>> u = unichr(40960) + u'abcd' + unichr(1972) # Assemble a string
|
||||
>>> utf8_version = u.encode('utf-8') # Encode as UTF-8
|
||||
>>> type(utf8_version), utf8_version
|
||||
(<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
|
||||
>>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
|
||||
>>> u == u2 # The two strings match
|
||||
True
|
||||
b'&#40960;abcd&#1972;'
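This hunk drops the old encode/decode round-trip example. A possible Python 3 equivalent, written as a sketch with the same variable names the removed lines used, is:

```python
u = chr(40960) + 'abcd' + chr(1972)       # Assemble a string
utf8_version = u.encode('utf-8')          # Encode as UTF-8 (a bytes object)
u2 = utf8_version.decode('utf-8')         # Decode using UTF-8

# 'xmlcharrefreplace' writes unencodable characters as numeric references.
xml_version = u.encode('ascii', 'xmlcharrefreplace')

print(type(utf8_version).__name__, utf8_version)
print(u == u2)                            # The two strings match
print(xml_version)
```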
|
||||
|
||||
The low-level routines for registering and accessing the available encodings are
|
||||
found in the :mod:`codecs` module. However, the encoding and decoding functions
|
||||
|
@ -377,22 +331,14 @@ output.
|
|||
Unicode Literals in Python Source Code
|
||||
--------------------------------------
|
||||
|
||||
In Python source code, Unicode literals are written as strings prefixed with the
|
||||
'u' or 'U' character: ``u'abcdefghijk'``. Specific code points can be written
|
||||
using the ``\u`` escape sequence, which is followed by four hex digits giving
|
||||
the code point. The ``\U`` escape sequence is similar, but expects 8 hex
|
||||
digits, not 4.
|
||||
In Python source code, specific Unicode code points can be written using the
|
||||
``\u`` escape sequence, which is followed by four hex digits giving the code
|
||||
point. The ``\U`` escape sequence is similar, but expects 8 hex digits, not 4::
|
||||
|
||||
Unicode literals can also use the same escape sequences as 8-bit strings,
|
||||
including ``\x``, but ``\x`` only takes two hex digits so it can't express an
|
||||
arbitrary code point. Octal escapes can go up to U+01ff, which is octal 777.
|
||||
|
||||
::
|
||||
|
||||
>>> s = u"a\xac\u1234\u20ac\U00008000"
|
||||
^^^^ two-digit hex escape
|
||||
^^^^^^ four-digit Unicode escape
|
||||
^^^^^^^^^^ eight-digit Unicode escape
|
||||
>>> s = "a\xac\u1234\u20ac\U00008000"
|
||||
^^^^ two-digit hex escape
|
||||
^^^^^ four-digit Unicode escape
|
||||
^^^^^^^^^^ eight-digit Unicode escape
|
||||
>>> for c in s: print(ord(c), end=" ")
|
||||
...
|
||||
97 172 4660 8364 32768
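The escape forms mentioned in this file can be checked against each other with a small sketch. It assumes Python 3; ``delta_by_name`` and the other names are illustrative.

```python
import unicodedata

# Three spellings of the same character, U+0394.
delta_by_name = "\N{GREEK CAPITAL LETTER DELTA}"
delta_u = "\u0394"        # four hex digits
delta_U = "\U00000394"    # eight hex digits

s = "a\xac\u1234\u20ac\U00008000"
code_points = [ord(c) for c in s]

print(delta_by_name == delta_u == delta_U)
print(unicodedata.name(delta_u))
print(code_points)
```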
|
||||
|
@ -400,7 +346,7 @@ arbitrary code point. Octal escapes can go up to U+01ff, which is octal 777.
|
|||
Using escape sequences for code points greater than 127 is fine in small doses,
|
||||
but becomes an annoyance if you're using many accented characters, as you would
|
||||
in a program with messages in French or some other accent-using language. You
|
||||
can also assemble strings using the :func:`unichr` built-in function, but this is
|
||||
can also assemble strings using the :func:`chr` built-in function, but this is
|
||||
even more tedious.
|
||||
|
||||
Ideally, you'd want to be able to write literals in your language's natural
|
||||
|
@ -408,14 +354,15 @@ encoding. You could then edit Python source code with your favorite editor
|
|||
which would display the accented characters naturally, and have the right
|
||||
characters used at runtime.
|
||||
|
||||
Python supports writing Unicode literals in any encoding, but you have to
|
||||
declare the encoding being used. This is done by including a special comment as
|
||||
either the first or second line of the source file::
|
||||
Python supports writing Unicode literals in UTF-8 by default, but you can use
|
||||
(almost) any encoding if you declare the encoding being used. This is done by
|
||||
including a special comment as either the first or second line of the source
|
||||
file::
|
||||
|
||||
#!/usr/bin/env python
|
||||
# -*- coding: latin-1 -*-
|
||||
|
||||
u = u'abcdé'
|
||||
u = 'abcdé'
|
||||
print(ord(u[-1]))
|
||||
|
||||
The syntax is inspired by Emacs's notation for specifying variables local to a
|
||||
|
@ -424,22 +371,8 @@ file. Emacs supports many different variables, but Python only supports
|
|||
them, you must supply the name ``coding`` and the name of your chosen encoding,
|
||||
separated by ``':'``.
|
||||
|
||||
If you don't include such a comment, the default encoding used will be ASCII.
|
||||
Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
|
||||
encoding for string literals; in Python 2.4, characters greater than 127 still
|
||||
work but result in a warning. For example, the following program has no
|
||||
encoding declaration::
|
||||
|
||||
#!/usr/bin/env python
|
||||
u = u'abcdé'
|
||||
print(ord(u[-1]))
|
||||
|
||||
When you run it with Python 2.4, it will output the following warning::
|
||||
|
||||
amk:~$ python p263.py
|
||||
sys:1: DeprecationWarning: Non-ASCII character '\xe9'
|
||||
in file p263.py on line 2, but no encoding declared;
|
||||
see http://www.python.org/peps/pep-0263.html for details
|
||||
If you don't include such a comment, the default encoding used will be UTF-8 as
|
||||
already mentioned.
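One way to see which encoding Python would pick for a source file is :func:`tokenize.detect_encoding`, which reads the coding comment the same way the tokenizer does. The sketch below feeds it two in-memory sources; the byte strings are made up for illustration.

```python
import io
import tokenize

# A source file with a coding declaration on its second line.
declared = (b"#!/usr/bin/env python\n"
            b"# -*- coding: latin-1 -*-\n"
            b"u = 'abcd\xe9'\n")
# The same kind of file without any declaration.
undeclared = b"#!/usr/bin/env python\nu = 'abcd'\n"

declared_encoding, _ = tokenize.detect_encoding(io.BytesIO(declared).readline)
default_encoding, _ = tokenize.detect_encoding(io.BytesIO(undeclared).readline)

# Decoding with the detected encoding recovers the accented character.
decoded = declared.decode(declared_encoding)

print(declared_encoding, default_encoding)
```

Note that the declared name is normalized, so ``latin-1`` is reported under its canonical spelling.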
|
||||
|
||||
|
||||
Unicode Properties
|
||||
|
@ -457,7 +390,7 @@ prints the numeric value of one particular character::
|
|||
|
||||
import unicodedata
|
||||
|
||||
u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
|
||||
u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)
|
||||
|
||||
for i, c in enumerate(u):
|
||||
print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
|
||||
|
@ -487,8 +420,8 @@ list of category codes.
|
|||
References
|
||||
----------
|
||||
|
||||
The Unicode and 8-bit string types are described in the Python library reference
|
||||
at :ref:`typesseq`.
|
||||
The ``str`` type is described in the Python library reference at
|
||||
:ref:`typesseq`.
|
||||
|
||||
The documentation for the :mod:`unicodedata` module.
|
||||
|
||||
|
@ -557,7 +490,7 @@ It's also possible to open files in update mode, allowing both reading and
|
|||
writing::
|
||||
|
||||
f = codecs.open('test', encoding='utf-8', mode='w+')
|
||||
f.write(u'\u4500 blah blah blah\n')
|
||||
f.write('\u4500 blah blah blah\n')
|
||||
f.seek(0)
|
||||
print(repr(f.readline()[:1]))
|
||||
f.close()
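In Python 3 the built-in :func:`open` also accepts an ``encoding`` argument, so the same update-mode example can be sketched without :mod:`codecs`. The temporary directory is an assumption made so the snippet cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test')

    # Text mode with an explicit encoding; 'w+' allows reading back.
    with open(path, mode='w+', encoding='utf-8') as f:
        f.write('\u4500 blah blah blah\n')
        f.seek(0)
        first_char = f.readline()[:1]

    # Reading the raw bytes shows the UTF-8 encoding of U+4500.
    with open(path, 'rb') as raw:
        raw_prefix = raw.read(3)

print(repr(first_char), raw_prefix)
```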
|
||||
|
@ -590,7 +523,7 @@ not much reason to bother. When opening a file for reading or writing, you can
|
|||
usually just provide the Unicode string as the filename, and it will be
|
||||
automatically converted to the right encoding for you::
|
||||
|
||||
filename = u'filename\u4500abc'
|
||||
filename = 'filename\u4500abc'
|
||||
f = open(filename, 'w')
|
||||
f.write('blah\n')
|
||||
f.close()
|
||||
|
@ -607,7 +540,7 @@ encoding and a list of Unicode strings will be returned, while passing an 8-bit
|
|||
path will return the 8-bit versions of the filenames. For example, assuming the
|
||||
default filesystem encoding is UTF-8, running the following program::
|
||||
|
||||
fn = u'filename\u4500abc'
|
||||
fn = 'filename\u4500abc'
|
||||
f = open(fn, 'w')
|
||||
f.close()
|
||||
|
||||
|
@ -619,7 +552,7 @@ will produce the following output::
|
|||
|
||||
amk:~$ python t.py
|
||||
['.svn', 'filename\xe4\x94\x80abc', ...]
|
||||
[u'.svn', u'filename\u4500abc', ...]
|
||||
['.svn', 'filename\u4500abc', ...]
|
||||
|
||||
The first list contains UTF-8-encoded filenames, and the second list contains
|
||||
the Unicode versions.
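A hedged Python 3 sketch of the same experiment follows: passing a ``str`` path to :func:`os.listdir` yields ``str`` names, while passing a ``bytes`` path yields ``bytes`` names. It assumes the filesystem can store the non-ASCII filename and uses a temporary directory for cleanliness.

```python
import os
import tempfile

fn = 'filename\u4500abc'

with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file with a non-ASCII name.
    open(os.path.join(tmp, fn), 'w').close()

    str_names = os.listdir(tmp)                 # str in, str out
    bytes_names = os.listdir(os.fsencode(tmp))  # bytes in, bytes out

print(str_names)
print(bytes_names)
```

:func:`os.fsencode` applies the filesystem encoding, so the bytes form matches whatever the platform actually uses rather than assuming UTF-8.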
|
||||
|
|