Mirror of https://github.com/python/cpython.git (synced 2025-07-25 04:04:13 +00:00)
Add bytes/remove unicode from the data model.
parent 85eb8c103c
commit dcc56f8bf6
1 changed file with 36 additions and 64 deletions
|
@@ -289,52 +289,21 @@ Sequences
|
||||||
.. index::
|
.. index::
|
||||||
builtin: chr
|
builtin: chr
|
||||||
builtin: ord
|
builtin: ord
|
||||||
object: string
|
builtin: str
|
||||||
single: character
|
|
||||||
single: byte
|
|
||||||
single: ASCII@ASCII
|
|
||||||
|
|
||||||
The items of a string are characters. There is no separate character type; a
|
|
||||||
character is represented by a string of one item. Characters represent (at
|
|
||||||
least) 8-bit bytes. The built-in functions :func:`chr` and :func:`ord` convert
|
|
||||||
between characters and nonnegative integers representing the byte values. Bytes
|
|
||||||
with the values 0-127 usually represent the corresponding ASCII values, but the
|
|
||||||
interpretation of values is up to the program. The string data type is also
|
|
||||||
used to represent arrays of bytes, e.g., to hold data read from a file.
|
|
||||||
|
|
||||||
.. index::
|
|
||||||
single: ASCII@ASCII
|
|
||||||
single: EBCDIC
|
|
||||||
single: character set
|
|
||||||
pair: string; comparison
|
|
||||||
builtin: chr
|
|
||||||
builtin: ord
|
|
||||||
|
|
||||||
(On systems whose native character set is not ASCII, strings may use EBCDIC in
|
|
||||||
their internal representation, provided the functions :func:`chr` and
|
|
||||||
:func:`ord` implement a mapping between ASCII and EBCDIC, and string comparison
|
|
||||||
preserves the ASCII order. Or perhaps someone can propose a better rule?)
|
|
||||||
|
|
||||||
Unicode
|
|
||||||
.. index::
|
|
||||||
builtin: unichr
|
|
||||||
builtin: ord
|
|
||||||
builtin: unicode
|
|
||||||
object: unicode
|
|
||||||
single: character
|
single: character
|
||||||
single: integer
|
single: integer
|
||||||
single: Unicode
|
single: Unicode
|
||||||
|
|
||||||
The items of a Unicode object are Unicode code units. A Unicode code unit is
|
The items of a string object are Unicode code units. A Unicode code
|
||||||
represented by a Unicode object of one item and can hold either a 16-bit or
|
unit is represented by a string object of one item and can hold either
|
||||||
32-bit value representing a Unicode ordinal (the maximum value for the ordinal
|
a 16-bit or 32-bit value representing a Unicode ordinal (the maximum
|
||||||
is given in ``sys.maxunicode``, and depends on how Python is configured at
|
value for the ordinal is given in ``sys.maxunicode``, and depends on
|
||||||
compile time). Surrogate pairs may be present in the Unicode object, and will
|
how Python is configured at compile time). Surrogate pairs may be
|
||||||
be reported as two separate items. The built-in functions :func:`unichr` and
|
present in the string object, and will be reported as two separate
|
||||||
:func:`ord` convert between code units and nonnegative integers representing the
|
items. The built-in functions :func:`chr` and :func:`ord` convert
|
||||||
Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to
|
between code units and nonnegative integers representing the Unicode
|
||||||
other encodings are possible through the Unicode method :meth:`encode` and the
|
ordinals as defined in the Unicode Standard 3.0. Conversion from and to
|
||||||
built-in function :func:`unicode`.
|
other encodings is possible through the string method :meth:`encode`.
|
||||||
|
|
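The relationship between one-item strings, ordinals, and encodings described in the revised paragraph can be sketched as follows. This is a minimal illustration that assumes a current CPython 3 interpreter; the variable names are hypothetical and not part of the commit.

```python
import sys

# chr() and ord() map between a one-item string and its Unicode ordinal.
euro = chr(0x20AC)
assert len(euro) == 1
assert ord(euro) == 0x20AC

# sys.maxunicode is the largest ordinal this build accepts in chr().
largest = chr(sys.maxunicode)
assert ord(largest) == sys.maxunicode

# encode() converts the string to bytes in a named encoding.
encoded = euro.encode("utf-8")
print(encoded)  # b'\xe2\x82\xac'
```

The value of ``sys.maxunicode`` depends on the interpreter build, so the sketch only checks that ``chr`` and ``ord`` round-trip rather than asserting a particular number.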
||||||
Tuples
|
Tuples
|
||||||
.. index::
|
.. index::
|
||||||
|
@@ -342,11 +311,12 @@ Sequences
|
||||||
pair: singleton; tuple
|
pair: singleton; tuple
|
||||||
pair: empty; tuple
|
pair: empty; tuple
|
||||||
|
|
||||||
The items of a tuple are arbitrary Python objects. Tuples of two or more items
|
The items of a tuple are arbitrary Python objects. Tuples of two or
|
||||||
are formed by comma-separated lists of expressions. A tuple of one item (a
|
more items are formed by comma-separated lists of expressions. A tuple
|
||||||
'singleton') can be formed by affixing a comma to an expression (an expression
|
of one item (a 'singleton') can be formed by affixing a comma to an
|
||||||
by itself does not create a tuple, since parentheses must be usable for grouping
|
expression (an expression by itself does not create a tuple, since
|
||||||
of expressions). An empty tuple can be formed by an empty pair of parentheses.
|
parentheses must be usable for grouping of expressions). An empty
|
||||||
|
tuple can be formed by an empty pair of parentheses.
|
||||||
|
|
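The tuple-forming rules above can be illustrated with a short sketch. The names are hypothetical and chosen only for this example.

```python
# Two or more items: the commas form the tuple, parentheses are optional.
pair = 1, 2

# One item: a trailing comma is required to make a singleton.
singleton = 1,

# Parentheses alone only group an expression; this is the integer 1.
grouped = (1)

# An empty pair of parentheses forms the empty tuple.
empty = ()

print(type(pair).__name__, type(singleton).__name__,
      type(grouped).__name__, type(empty).__name__)
```

The third case shows why the comma, not the parentheses, is what creates a one-item tuple.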
||||||
.. % Immutable sequences
|
.. % Immutable sequences
|
||||||
|
|
||||||
|
@@ -369,14 +339,23 @@ Sequences
|
||||||
Lists
|
Lists
|
||||||
.. index:: object: list
|
.. index:: object: list
|
||||||
|
|
||||||
The items of a list are arbitrary Python objects. Lists are formed by placing a
|
The items of a list are arbitrary Python objects. Lists are formed by
|
||||||
comma-separated list of expressions in square brackets. (Note that there are no
|
placing a comma-separated list of expressions in square brackets. (Note
|
||||||
special cases needed to form lists of length 0 or 1.)
|
that there are no special cases needed to form lists of length 0 or 1.)
|
||||||
|
|
||||||
|
Bytes
|
||||||
|
.. index:: bytes, byte
|
||||||
|
|
||||||
|
A bytes object is a mutable array. The items are 8-bit bytes,
|
||||||
|
represented by integers in the range 0 <= x < 256. Bytes literals
|
||||||
|
(like ``b'abc'``) and the built-in function :func:`bytes` can be used to
|
||||||
|
construct bytes objects. Also, bytes objects can be decoded to strings
|
||||||
|
via the :meth:`decode` method.
|
||||||
|
|
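A brief sketch of the construction and decoding paths mentioned in the Bytes paragraph follows. One caveat, stated as background rather than as part of this commit: in released Python 3 versions ``bytes`` is immutable, and the mutable array of bytes is ``bytearray``. The sketch therefore uses ``bytearray`` for the in-place modification. All variable names are hypothetical.

```python
# A bytes literal and the bytes() built-in produce equal objects.
literal = b"abc"
built = bytes([97, 98, 99])
assert literal == built

# The items are integers in the range 0 <= x < 256.
items = list(literal)

# decode() turns the bytes into a string using a named encoding.
text = literal.decode("ascii")

# Assumption: on a released Python 3, mutation requires bytearray.
buf = bytearray(literal)
buf[0] = 65  # replace the first byte with the value for 'A'
print(items, text, buf)
```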
||||||
.. index:: module: array
|
.. index:: module: array
|
||||||
|
|
||||||
The extension module :mod:`array` provides an additional example of a mutable
|
The extension module :mod:`array` provides an additional example of a
|
||||||
sequence type.
|
mutable sequence type.
|
||||||
|
|
||||||
.. % Mutable sequences
|
.. % Mutable sequences
|
||||||
|
|
||||||
|
@@ -1230,12 +1209,14 @@ Basic customization
|
||||||
builtin: str
|
builtin: str
|
||||||
builtin: print
|
builtin: print
|
||||||
|
|
||||||
Called by the :func:`str` built-in function and by the :func:`print`
|
Called by the :func:`str` built-in function and by the :func:`print` function
|
||||||
function to compute the "informal" string representation of an object. This
|
to compute the "informal" string representation of an object. This differs
|
||||||
differs from :meth:`__repr__` in that it does not have to be a valid Python
|
from :meth:`__repr__` in that it does not have to be a valid Python
|
||||||
expression: a more convenient or concise representation may be used instead.
|
expression: a more convenient or concise representation may be used instead.
|
||||||
The return value must be a string object.
|
The return value must be a string object.
|
||||||
|
|
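The difference between the informal and the formal representation can be sketched with a small class. ``Point`` is a hypothetical example and does not appear in the documentation being changed.

```python
class Point:
    """Hypothetical class used only to contrast __str__ and __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Formal representation: here it is a valid Python expression.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Informal representation: concise, need not be valid Python.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
informal = str(p)   # also what print(p) displays
formal = repr(p)
print(informal, formal)
```

Both methods return string objects, as the paragraph above requires of ``__str__``.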
||||||
|
.. XXX what about subclasses of string?
|
||||||
|
|
||||||
|
|
||||||
.. method:: object.__format__(self, format_spec)
|
.. method:: object.__format__(self, format_spec)
|
||||||
|
|
||||||
|
@@ -1355,15 +1336,6 @@ Basic customization
|
||||||
:meth:`__bool__`, all its instances are considered true.
|
:meth:`__bool__`, all its instances are considered true.
|
||||||
|
|
||||||
|
|
||||||
.. method:: object.__unicode__(self)
|
|
||||||
|
|
||||||
.. index:: builtin: unicode
|
|
||||||
|
|
||||||
Called to implement :func:`unicode` builtin; should return a Unicode object.
|
|
||||||
When this method is not defined, string conversion is attempted, and the result
|
|
||||||
of string conversion is converted to Unicode using the system default encoding.
|
|
||||||
|
|
||||||
|
|
||||||
.. _attribute-access:
|
.. _attribute-access:
|
||||||
|
|
||||||
Customizing attribute access
|
Customizing attribute access
|
||||||
|
|