mirror of
https://github.com/python/cpython.git
synced 2025-12-23 09:19:18 +00:00
Update the first two parts of the reference manual for Py3k,
mainly concerning PEPs 3131 and 3120.
parent
3dc33d1845
commit
57e3b68c22
3 changed files with 173 additions and 232 deletions
|
|
@@ -27,6 +27,7 @@ are more than welcome as well.
|
|||
style.rst
|
||||
rest.rst
|
||||
markup.rst
|
||||
fromlatex.rst
|
||||
sphinx.rst
|
||||
|
||||
.. XXX add credits, thanks etc.
|
||||
|
|
|
|||
|
|
@@ -22,11 +22,12 @@ language, maybe you could volunteer your time --- or invent a cloning machine
|
|||
|
||||
It is dangerous to add too many implementation details to a language reference
|
||||
document --- the implementation may change, and other implementations of the
|
||||
same language may work differently. On the other hand, there is currently only
|
||||
one Python implementation in widespread use (although alternate implementations
|
||||
exist), and its particular quirks are sometimes worth being mentioned,
|
||||
especially where the implementation imposes additional limitations. Therefore,
|
||||
you'll find short "implementation notes" sprinkled throughout the text.
|
||||
same language may work differently. On the other hand, CPython is the one
|
||||
Python implementation in widespread use (although alternate implementations
|
||||
continue to gain support), and its particular quirks are sometimes worth being
|
||||
mentioned, especially where the implementation imposes additional limitations.
|
||||
Therefore, you'll find short "implementation notes" sprinkled throughout the
|
||||
text.
|
||||
|
||||
Every Python implementation comes with a number of built-in and standard
|
||||
modules. These are documented in :ref:`library-index`. A few built-in modules
|
||||
|
|
@@ -88,11 +89,7 @@ implementation you're using.
|
|||
Notation
|
||||
========
|
||||
|
||||
.. index::
|
||||
single: BNF
|
||||
single: grammar
|
||||
single: syntax
|
||||
single: notation
|
||||
.. index:: BNF, grammar, syntax, notation
|
||||
|
||||
The descriptions of lexical analysis and syntax use a modified BNF grammar
|
||||
notation. This uses the following style of definition:
|
||||
|
|
@@ -118,9 +115,7 @@ meaningful to separate tokens. Rules are normally contained on a single line;
|
|||
rules with many alternatives may be formatted alternatively with each line after
|
||||
the first beginning with a vertical bar.
|
||||
|
||||
.. index::
|
||||
single: lexical definitions
|
||||
single: ASCII@ASCII
|
||||
.. index:: lexical definitions, ASCII
|
||||
|
||||
In lexical definitions (as the example above), two more conventions are used:
|
||||
Two literal characters separated by three dots mean a choice of any single
|
||||
|
|
|
|||
|
|
@@ -5,38 +5,16 @@
|
|||
Lexical analysis
|
||||
****************
|
||||
|
||||
.. index::
|
||||
single: lexical analysis
|
||||
single: parser
|
||||
single: token
|
||||
.. index:: lexical analysis, parser, token
|
||||
|
||||
A Python program is read by a *parser*. Input to the parser is a stream of
|
||||
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
|
||||
lexical analyzer breaks a file into tokens.
|
||||
|
||||
Python uses the 7-bit ASCII character set for program text.
|
||||
|
||||
.. versionadded:: 2.3
|
||||
An encoding declaration can be used to indicate that string literals and
|
||||
comments use an encoding different from ASCII.
|
||||
|
||||
For compatibility with older versions, Python only warns if it finds 8-bit
|
||||
characters; those warnings should be corrected by either declaring an explicit
|
||||
encoding, or using escape sequences if those bytes are binary data, instead of
|
||||
characters.
|
||||
|
||||
The run-time character set depends on the I/O devices connected to the program
|
||||
but is generally a superset of ASCII.
|
||||
|
||||
**Future compatibility note:** It may be tempting to assume that the character
|
||||
set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
|
||||
western languages that use the Latin alphabet), but it is possible that in the
|
||||
future Unicode text editors will become common. These generally use the UTF-8
|
||||
encoding, which is also an ASCII superset, but with very different use for the
|
||||
characters with ordinals 128-255. While there is no consensus on this subject
|
||||
yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
|
||||
implementation appears to favor Latin-1. This applies both to the source
|
||||
character set and the run-time character set.
|
||||
Python reads program text as Unicode code points; the encoding of a source file
|
||||
can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
|
||||
for details. If the source file cannot be decoded, a :exc:`SyntaxError` is
|
||||
raised.
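The decoding rule above can be observed through the built-in ``compile()``, which accepts source as bytes. The following is a minimal sketch, not part of the commit; the helper name ``try_compile`` and the ``<example>`` filename are made up for illustration.

```python
def try_compile(source_bytes):
    """Compile and run source_bytes; return its namespace, or None on SyntaxError."""
    try:
        code = compile(source_bytes, "<example>", "exec")
    except SyntaxError:
        # Raised when the bytes cannot be decoded with the effective encoding.
        return None
    namespace = {}
    exec(code, namespace)
    return namespace


# No declaration: the bytes are decoded as UTF-8.
utf8_ns = try_compile("x = 'caf\u00e9'\n".encode("utf-8"))

# An explicit declaration makes the Latin-1 byte 0xE9 decodable.
latin1_ns = try_compile(b"# -*- coding: latin-1 -*-\nx = 'caf\xe9'\n")

# The same byte without a declaration is not valid UTF-8.
undecodable_ns = try_compile(b"x = 'caf\xe9'\n")

print(utf8_ns["x"], latin1_ns["x"], undecodable_ns)
```

The first two calls should yield the same string, while the third should return ``None`` because the compiler reports the decoding failure as a :exc:`SyntaxError`.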
|
||||
|
||||
|
||||
.. _line-structure:
|
||||
|
|
@@ -44,21 +22,17 @@ character set and the run-time character set.
|
|||
Line structure
|
||||
==============
|
||||
|
||||
.. index:: single: line structure
|
||||
.. index:: line structure
|
||||
|
||||
A Python program is divided into a number of *logical lines*.
|
||||
|
||||
|
||||
.. _logical:
|
||||
.. _logical-lines:
|
||||
|
||||
Logical lines
|
||||
-------------
|
||||
|
||||
.. index::
|
||||
single: logical line
|
||||
single: physical line
|
||||
single: line joining
|
||||
single: NEWLINE token
|
||||
.. index:: logical line, physical line, line joining, NEWLINE token
|
||||
|
||||
The end of a logical line is represented by the token NEWLINE. Statements
|
||||
cannot cross logical line boundaries except where NEWLINE is allowed by the
|
||||
|
|
@@ -67,7 +41,7 @@ constructed from one or more *physical lines* by following the explicit or
|
|||
implicit *line joining* rules.
|
||||
|
||||
|
||||
.. _physical:
|
||||
.. _physical-lines:
|
||||
|
||||
Physical lines
|
||||
--------------
|
||||
|
|
@@ -89,9 +63,7 @@ representing ASCII LF, is the line terminator).
|
|||
Comments
|
||||
--------
|
||||
|
||||
.. index::
|
||||
single: comment
|
||||
single: hash character
|
||||
.. index:: comment, hash character
|
||||
|
||||
A comment starts with a hash character (``#``) that is not part of a string
|
||||
literal, and ends at the end of the physical line. A comment signifies the end
|
||||
|
|
@@ -104,9 +76,7 @@ are ignored by the syntax; they are not tokens.
|
|||
Encoding declarations
|
||||
---------------------
|
||||
|
||||
.. index::
|
||||
single: source character set
|
||||
single: encodings
|
||||
.. index:: source character set, encodings
|
||||
|
||||
If a comment in the first or second line of the Python script matches the
|
||||
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
|
||||
|
|
@@ -119,19 +89,19 @@ which is recognized also by GNU Emacs, and ::
|
|||
|
||||
# vim:fileencoding=<encoding-name>
|
||||
|
||||
which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
|
||||
the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
|
||||
encoding is UTF-8 (this is supported, among others, by Microsoft's
|
||||
:program:`notepad`).
|
||||
which is recognized by Bram Moolenaar's VIM.
|
||||
|
||||
If no encoding declaration is found, the default encoding is UTF-8. In
|
||||
addition, if the first bytes of the file are the UTF-8 byte-order mark
|
||||
(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
|
||||
among others, by Microsoft's :program:`notepad`).
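One way to check which encoding a given file header selects is the standard library's ``tokenize.detect_encoding()``, which looks for a byte-order mark and a declaration comment in the first two lines. This is only an illustrative sketch; the helper name ``declared_encoding`` is an assumption, and the names reported are whatever ``tokenize`` normalizes them to.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding name that tokenize detects for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


no_declaration = declared_encoding(b"x = 1\n")
emacs_style = declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")
vim_style = declared_encoding(b"# vim:fileencoding=utf-8\nx = 1\n")
with_bom = declared_encoding(b"\xef\xbb\xbfx = 1\n")

print(no_declaration, emacs_style, vim_style, with_bom)
```

Note that ``tokenize`` reports ``latin-1`` under its normalized name ``iso-8859-1`` and signals a byte-order mark by returning ``utf-8-sig``.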
|
||||
|
||||
If an encoding is declared, the encoding name must be recognized by Python. The
|
||||
encoding is used for all lexical analysis, in particular to find the end of a
|
||||
string, and to interpret the contents of Unicode literals. String literals are
|
||||
converted to Unicode for syntactical analysis, then converted back to their
|
||||
original encoding before interpretation starts. The encoding declaration must
|
||||
appear on a line of its own.
|
||||
encoding is used for all lexical analysis, including string literals, comments
|
||||
and identifiers. The encoding declaration must appear on a line of its own.
|
||||
|
||||
.. % XXX there should be a list of supported encodings.
|
||||
A list of standard encodings can be found in the section
|
||||
:ref:`standard-encodings`.
|
||||
|
||||
|
||||
.. _explicit-joining:
|
||||
|
|
@@ -139,21 +109,13 @@ appear on a line of its own.
|
|||
Explicit line joining
|
||||
---------------------
|
||||
|
||||
.. index::
|
||||
single: physical line
|
||||
single: line joining
|
||||
single: line continuation
|
||||
single: backslash character
|
||||
.. index:: physical line, line joining, line continuation, backslash character
|
||||
|
||||
Two or more physical lines may be joined into logical lines using backslash
|
||||
characters (``\``), as follows: when a physical line ends in a backslash that is
|
||||
not part of a string literal or comment, it is joined with the following forming
|
||||
a single logical line, deleting the backslash and the following end-of-line
|
||||
character. For example:
|
||||
|
||||
.. %
|
||||
|
||||
::
|
||||
character. For example::
|
||||
|
||||
if 1900 < year < 2100 and 1 <= month <= 12 \
|
||||
and 1 <= day <= 31 and 0 <= hour < 24 \
|
||||
|
|
@@ -197,9 +159,9 @@ Blank lines
|
|||
A logical line that contains only spaces, tabs, formfeeds and possibly a
|
||||
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
|
||||
input of statements, handling of a blank line may differ depending on the
|
||||
implementation of the read-eval-print loop. In the standard implementation, an
|
||||
entirely blank logical line (i.e. one containing not even whitespace or a
|
||||
comment) terminates a multi-line statement.
|
||||
implementation of the read-eval-print loop. In the standard interactive
|
||||
interpreter, an entirely blank logical line (i.e. one containing not even
|
||||
whitespace or a comment) terminates a multi-line statement.
|
||||
|
||||
|
||||
.. _indentation:
|
||||
|
|
@@ -207,14 +169,7 @@ comment) terminates a multi-line statement.
|
|||
Indentation
|
||||
-----------
|
||||
|
||||
.. index::
|
||||
single: indentation
|
||||
single: whitespace
|
||||
single: leading whitespace
|
||||
single: space
|
||||
single: tab
|
||||
single: grouping
|
||||
single: statement grouping
|
||||
.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping
|
||||
|
||||
Leading whitespace (spaces and tabs) at the beginning of a logical line is used
|
||||
to compute the indentation level of the line, which in turn is used to determine
|
||||
|
|
@@ -238,9 +193,7 @@ for the indentation calculations above. Formfeed characters occurring elsewhere
|
|||
in the leading whitespace have an undefined effect (for instance, they may reset
|
||||
the space count to zero).
|
||||
|
||||
.. index::
|
||||
single: INDENT token
|
||||
single: DEDENT token
|
||||
.. index:: INDENT token, DEDENT token
|
||||
|
||||
The indentation levels of consecutive lines are used to generate INDENT and
|
||||
DEDENT tokens, using a stack, as follows.
|
||||
|
|
@@ -315,22 +268,48 @@ possible string that forms a legal token, when read from left to right.
|
|||
Identifiers and keywords
|
||||
========================
|
||||
|
||||
.. index::
|
||||
single: identifier
|
||||
single: name
|
||||
.. index:: identifier, name
|
||||
|
||||
Identifiers (also referred to as *names*) are described by the following lexical
|
||||
definitions:
|
||||
|
||||
.. productionlist::
|
||||
identifier: (`letter`|"_") (`letter` | `digit` | "_")*
|
||||
letter: `lowercase` | `uppercase`
|
||||
lowercase: "a"..."z"
|
||||
uppercase: "A"..."Z"
|
||||
digit: "0"..."9"
|
||||
The syntax of identifiers in Python is based on the Unicode standard annex
|
||||
UAX-31, with elaboration and changes as defined below.
|
||||
|
||||
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
|
||||
are the same as in Python 2.5; Python 3.0 introduces additional
|
||||
characters from outside the ASCII range (see :pep:`3131`). For other
|
||||
characters, the classification uses the version of the Unicode Character
|
||||
Database as included in the :mod:`unicodedata` module.
|
||||
|
||||
Identifiers are unlimited in length. Case is significant.
|
||||
|
||||
.. productionlist::
|
||||
identifier: `id_start` `id_continue`*
|
||||
id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl,
|
||||
the underscore, and characters with the Other_ID_Start property>
|
||||
id_continue: <all characters in `id_start`, plus characters in the categories
|
||||
Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
|
||||
|
||||
The Unicode category codes mentioned above stand for:
|
||||
|
||||
* *Lu* - uppercase letters
|
||||
* *Ll* - lowercase letters
|
||||
* *Lt* - titlecase letters
|
||||
* *Lm* - modifier letters
|
||||
* *Lo* - other letters
|
||||
* *Nl* - letter numbers
|
||||
* *Mn* - nonspacing marks
|
||||
* *Mc* - spacing combining marks
|
||||
* *Nd* - decimal numbers
|
||||
* *Pc* - connector punctuations
|
||||
|
||||
All identifiers are converted into the normal form NFC while parsing; comparison
|
||||
of identifiers is based on NFC.
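The identifier rules and the normalization step can be explored with ``str.isidentifier()`` and the :mod:`unicodedata` module. The sketch below is an illustration only; the candidate names are made up. Current CPython releases are documented as normalizing identifiers with NFKC rather than NFC, and the two forms agree for the composed and decomposed spellings used here.

```python
import unicodedata

# Candidate names: the first four follow the id_start/id_continue rules,
# the remaining three do not.
candidates = ["spam", "_private", "caf\u00e9", "\u53d8\u91cf",
              "2fast", "with space", "a-b"]
valid = [name for name in candidates if name.isidentifier()]

# The same name written with a combining acute accent (decomposed form)...
decomposed = "cafe\u0301"
# ...and its composed equivalent.
composed = unicodedata.normalize("NFC", decomposed)

# Bind the decomposed spelling, then look at the key actually stored.
namespace = {}
exec(decomposed + " = 1", namespace)
stored_composed = composed in namespace
stored_decomposed = decomposed in namespace

print(valid, stored_composed, stored_decomposed)
```

The name ends up stored under its composed spelling, so both spellings in source code refer to the same variable.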
|
||||
|
||||
A non-normative HTML file listing all valid identifier characters for Unicode
|
||||
4.1 can be found at
|
||||
http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
|
||||
|
||||
.. _keywords:
|
||||
|
||||
|
|
@@ -345,25 +324,13 @@ The following identifiers are used as reserved words, or *keywords* of the
|
|||
language, and cannot be used as ordinary identifiers. They must be spelled
|
||||
exactly as written here::
|
||||
|
||||
and def for is raise
|
||||
as del from lambda return
|
||||
assert elif global not try
|
||||
break else if or while
|
||||
class except import pass with
|
||||
continue finally in print yield
|
||||
|
||||
.. versionchanged:: 2.4
|
||||
:const:`None` became a constant and is now recognized by the compiler as a name
|
||||
for the built-in object :const:`None`. Although it is not a keyword, you cannot
|
||||
assign a different object to it.
|
||||
|
||||
.. versionchanged:: 2.5
|
||||
Both :keyword:`as` and :keyword:`with` are only recognized when the
|
||||
``with_statement`` future feature has been enabled. It will always be enabled in
|
||||
Python 2.6. See section :ref:`with` for details. Note that using :keyword:`as`
|
||||
and :keyword:`with` as identifiers will always issue a warning, even when the
|
||||
``with_statement`` future directive is not in effect.
|
||||
|
||||
False class finally is return
|
||||
None continue for lambda try
|
||||
True def from nonlocal while
|
||||
and del global not with
|
||||
as elif if or yield
|
||||
assert else import pass
|
||||
break except in raise
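The effect of the revised keyword list can be checked with the standard :mod:`keyword` module. This is a small sketch rather than part of the commit; the helper ``can_be_assigned`` is a hypothetical name, and later Python releases reserve further words beyond those listed here.

```python
import keyword


def can_be_assigned(name):
    """Return True if ``name = 1`` compiles, i.e. name is usable as an identifier."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Words that this revision adds to the keyword list.
added = ["False", "None", "True", "nonlocal"]
added_are_keywords = all(keyword.iskeyword(word) for word in added)

# ``print`` is dropped from the list and becomes an ordinary name.
print_is_keyword = keyword.iskeyword("print")

assignable = {name: can_be_assigned(name) for name in ["print", "None", "nonlocal"]}
print(added_are_keywords, print_is_keyword, assignable)
```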
|
||||
|
||||
.. _id-classes:
|
||||
|
||||
|
|
@@ -405,71 +372,71 @@ characters:
|
|||
Literals
|
||||
========
|
||||
|
||||
.. index::
|
||||
single: literal
|
||||
single: constant
|
||||
.. index:: literal, constant
|
||||
|
||||
Literals are notations for constant values of some built-in types.
|
||||
|
||||
|
||||
.. _strings:
|
||||
|
||||
String literals
|
||||
---------------
|
||||
String and Bytes literals
|
||||
-------------------------
|
||||
|
||||
.. index:: single: string literal
|
||||
.. index:: string literal, bytes literal, ASCII
|
||||
|
||||
String literals are described by the following lexical definitions:
|
||||
|
||||
.. index:: single: ASCII@ASCII
|
||||
|
||||
.. productionlist::
|
||||
stringliteral: [`stringprefix`](`shortstring` | `longstring`)
|
||||
stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
|
||||
stringprefix: "r" | "R"
|
||||
shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
|
||||
longstring: ""'" `longstringitem`* ""'"
|
||||
: | '"""' `longstringitem`* '"""'
|
||||
shortstringitem: `shortstringchar` | `escapeseq`
|
||||
longstringitem: `longstringchar` | `escapeseq`
|
||||
longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
|
||||
shortstringitem: `shortstringchar` | `stringescapeseq`
|
||||
longstringitem: `longstringchar` | `stringescapeseq`
|
||||
shortstringchar: <any source character except "\" or newline or the quote>
|
||||
longstringchar: <any source character except "\">
|
||||
escapeseq: "\" <any ASCII character>
|
||||
stringescapeseq: "\" <any source character>
|
||||
|
||||
.. productionlist::
|
||||
bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
|
||||
bytesprefix: "b" | "B"
|
||||
shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
|
||||
longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
|
||||
shortbytesitem: `shortbyteschar` | `bytesescapeseq`
|
||||
longbytesitem: `longbyteschar` | `bytesescapeseq`
|
||||
shortbyteschar: <any ASCII character except "\" or newline or the quote>
|
||||
longbyteschar: <any ASCII character except "\">
|
||||
bytesescapeseq: "\" <any ASCII character>
|
||||
|
||||
One syntactic restriction not indicated by these productions is that whitespace
|
||||
is not allowed between the :token:`stringprefix` and the rest of the string
|
||||
literal. The source character set is defined by the encoding declaration; it is
|
||||
ASCII if no encoding declaration is given in the source file; see section
|
||||
:ref:`encodings`.
|
||||
is not allowed between the :token:`stringprefix` or :token:`bytesprefix` and the
|
||||
rest of the literal. The source character set is defined by the encoding
|
||||
declaration; it is UTF-8 if no encoding declaration is given in the source file;
|
||||
see section :ref:`encodings`.
|
||||
|
||||
.. index::
|
||||
single: triple-quoted string
|
||||
single: Unicode Consortium
|
||||
single: string; Unicode
|
||||
single: raw string
|
||||
.. index:: triple-quoted string, Unicode Consortium, raw string
|
||||
|
||||
In plain English: String literals can be enclosed in matching single quotes
|
||||
In plain English: Both types of literals can be enclosed in matching single quotes
|
||||
(``'``) or double quotes (``"``). They can also be enclosed in matching groups
|
||||
of three single or double quotes (these are generally referred to as
|
||||
*triple-quoted strings*). The backslash (``\``) character is used to escape
|
||||
characters that otherwise have a special meaning, such as newline, backslash
|
||||
itself, or the quote character. String literals may optionally be prefixed with
|
||||
a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
|
||||
different rules for interpreting backslash escape sequences. A prefix of
|
||||
``'u'`` or ``'U'`` makes the string a Unicode string. Unicode strings use the
|
||||
Unicode character set as defined by the Unicode Consortium and ISO 10646. Some
|
||||
additional escape sequences, described below, are available in Unicode strings.
|
||||
The two prefix characters may be combined; in this case, ``'u'`` must appear
|
||||
before ``'r'``.
|
||||
itself, or the quote character.
|
||||
|
||||
String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
|
||||
such strings are called :dfn:`raw strings` and use different rules for
|
||||
interpreting backslash escape sequences.
|
||||
|
||||
Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
|
||||
instance of the :class:`bytes` type instead of the :class:`str` type. They
|
||||
may only contain ASCII characters; bytes with a numeric value of 128 or greater
|
||||
must be expressed with escapes.
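A short sketch of the difference between the two literal kinds follows. The variable names and the helper ``compiles`` are made up for illustration.

```python
def compiles(source):
    """Return True if source is accepted as an expression by the compiler."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


text = 'abc'      # str instance
data = b'abc'     # bytes instance
high = b'\xff'    # a byte with value >= 128 must be written as an escape

# A non-ASCII character written directly inside a bytes literal is rejected,
# while the same character is fine in a string literal.
non_ascii_bytes_ok = compiles("b'caf\u00e9'")
non_ascii_str_ok = compiles("'caf\u00e9'")

print(type(text).__name__, type(data).__name__, high[0],
      non_ascii_bytes_ok, non_ascii_str_ok)
```

Indexing a :class:`bytes` object yields an integer, which is why ``high[0]`` is ``255`` rather than a one-character string.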
|
||||
|
||||
In triple-quoted strings, unescaped newlines and quotes are allowed (and are
|
||||
retained), except that three unescaped quotes in a row terminate the string. (A
|
||||
"quote" is the character used to open the string, i.e. either ``'`` or ``"``.)
|
||||
|
||||
.. index::
|
||||
single: physical line
|
||||
single: escape sequence
|
||||
single: Standard C
|
||||
single: C
|
||||
.. index:: physical line, escape sequence, Standard C, C
|
||||
|
||||
Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
|
||||
interpreted according to rules similar to those used by Standard C. The
|
||||
|
|
@@ -478,7 +445,7 @@ recognized escape sequences are:
|
|||
+-----------------+---------------------------------+-------+
|
||||
| Escape Sequence | Meaning | Notes |
|
||||
+=================+=================================+=======+
|
||||
| ``\newline`` | Ignored | |
|
||||
| ``\newline`` | Backslash and newline ignored | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\\`` | Backslash (``\``) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
|
|
@@ -494,83 +461,83 @@ recognized escape sequences are:
|
|||
+-----------------+---------------------------------+-------+
|
||||
| ``\n`` | ASCII Linefeed (LF) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\N{name}`` | Character named *name* in the | |
|
||||
| | Unicode database (Unicode only) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\r`` | ASCII Carriage Return (CR) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\t`` | ASCII Horizontal Tab (TAB) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\uxxxx`` | Character with 16-bit hex value | \(1) |
|
||||
| | *xxxx* (Unicode only) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(2) |
|
||||
| | *xxxxxxxx* (Unicode only) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\v`` | ASCII Vertical Tab (VT) | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\ooo`` | Character with octal value | (3,5) |
|
||||
| ``\ooo`` | Character with octal value | (1,3) |
|
||||
| | *ooo* | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\xhh`` | Character with hex value *hh* | (4,5) |
|
||||
| ``\xhh`` | Character with hex value *hh* | (2,3) |
|
||||
+-----------------+---------------------------------+-------+
|
||||
|
||||
.. index:: single: ASCII@ASCII
|
||||
Escape sequences only recognized in string literals are:
|
||||
|
||||
+-----------------+---------------------------------+-------+
|
||||
| Escape Sequence | Meaning | Notes |
|
||||
+=================+=================================+=======+
|
||||
| ``\N{name}`` | Character named *name* in the | |
|
||||
| | Unicode database | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\uxxxx`` | Character with 16-bit hex value | \(4) |
|
||||
| | *xxxx* | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(5) |
|
||||
| | *xxxxxxxx* | |
|
||||
+-----------------+---------------------------------+-------+
|
||||
|
||||
Notes:
|
||||
|
||||
(1)
|
||||
As in Standard C, up to three octal digits are accepted.
|
||||
|
||||
(2)
|
||||
Unlike in Standard C, at most two hex digits are accepted.
|
||||
|
||||
(3)
|
||||
In a bytes literal, hexadecimal and octal escapes denote the byte with the
|
||||
given value. In a string literal, these escapes denote a Unicode character
|
||||
with the given value.
|
||||
|
||||
(4)
|
||||
Individual code units which form parts of a surrogate pair can be encoded using
|
||||
this escape sequence.
|
||||
|
||||
(2)
|
||||
(5)
|
||||
Any Unicode character can be encoded this way, but characters outside the Basic
|
||||
Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
|
||||
compiled to use 16-bit code units (the default). Individual code units which
|
||||
form parts of a surrogate pair can be encoded using this escape sequence.
|
||||
|
||||
(3)
|
||||
As in Standard C, up to three octal digits are accepted.
|
||||
|
||||
(4)
|
||||
Unlike in Standard C, at most two hex digits are accepted.
|
||||
|
||||
(5)
|
||||
In a string literal, hexadecimal and octal escapes denote the byte with the
|
||||
given value; it is not necessary that the byte encodes a character in the source
|
||||
character set. In a Unicode literal, these escapes denote a Unicode character
|
||||
with the given value.
|
||||
|
||||
.. index:: single: unrecognized escape sequence
|
||||
.. index:: unrecognized escape sequence
|
||||
|
||||
Unlike Standard C, all unrecognized escape sequences are left in the string
|
||||
unchanged, i.e., *the backslash is left in the string*. (This behavior is
|
||||
useful when debugging: if an escape sequence is mistyped, the resulting output
|
||||
is more easily recognized as broken.) It is also important to note that the
|
||||
escape sequences marked as "(Unicode only)" in the table above fall into the
|
||||
category of unrecognized escapes for non-Unicode string literals.
|
||||
escape sequences only recognized in string literals fall into the category of
|
||||
unrecognized escapes for bytes literals.
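The escape rules for the two literal kinds can be compared by evaluating literal source text. This is a sketch under one assumption: recent CPython releases emit a warning for unrecognized escapes, so the hypothetical helper ``evaluate_literal`` silences warnings while evaluating. Future releases may reject such escapes outright.

```python
import warnings


def evaluate_literal(source):
    """Evaluate literal source text, ignoring invalid-escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


# Hex and octal escapes: a character in str, a byte value in bytes.
hex_in_str = evaluate_literal(r"'\xe9'")
hex_in_bytes = evaluate_literal(r"b'\xe9'")
octal_in_bytes = evaluate_literal(r"b'\101'")

# \uXXXX is recognized only in string literals; in a bytes literal the
# backslash and the following characters are kept as-is.
unicode_in_str = evaluate_literal(r"'\u0041'")
unicode_in_bytes = evaluate_literal(r"b'\u0041'")

print(hex_in_str, hex_in_bytes, octal_in_bytes, unicode_in_str, unicode_in_bytes)
```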
|
||||
|
||||
When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
|
||||
is included in the string without change, and *all backslashes are left in the
|
||||
string*. For example, the string literal ``r"\n"`` consists of two characters:
|
||||
a backslash and a lowercase ``'n'``. String quotes can be escaped with a
|
||||
backslash, but the backslash remains in the string; for example, ``r"\""`` is a
|
||||
valid string literal consisting of two characters: a backslash and a double
|
||||
quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
|
||||
an odd number of backslashes). Specifically, *a raw string cannot end in a
|
||||
single backslash* (since the backslash would escape the following quote
|
||||
character). Note also that a single backslash followed by a newline is
|
||||
interpreted as those two characters as part of the string, *not* as a line
|
||||
continuation.
|
||||
When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
|
||||
``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
|
||||
backslashes are left in the string*. For example, the string literal
|
||||
``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
|
||||
'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
|
||||
preceding backslash; however, both remain in the string. As a result,
|
||||
``\uXXXX`` escape sequences are only recognized when there is an odd number of
|
||||
backslashes.
|
||||
|
||||
When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
|
||||
``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
|
||||
processed while *all other backslashes are left in the string*. For example,
|
||||
the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
|
||||
SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
|
||||
be escaped with a preceding backslash; however, both remain in the string. As a
|
||||
result, ``\uXXXX`` escape sequences are only recognized when there are an odd
|
||||
number of backslashes.
|
||||
Even in a raw string, string quotes can be escaped with a backslash, but the
|
||||
backslash remains in the string; for example, ``r"\""`` is a valid string
|
||||
literal consisting of two characters: a backslash and a double quote; ``r"\"``
|
||||
is not a valid string literal (even a raw string cannot end in an odd number of
|
||||
backslashes). Specifically, *a raw string cannot end in a single backslash*
|
||||
(since the backslash would escape the following quote character). Note also
|
||||
that a single backslash followed by a newline is interpreted as those two
|
||||
characters as part of the string, *not* as a line continuation.
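The raw-string rules for backslashes and quotes can be checked directly. The sketch below uses a hypothetical helper, ``is_valid_literal``. One caveat: on current CPython 3 releases a raw string also leaves ``\uXXXX`` untouched, unlike the wording in this revision, and the last assignment shows that observed behavior.

```python
def is_valid_literal(source):
    """Return True if source text compiles as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


newline_raw = r"\n"       # two characters: backslash, 'n'
quote_raw = r"\""         # two characters: backslash, double quote
unicode_raw = r"\u0062"   # six characters on current CPython 3 releases

# Source text r"\\" (two backslashes) is fine; r"\" (one) never terminates.
even_backslashes_ok = is_valid_literal('r"\\\\"')
single_backslash_ok = is_valid_literal('r"\\"')

print(len(newline_raw), len(quote_raw), len(unicode_raw),
      even_backslashes_ok, single_backslash_ok)
```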
|
||||
|
||||
|
||||
.. _string-catenation:
|
||||
|
|
@@ -600,19 +567,9 @@ styles for each component (even mixing raw strings and triple quoted strings).
|
|||
Numeric literals
|
||||
----------------
|
||||
|
||||
.. index::
|
||||
single: number
|
||||
single: numeric literal
|
||||
single: integer literal
|
||||
single: plain integer literal
|
||||
single: long integer literal
|
||||
single: floating point literal
|
||||
single: hexadecimal literal
|
||||
single: octal literal
|
||||
single: binary literal
|
||||
single: decimal literal
|
||||
single: imaginary literal
|
||||
single: complex; literal
|
||||
.. index:: number, numeric literal, integer literal, plain integer literal
|
||||
long integer literal, floating point literal, hexadecimal literal
|
||||
octal literal, binary literal, decimal literal, imaginary literal, complex literal
|
||||
|
||||
There are four types of numeric literals: plain integers, long integers,
|
||||
floating point numbers, and imaginary numbers. There are no complex literals
|
||||
|
|
@@ -633,18 +590,17 @@ Integer literals are described by the following lexical definitions:
|
|||
.. productionlist::
|
||||
integer: `decimalinteger` | `octinteger` | `hexinteger`
|
||||
decimalinteger: `nonzerodigit` `digit`* | "0"+
|
||||
nonzerodigit: "1"..."9"
|
||||
digit: "0"..."9"
|
||||
octinteger: "0" ("o" | "O") `octdigit`+
|
||||
hexinteger: "0" ("x" | "X") `hexdigit`+
|
||||
bininteger: "0" ("b" | "B") `bindigit`+
|
||||
nonzerodigit: "1"..."9"
|
||||
octdigit: "0"..."7"
|
||||
hexdigit: `digit` | "a"..."f" | "A"..."F"
|
||||
bindigit: "0"..."1"
|
||||
bindigit: "0" | "1"
|
||||
|
||||
Plain integer literals that are above the largest representable plain integer
|
||||
(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
|
||||
long integers instead. [#]_ There is no limit for long integer literals apart
|
||||
from what can be stored in available memory.
|
||||
There is no limit for the length of integer literals apart from what can be
|
||||
stored in available memory.
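The integer productions above can be exercised with a few literals. This is a minimal sketch; the helper ``accepts`` is an assumed name used only for illustration.

```python
def accepts(source):
    """Return True if source text compiles as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The same value written in each supported base.
values = {"decimal": 255, "octal": 0o377, "hex": 0xFF, "binary": 0b11111111}

# Literals are not limited to a machine word.
big = 1267650600228229401496703205376

# "0"+ permits a run of zeros, but a non-zero decimal literal may not start
# with 0, which also rules out the old C-style octal spelling.
accepted = {text: accepts(text) for text in ["0", "000", "0o17", "017"]}

print(values, big == 2 ** 100, accepted)
```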
|
||||
|
||||
Note that leading zeros in a non-zero decimal number are not allowed. This is
|
||||
for disambiguation with C-style octal literals, which Python used before version
|
||||
|
|
@@ -732,7 +688,7 @@ The following tokens serve as delimiters in the grammar::
|
|||
&= |= ^= >>= <<= **=
|
||||
|
||||
The period can also occur in floating-point and imaginary literals. A sequence
|
||||
of three periods has a special meaning as an ellipsis in slices. The second half
|
||||
of three periods has a special meaning as an ellipsis literal. The second half
|
||||
of the list, the augmented assignment operators, serve lexically as delimiters,
|
||||
but also perform an operation.
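As a brief illustration (variable names are arbitrary), three periods evaluate to the built-in ``Ellipsis`` object, and an augmented assignment operator both delimits and computes:

```python
# Three periods form a literal for the built-in Ellipsis singleton.
marker = ...
is_ellipsis = marker is Ellipsis

# Augmented assignment operators perform an operation and rebind the name.
x = 6
x <<= 2   # shift left by two bits: 24
x **= 2   # square: 576

print(is_ellipsis, x)
```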
|
||||
|
||||
|
|
@@ -741,18 +697,7 @@ tokens or are otherwise significant to the lexical analyzer::
|
|||
|
||||
' " # \
|
||||
|
||||
.. index:: single: ASCII@ASCII
|
||||
|
||||
The following printing ASCII characters are not used in Python. Their
|
||||
occurrence outside string literals and comments is an unconditional error::
|
||||
|
||||
$ ?
|
||||
|
||||
.. rubric:: Footnotes
|
||||
|
||||
.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
|
||||
just above the largest representable plain integer but below the largest
|
||||
unsigned 32-bit number (on a machine using 32-bit arithmetic), 4294967296, were
|
||||
taken as the negative plain integer obtained by subtracting 4294967296 from
|
||||
their unsigned value.
|
||||
|
||||
|
|
|
|||