Update the first two parts of the reference manual for Py3k,

mainly concerning PEPs 3131 and 3120.
2025-12-23 09:19:18 +00:00 · 2007-08-31 08:07:45 +00:00 · 2007-08-31 08:07:45 +00:00 · 57e3b68c22
commit 57e3b68c22
parent 3dc33d1845
3 changed files with 173 additions and 232 deletions
--- a/Doc/documenting/index.rst
+++ b/Doc/documenting/index.rst
@@ -27,6 +27,7 @@ are more than welcome as well.
   style.rst
   rest.rst
   markup.rst
+   fromlatex.rst
   sphinx.rst

 .. XXX add credits, thanks etc.
--- a/Doc/reference/introduction.rst
+++ b/Doc/reference/introduction.rst
@@ -22,11 +22,12 @@ language, maybe you could volunteer your time --- or invent a cloning machine

 It is dangerous to add too many implementation details to a language reference
 document --- the implementation may change, and other implementations of the
-same language may work differently.  On the other hand, there is currently only
-one Python implementation in widespread use (although alternate implementations
-exist), and its particular quirks are sometimes worth being mentioned,
-especially where the implementation imposes additional limitations.  Therefore,
-you'll find short "implementation notes" sprinkled throughout the text.
+same language may work differently.  On the other hand, CPython is the one
+Python implementation in widespread use (although alternate implementations
+continue to gain support), and its particular quirks are sometimes worth
+mentioning, especially where the implementation imposes additional limitations.
+Therefore, you'll find short "implementation notes" sprinkled throughout the
+text.

 Every Python implementation comes with a number of built-in and standard
 modules.  These are documented in :ref:`library-index`.  A few built-in modules
@@ -88,11 +89,7 @@ implementation you're using.
 Notation
 ========

-.. index::
-   single: BNF
-   single: grammar
-   single: syntax
-   single: notation
+.. index:: BNF, grammar, syntax, notation

 The descriptions of lexical analysis and syntax use a modified BNF grammar
 notation.  This uses the following style of definition:
@@ -118,9 +115,7 @@ meaningful to separate tokens. Rules are normally contained on a single line;
 rules with many alternatives may be formatted alternatively with each line after
 the first beginning with a vertical bar.

-.. index::
-   single: lexical definitions
-   single: ASCII@ASCII
+.. index:: lexical definitions, ASCII

 In lexical definitions (as the example above), two more conventions are used:
 Two literal characters separated by three dots mean a choice of any single
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -5,38 +5,16 @@
 Lexical analysis
 ****************

-.. index::
-   single: lexical analysis
-   single: parser
-   single: token
+.. index:: lexical analysis, parser, token

 A Python program is read by a *parser*.  Input to the parser is a stream of
 *tokens*, generated by the *lexical analyzer*.  This chapter describes how the
 lexical analyzer breaks a file into tokens.

-Python uses the 7-bit ASCII character set for program text.
-
-.. versionadded:: 2.3
-   An encoding declaration can be used to indicate that  string literals and
-   comments use an encoding different from ASCII.
-
-For compatibility with older versions, Python only warns if it finds 8-bit
-characters; those warnings should be corrected by either declaring an explicit
-encoding, or using escape sequences if those bytes are binary data, instead of
-characters.
-
-The run-time character set depends on the I/O devices connected to the program
-but is generally a superset of ASCII.
-
-**Future compatibility note:** It may be tempting to assume that the character
-set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
-western languages that use the Latin alphabet), but it is possible that in the
-future Unicode text editors will become common.  These generally use the UTF-8
-encoding, which is also an ASCII superset, but with very different use for the
-characters with ordinals 128-255.  While there is no consensus on this subject
-yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
-implementation appears to favor Latin-1.  This applies both to the source
-character set and the run-time character set.
+Python reads program text as Unicode code points; the encoding of a source file
+can be given by an encoding declaration and defaults to UTF-8 (see :pep:`3120`
+for details).  If the source file cannot be decoded, a :exc:`SyntaxError` is
+raised.


 .. _line-structure:
@@ -44,21 +22,17 @@ character set and the run-time character set.
 Line structure
 ==============

-.. index:: single: line structure
+.. index:: line structure

 A Python program is divided into a number of *logical lines*.


-.. _logical:
+.. _logical-lines:

 Logical lines
 -------------

-.. index::
-   single: logical line
-   single: physical line
-   single: line joining
-   single: NEWLINE token
+.. index:: logical line, physical line, line joining, NEWLINE token

 The end of a logical line is represented by the token NEWLINE.  Statements
 cannot cross logical line boundaries except where NEWLINE is allowed by the
@@ -67,7 +41,7 @@ constructed from one or more *physical lines* by following the explicit or
 implicit *line joining* rules.


-.. _physical:
+.. _physical-lines:

 Physical lines
 --------------
@@ -89,9 +63,7 @@ representing ASCII LF, is the line terminator).
 Comments
 --------

-.. index::
-   single: comment
-   single: hash character
+.. index:: comment, hash character

 A comment starts with a hash character (``#``) that is not part of a string
 literal, and ends at the end of the physical line.  A comment signifies the end
@@ -104,9 +76,7 @@ are ignored by the syntax; they are not tokens.
 Encoding declarations
 ---------------------

-.. index::
-   single: source character set
-   single: encodings
+.. index:: source character set, encodings

 If a comment in the first or second line of the Python script matches the
 regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
@@ -119,19 +89,19 @@ which is recognized also by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

-which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
-the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
-encoding is UTF-8 (this is supported, among others, by Microsoft's
-:program:`notepad`).
+which is recognized by Bram Moolenaar's VIM.
+
+If no encoding declaration is found, the default encoding is UTF-8.  In
+addition, if the first bytes of the file are the UTF-8 byte-order mark
+(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
+among others, by Microsoft's :program:`notepad`).

 If an encoding is declared, the encoding name must be recognized by Python. The
-encoding is used for all lexical analysis, in particular to find the end of a
-string, and to interpret the contents of Unicode literals. String literals are
-converted to Unicode for syntactical analysis, then converted back to their
-original encoding before interpretation starts. The encoding declaration must
-appear on a line of its own.
+encoding is used for all lexical analysis, including string literals, comments
+and identifiers. The encoding declaration must appear on a line of its own.

-.. % XXX there should be a list of supported encodings.
+A list of standard encodings can be found in the section
+:ref:`standard-encodings`.
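
The encoding-declaration rule described in this hunk can be sketched in a few lines of Python. This is only an illustration built from the regular expression quoted above; the helper name ``declared_encoding`` is hypothetical, and the sketch does not validate that the declared name is an encoding Python recognizes, nor does it reject a byte-order mark combined with a conflicting declaration.

```python
import re

# Regular expression quoted in the section above, compiled for bytes input.
CODING_RE = re.compile(rb"coding[=:]\s*([-\w.]+)")
UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(source: bytes) -> str:
    """Return the encoding a source file declares, defaulting to UTF-8.

    Hypothetical helper for illustration; not the interpreter's own logic.
    """
    # A UTF-8 byte-order mark at the start of the file implies UTF-8.
    if source.startswith(UTF8_BOM):
        return "utf-8"
    # Only comments on the first or second physical line are considered.
    for line in source.splitlines()[:2]:
        stripped = line.lstrip()
        if stripped.startswith(b"#"):
            match = CODING_RE.search(stripped)
            if match:
                return match.group(1).decode("ascii")
    # No declaration found: fall back to the default.
    return "utf-8"


print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
print(declared_encoding(b"x = 1\n"))
```

A declaration on the third line or later is ignored by this sketch, matching the "first or second line" wording above.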


 .. _explicit-joining:
@@ -139,21 +109,13 @@ appear on a line of its own.
 Explicit line joining
 ---------------------

-.. index::
-   single: physical line
-   single: line joining
-   single: line continuation
-   single: backslash character
+.. index:: physical line, line joining, line continuation, backslash character

 Two or more physical lines may be joined into logical lines using backslash
 characters (``\``), as follows: when a physical line ends in a backslash that is
 not part of a string literal or comment, it is joined with the following forming
 a single logical line, deleting the backslash and the following end-of-line
-character.  For example:
-
-.. % 
-
-::
+character.  For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
@@ -197,9 +159,9 @@ Blank lines
 A logical line that contains only spaces, tabs, formfeeds and possibly a
 comment, is ignored (i.e., no NEWLINE token is generated).  During interactive
 input of statements, handling of a blank line may differ depending on the
-implementation of the read-eval-print loop.  In the standard implementation, an
-entirely blank logical line (i.e. one containing not even whitespace or a
-comment) terminates a multi-line statement.
+implementation of the read-eval-print loop.  In the standard interactive
+interpreter, an entirely blank logical line (i.e. one containing not even
+whitespace or a comment) terminates a multi-line statement.


 .. _indentation:
@@ -207,14 +169,7 @@ comment) terminates a multi-line statement.
 Indentation
 -----------

-.. index::
-   single: indentation
-   single: whitespace
-   single: leading whitespace
-   single: space
-   single: tab
-   single: grouping
-   single: statement grouping
+.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping

 Leading whitespace (spaces and tabs) at the beginning of a logical line is used
 to compute the indentation level of the line, which in turn is used to determine
@@ -238,9 +193,7 @@ for the indentation calculations above.  Formfeed characters occurring elsewhere
 in the leading whitespace have an undefined effect (for instance, they may reset
 the space count to zero).

-.. index::
-   single: INDENT token
-   single: DEDENT token
+.. index:: INDENT token, DEDENT token

 The indentation levels of consecutive lines are used to generate INDENT and
 DEDENT tokens, using a stack, as follows.
@@ -315,22 +268,48 @@ possible string that forms a legal token, when read from left to right.
 Identifiers and keywords
 ========================

-.. index::
-   single: identifier
-   single: name
+.. index:: identifier, name

 Identifiers (also referred to as *names*) are described by the following lexical
 definitions:

-.. productionlist::
-   identifier: (`letter`|"_") (`letter` | `digit` | "_")*
-   letter: `lowercase` | `uppercase`
-   lowercase: "a"..."z"
-   uppercase: "A"..."Z"
-   digit: "0"..."9"
+The syntax of identifiers in Python is based on the Unicode standard annex
+UAX-31, with elaboration and changes as defined below.
+
+Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
+are the same as in Python 2.5; Python 3.0 introduces additional
+characters from outside the ASCII range (see :pep:`3131`).  For other
+characters, the classification uses the version of the Unicode Character
+Database as included in the :mod:`unicodedata` module.

 Identifiers are unlimited in length.  Case is significant.

+.. productionlist::
+   identifier: `id_start` `id_continue`*
+   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl,
+              the underscore, and characters with the Other_ID_Start property>
+   id_continue: <all characters in `id_start`, plus characters in the categories
+                 Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
+
+The Unicode category codes mentioned above stand for:
+
+* *Lu* - uppercase letters
+* *Ll* - lowercase letters
+* *Lt* - titlecase letters
+* *Lm* - modifier letters
+* *Lo* - other letters
+* *Nl* - letter numbers
+* *Mn* - nonspacing marks
+* *Mc* - spacing combining marks
+* *Nd* - decimal numbers
+* *Pc* - connector punctuations
+
+All identifiers are converted into the normal form NFC while parsing; comparison
+of identifiers is based on NFC.
+
+A non-normative HTML file listing all valid identifier characters for Unicode
+4.1 can be found at
+http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
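
The category list and the normalization rule above can be explored with the :mod:`unicodedata` module. The following is a minimal sketch, assuming a Python 3 interpreter; the helper ``nfc`` and the sample names are hypothetical. It follows the NFC wording of this hunk; newer CPython documentation specifies NFKC instead, so the normalization form used by a given interpreter should be treated as an assumption.

```python
import unicodedata

decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
composed = "caf\u00e9"      # precomposed LATIN SMALL LETTER E WITH ACUTE


def nfc(name: str) -> str:
    """Normalize a name to NFC, the form named in the text above."""
    return unicodedata.normalize("NFC", name)


# Two spellings that differ as code point sequences compare equal after NFC.
same_identifier = nfc(decomposed) == nfc(composed)

# General categories of a few sample characters from the list above.
categories = {ch: unicodedata.category(ch) for ch in "aZ_9\u0301"}

print(same_identifier)
print(categories)
print(composed.isidentifier(), "9lives".isidentifier())
```

``str.isidentifier`` reports whether a string is a valid identifier according to the running interpreter, which makes it a convenient cross-check for the ``id_start`` and ``id_continue`` productions.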

 .. _keywords:

@@ -345,25 +324,13 @@ The following identifiers are used as reserved words, or *keywords* of the
 language, and cannot be used as ordinary identifiers.  They must be spelled
 exactly as written here::

-   and       def       for       is        raise
-   as        del       from      lambda    return
-   assert    elif      global    not       try
-   break     else      if        or        while
-   class     except    import    pass      with
-   continue  finally   in        print     yield
-
-.. versionchanged:: 2.4
-   :const:`None` became a constant and is now recognized by the compiler as a name
-   for the built-in object :const:`None`.  Although it is not a keyword, you cannot
-   assign a different object to it.
-
-.. versionchanged:: 2.5
-   Both :keyword:`as` and :keyword:`with` are only recognized when the
-   ``with_statement`` future feature has been enabled. It will always be enabled in
-   Python 2.6.  See section :ref:`with` for details.  Note that using :keyword:`as`
-   and :keyword:`with` as identifiers will always issue a warning, even when the
-   ``with_statement`` future directive is not in effect.
-
+   False      class      finally    is         return
+   None       continue   for        lambda     try
+   True       def        from       nonlocal   while
+   and        del        global     not        with
+   as         elif       if         or         yield
+   assert     else       import     pass
+   break      except     in         raise
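
The keyword table added above can be cross-checked against the standard :mod:`keyword` module. This is a small sketch, assuming a Python 3 interpreter; later releases may reserve additional words that are not part of this commit, so the check only runs in one direction.

```python
import keyword

# The keywords listed in the table above, in reading order.
listed = """
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
""".split()

# Every listed word should be reserved by the running interpreter.
missing = [word for word in listed if not keyword.iskeyword(word)]
print(missing)

# 'print' was a keyword in Python 2 and is removed from the list above.
print(keyword.iskeyword("print"))
```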

 .. _id-classes:

@@ -405,71 +372,71 @@ characters:
 Literals
 ========

-.. index::
-   single: literal
-   single: constant
+.. index:: literal, constant

 Literals are notations for constant values of some built-in types.


 .. _strings:

-String literals
---------------
+String and Bytes literals
+-------------------------

-.. index:: single: string literal
+.. index:: string literal, bytes literal, ASCII

 String literals are described by the following lexical definitions:

-.. index:: single: ASCII@ASCII
-
 .. productionlist::
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
-   stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
+   stringprefix: "r" | "R"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
-   longstring: ""'" `longstringitem`* ""'"
-             : | '"""' `longstringitem`* '"""'
-   shortstringitem: `shortstringchar` | `escapeseq`
-   longstringitem: `longstringchar` | `escapeseq`
+   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
+   shortstringitem: `shortstringchar` | `stringescapeseq`
+   longstringitem: `longstringchar` | `stringescapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
-   escapeseq: "\" <any ASCII character>
+   stringescapeseq: "\" <any source character>
+
+.. productionlist::
+   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
+   bytesprefix: "b" | "B"
+   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
+   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
+   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
+   longbytesitem: `longbyteschar` | `bytesescapeseq`
+   shortbyteschar: <any ASCII character except "\" or newline or the quote>
+   longbyteschar: <any ASCII character except "\">
+   bytesescapeseq: "\" <any ASCII character>

 One syntactic restriction not indicated by these productions is that whitespace
-is not allowed between the :token:`stringprefix` and the rest of the string
-literal. The source character set is defined by the encoding declaration; it is
-ASCII if no encoding declaration is given in the source file; see section
-:ref:`encodings`.
+is not allowed between the :token:`stringprefix` or :token:`bytesprefix` and the
+rest of the literal. The source character set is defined by the encoding
+declaration; it is UTF-8 if no encoding declaration is given in the source file;
+see section :ref:`encodings`.

-.. index::
-   single: triple-quoted string
-   single: Unicode Consortium
-   single: string; Unicode
-   single: raw string
+.. index:: triple-quoted string, Unicode Consortium, raw string

-In plain English: String literals can be enclosed in matching single quotes
+In plain English: Both types of literals can be enclosed in matching single quotes
 (``'``) or double quotes (``"``).  They can also be enclosed in matching groups
 of three single or double quotes (these are generally referred to as
 *triple-quoted strings*).  The backslash (``\``) character is used to escape
 characters that otherwise have a special meaning, such as newline, backslash
-itself, or the quote character.  String literals may optionally be prefixed with
-a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
-different rules for interpreting backslash escape sequences.  A prefix of
-``'u'`` or ``'U'`` makes the string a Unicode string.  Unicode strings use the
-Unicode character set as defined by the Unicode Consortium and ISO 10646.  Some
-additional escape sequences, described below, are available in Unicode strings.
-The two prefix characters may be combined; in this case, ``'u'`` must appear
-before ``'r'``.
+itself, or the quote character.
+
+String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
+such strings are called :dfn:`raw strings` and use different rules for
+interpreting backslash escape sequences.
+
+Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type.  They
+may only contain ASCII characters; bytes with a numeric value of 128 or greater
+must be expressed with escapes.
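
The difference between the two literal types can be illustrated briefly. This is a sketch, assuming a Python 3 interpreter; the helper ``is_valid_literal`` is hypothetical and simply asks the compiler whether a piece of source text is accepted.

```python
text = "\xe9"    # str: one Unicode character, U+00E9
data = b"\xe9"   # bytes: one byte with the value 0xE9

print(type(text).__name__, type(data).__name__)
print(text == "\u00e9", data[0] == 0xE9)


def is_valid_literal(source: str) -> bool:
    """Return True if the compiler accepts `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A non-ASCII character written directly inside a bytes literal is rejected;
# the same value written as an escape is accepted.
print(is_valid_literal('b"\u00e9"'))
print(is_valid_literal('b"\\xe9"'))
```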

 In triple-quoted strings, unescaped newlines and quotes are allowed (and are
 retained), except that three unescaped quotes in a row terminate the string.  (A
 "quote" is the character used to open the string, i.e. either ``'`` or ``"``.)

-.. index::
-   single: physical line
-   single: escape sequence
-   single: Standard C
-   single: C
+.. index:: physical line, escape sequence, Standard C, C

 Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
 interpreted according to rules similar to those used by Standard C.  The
@@ -478,7 +445,7 @@ recognized escape sequences are:
 +-----------------+---------------------------------+-------+
 | Escape Sequence | Meaning                         | Notes |
 +=================+=================================+=======+
-| ``\newline``    | Ignored                         |       |
+| ``\newline``    | Backslash and newline ignored   |       |
 +-----------------+---------------------------------+-------+
 | ``\\``          | Backslash (``\``)               |       |
 +-----------------+---------------------------------+-------+
@@ -494,83 +461,83 @@ recognized escape sequences are:
 +-----------------+---------------------------------+-------+
 | ``\n``          | ASCII Linefeed (LF)             |       |
 +-----------------+---------------------------------+-------+
-| ``\N{name}``    | Character named *name* in the   |       |
-|                 | Unicode database (Unicode only) |       |
-+-----------------+---------------------------------+-------+
 | ``\r``          | ASCII Carriage Return (CR)      |       |
 +-----------------+---------------------------------+-------+
 | ``\t``          | ASCII Horizontal Tab (TAB)      |       |
 +-----------------+---------------------------------+-------+
-| ``\uxxxx``      | Character with 16-bit hex value | \(1)  |
-|                 | *xxxx* (Unicode only)           |       |
-+-----------------+---------------------------------+-------+
-| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(2)  |
-|                 | *xxxxxxxx* (Unicode only)       |       |
-+-----------------+---------------------------------+-------+
 | ``\v``          | ASCII Vertical Tab (VT)         |       |
 +-----------------+---------------------------------+-------+
-| ``\ooo``        | Character with octal value      | (3,5) |
+| ``\ooo``        | Character with octal value      | (1,3) |
 |                 | *ooo*                           |       |
 +-----------------+---------------------------------+-------+
-| ``\xhh``        | Character with hex value *hh*   | (4,5) |
+| ``\xhh``        | Character with hex value *hh*   | (2,3) |
 +-----------------+---------------------------------+-------+

-.. index:: single: ASCII@ASCII
+Escape sequences only recognized in string literals are:
+
+-----------------+---------------------------------+-------+
+| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
+| ``\N{name}``    | Character named *name* in the   |       |
+|                 | Unicode database                |       |
+-----------------+---------------------------------+-------+
+| ``\uxxxx``      | Character with 16-bit hex value | \(4)  |
+|                 | *xxxx*                          |       |
+-----------------+---------------------------------+-------+
+| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(5)  |
+|                 | *xxxxxxxx*                      |       |
+-----------------+---------------------------------+-------+

 Notes:

 (1)
+   As in Standard C, up to three octal digits are accepted.
+
+(2)
+   Unlike in Standard C, at most two hex digits are accepted.
+
+(3)
+   In a bytes literal, hexadecimal and octal escapes denote the byte with the
+   given value. In a string literal, these escapes denote a Unicode character
+   with the given value.
+
+(4)
   Individual code units which form parts of a surrogate pair can be encoded using
   this escape sequence.

-(2)
+(5)
   Any Unicode character can be encoded this way, but characters outside the Basic
   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
   compiled to use 16-bit code units (the default).  Individual code units which
   form parts of a surrogate pair can be encoded using this escape sequence.

-(3)
-   As in Standard C, up to three octal digits are accepted.

-(4)
-   Unlike in Standard C, at most two hex digits are accepted.
-
-(5)
-   In a string literal, hexadecimal and octal escapes denote the byte with the
-   given value; it is not necessary that the byte encodes a character in the source
-   character set. In a Unicode literal, these escapes denote a Unicode character
-   with the given value.
-
-.. index:: single: unrecognized escape sequence
+.. index:: unrecognized escape sequence

 Unlike Standard C, all unrecognized escape sequences are left in the string
 unchanged, i.e., *the backslash is left in the string*.  (This behavior is
 useful when debugging: if an escape sequence is mistyped, the resulting output
 is more easily recognized as broken.)  It is also important to note that the
-escape sequences marked as "(Unicode only)" in the table above fall into the
-category of unrecognized escapes for non-Unicode string literals.
+escape sequences only recognized in string literals fall into the category of
+unrecognized escapes for bytes literals.

-When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
-is included in the string without change, and *all backslashes are left in the
-string*.  For example, the string literal ``r"\n"`` consists of two characters:
-a backslash and a lowercase ``'n'``.  String quotes can be escaped with a
-backslash, but the backslash remains in the string; for example, ``r"\""`` is a
-valid string literal consisting of two characters: a backslash and a double
-quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
-an odd number of backslashes).  Specifically, *a raw string cannot end in a
-single backslash* (since the backslash would escape the following quote
-character).  Note also that a single backslash followed by a newline is
-interpreted as those two characters as part of the string, *not* as a line
-continuation.
+When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
+``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
+backslashes are left in the string*. For example, the string literal
+``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
+'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
+preceding backslash; however, both remain in the string.  As a result,
+``\uXXXX`` escape sequences are only recognized when there is an odd number of
+backslashes.

-When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
-``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
-processed while  *all other backslashes are left in the string*. For example,
-the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
-SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
-be escaped with a preceding backslash; however, both remain in the string.  As a
-result, ``\uXXXX`` escape sequences are only recognized when there are an odd
-number of backslashes.
+Even in a raw string, string quotes can be escaped with a backslash, but the
+backslash remains in the string; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes).  Specifically, *a raw string cannot end in a single backslash*
+(since the backslash would escape the following quote character).  Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the string, *not* as a line continuation.
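
The quoting rules for raw strings in the paragraph above can be checked directly. This is a minimal sketch, assuming a Python 3 interpreter; the helper ``compiles`` is hypothetical. It deliberately covers only the quote and trailing-backslash rules, not the ``\uXXXX`` handling described earlier in this hunk.

```python
# An escaped quote stays in a raw string together with its backslash.
two_chars = r"\""
print(len(two_chars), two_chars == '\\"')

# In a raw string, backslash-n is two characters, not a linefeed.
print(len(r"\n"))


def compiles(source: str) -> bool:
    """Return True if the compiler accepts `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The source text r"\" ends in a single backslash, which escapes the
# closing quote and leaves the literal unterminated.
print(compiles('r"\\"'))
# Two backslashes before the closing quote are fine.
print(compiles('r"\\\\"'))
```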


 .. _string-catenation:
@@ -600,19 +567,9 @@ styles for each component (even mixing raw strings and triple quoted strings).
 Numeric literals
 ----------------

-.. index::
-   single: number
-   single: numeric literal
-   single: integer literal
-   single: plain integer literal
-   single: long integer literal
-   single: floating point literal
-   single: hexadecimal literal
-   single: octal literal
-   single: binary literal
-   single: decimal literal
-   single: imaginary literal
-   single: complex; literal
+.. index:: number, numeric literal, integer literal, plain integer literal,
+   long integer literal, floating point literal, hexadecimal literal,
+   octal literal, binary literal, decimal literal, imaginary literal, complex literal

 There are four types of numeric literals: plain integers, long integers,
 floating point numbers, and imaginary numbers.  There are no complex literals
@@ -633,18 +590,17 @@ Integer literals are described by the following lexical definitions:
 .. productionlist::
   integer: `decimalinteger` | `octinteger` | `hexinteger`
   decimalinteger: `nonzerodigit` `digit`* | "0"+
+   nonzerodigit: "1"..."9"
+   digit: "0"..."9"
   octinteger: "0" ("o" | "O") `octdigit`+
   hexinteger: "0" ("x" | "X") `hexdigit`+
   bininteger: "0" ("b" | "B") `bindigit`+
-   nonzerodigit: "1"..."9"
   octdigit: "0"..."7"
   hexdigit: `digit` | "a"..."f" | "A"..."F"
-   bindigit: "0"..."1"
+   bindigit: "0" | "1"

-Plain integer literals that are above the largest representable plain integer
-(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
-long integers instead. [#]_  There is no limit for long integer literals apart
-from what can be stored in available memory.
+There is no limit for the length of integer literals apart from what can be
+stored in available memory.
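
The integer productions above can be exercised with a few literals. This is a sketch, assuming a Python 3 interpreter; the helper ``is_valid_literal`` is hypothetical and only reports whether the compiler accepts the text.

```python
# One literal per prefix form from the productions above.
values = {"0o17": 0o17, "0x1F": 0x1F, "0b101": 0b101, "00": 00}
print(values)

# Integer literals are not limited to a machine word.
big = 1267650600228229401496703205376
print(big == 2 ** 100)


def is_valid_literal(source: str) -> bool:
    """Return True if the compiler accepts `source` as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A leading zero on a non-zero decimal literal is rejected; the explicit
# octal prefix is required instead.
print(is_valid_literal("012"), is_valid_literal("0o12"))
```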

 Note that leading zeros in a non-zero decimal number are not allowed. This is
 for disambiguation with C-style octal literals, which Python used before version
@@ -732,7 +688,7 @@ The following tokens serve as delimiters in the grammar::
   &=      |=      ^=      >>=     <<=     **=

 The period can also occur in floating-point and imaginary literals.  A sequence
-of three periods has a special meaning as an ellipsis in slices. The second half
+of three periods has a special meaning as an ellipsis literal. The second half
 of the list, the augmented assignment operators, serve lexically as delimiters,
 but also perform an operation.

@@ -741,18 +697,7 @@ tokens or are otherwise significant to the lexical analyzer::

   '       "       #       \

-.. index:: single: ASCII@ASCII
-
 The following printing ASCII characters are not used in Python.  Their
 occurrence outside string literals and comments is an unconditional error::

   $       ?
-
-.. rubric:: Footnotes
-
-.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
-   just above the largest representable plain integer but below the largest
-   unsigned 32-bit number (on a machine using 32-bit arithmetic), 4294967296, were
-   taken as the negative plain integer obtained by subtracting 4294967296 from
-   their unsigned value.
-