mirror of https://github.com/python/cpython.git
synced 2025-11-02 19:12:55 +00:00

bpo-25324: Move the description of tokenize tokens to token.rst. (#1911)

parent 6260d9f203
commit 5cefb6cfdd

2 changed files with 39 additions and 39 deletions
@@ -101,18 +101,37 @@ The token constants are:
    AWAIT
    ASYNC
    ERRORTOKEN
+   COMMENT
+   NL
+   ENCODING
    N_TOKENS
    NT_OFFSET

-.. versionchanged:: 3.5
-   Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
-   Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
-   tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.
-
-.. versionchanged:: 3.7
-   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` to bring
-   the tokens in the C code in line with the tokens needed in
-   :mod:`tokenize` module. These tokens aren't used by the C tokenizer.
+The following token type values aren't used by the C tokenizer but are needed for
+the :mod:`tokenize` module.
+
+.. data:: COMMENT
+
+   Token value used to indicate a comment.
+
+
+.. data:: NL
+
+   Token value used to indicate a non-terminating newline. The
+   :data:`NEWLINE` token indicates the end of a logical line of Python code;
+   ``NL`` tokens are generated when a logical line of code is continued over
+   multiple physical lines.
+
+
+.. data:: ENCODING
+
+   Token value that indicates the encoding used to decode the source bytes
+   into text. The first token returned by :func:`tokenize.tokenize` will
+   always be an ``ENCODING`` token.
+
+
+.. versionchanged:: 3.5
+   Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
+   Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
+   tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.
+
+.. versionchanged:: 3.7
+   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` tokens.
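The ``COMMENT``, ``NL`` and ``ENCODING`` semantics described in the moved text can be checked against a running interpreter; a minimal sketch (the sample source is invented for illustration):

```python
import io
import tokenize

# One logical line continued across two physical lines, with a comment.
source = b"x = (1,  # a comment\n     2)\n"
toks = list(tokenize.tokenize(io.BytesIO(source).readline))
names = [tokenize.tok_name[t.type] for t in toks]

# The first token is always ENCODING; the continuation newline is NL,
# while the end of the logical line is NEWLINE.
print(names)
```

Note that the continuation newline inside the parentheses comes back as ``NL``, while the newline that terminates the statement is ``NEWLINE``.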
@@ -17,7 +17,7 @@ as well, making it useful for implementing "pretty-printers," including
 colorizers for on-screen displays.

 To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
-tokens are returned using the generic :data:`token.OP` token type. The exact
+tokens are returned using the generic :data:`~token.OP` token type. The exact
 type can be determined by checking the ``exact_type`` property on the
 :term:`named tuple` returned from :func:`tokenize.tokenize`.

@@ -44,7 +44,7 @@ The primary entry point is a :term:`generator`:

    The returned :term:`named tuple` has an additional property named
    ``exact_type`` that contains the exact operator type for
-   :data:`token.OP` tokens. For all other token types ``exact_type``
+   :data:`~token.OP` tokens. For all other token types ``exact_type``
    equals the named tuple ``type`` field.

    .. versionchanged:: 3.1
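The ``exact_type`` behavior this hunk documents can be sketched as follows (the one-line source is invented for illustration):

```python
import io
import token
import tokenize

# Operator tokens come back with the generic OP type;
# exact_type refines them to the specific operator.
toks = list(tokenize.tokenize(io.BytesIO(b"1 + 2\n").readline))
ops = [t for t in toks if t.type == token.OP]
refined = [token.tok_name[t.exact_type] for t in ops]
print(refined)  # the lone operator '+' refines to PLUS
```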
@@ -58,26 +58,7 @@ The primary entry point is a :term:`generator`:


 All constants from the :mod:`token` module are also exported from
-:mod:`tokenize`, as are three additional token type values:
-
-.. data:: COMMENT
-
-   Token value used to indicate a comment.
-
-
-.. data:: NL
-
-   Token value used to indicate a non-terminating newline. The NEWLINE token
-   indicates the end of a logical line of Python code; NL tokens are generated
-   when a logical line of code is continued over multiple physical lines.
-
-
-.. data:: ENCODING
-
-   Token value that indicates the encoding used to decode the source bytes
-   into text. The first token returned by :func:`.tokenize` will always be an
-   ENCODING token.
-
+:mod:`tokenize`.

 Another function is provided to reverse the tokenization process. This is
 useful for creating tools that tokenize a script, modify the token stream, and
@@ -96,8 +77,8 @@ write back the modified script.
    token type and token string as the spacing between tokens (column
    positions) may change.

-   It returns bytes, encoded using the ENCODING token, which is the first
-   token sequence output by :func:`.tokenize`.
+   It returns bytes, encoded using the :data:`~token.ENCODING` token, which
+   is the first token sequence output by :func:`.tokenize`.

 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
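The round trip described here, where ``untokenize()`` consumes the leading ``ENCODING`` token and returns bytes in that encoding, can be sketched as (sample source invented for illustration):

```python
import io
import tokenize

source = b"x = 1\nprint(x)\n"
toks = list(tokenize.tokenize(io.BytesIO(source).readline))

# untokenize() returns bytes, encoded per the ENCODING token that
# tokenize() emitted first.  With full 5-tuples the spacing is preserved,
# so this simple source round-trips exactly.
result = tokenize.untokenize(toks)
print(result)
```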
@@ -115,7 +96,7 @@ function it uses to do this is available:

    It detects the encoding from the presence of a UTF-8 BOM or an encoding
    cookie as specified in :pep:`263`. If both a BOM and a cookie are present,
-   but disagree, a SyntaxError will be raised. Note that if the BOM is found,
+   but disagree, a :exc:`SyntaxError` will be raised. Note that if the BOM is found,
    ``'utf-8-sig'`` will be returned as an encoding.

    If no encoding is specified, then the default of ``'utf-8'`` will be
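The BOM and cookie handling documented in this hunk can be exercised with ``tokenize.detect_encoding()``; a minimal sketch (sample sources invented for illustration):

```python
import codecs
import io
import tokenize

# A PEP 263 coding cookie on the first line; the returned name is
# normalized (latin-1 becomes "iso-8859-1").
cookie = b"# -*- coding: latin-1 -*-\nx = 1\n"
enc, lines = tokenize.detect_encoding(io.BytesIO(cookie).readline)
print(enc)

# A UTF-8 BOM is reported as "utf-8-sig".
bom_source = codecs.BOM_UTF8 + b"x = 1\n"
enc2, _ = tokenize.detect_encoding(io.BytesIO(bom_source).readline)
print(enc2)
```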
@@ -147,8 +128,8 @@ function it uses to do this is available:
        3

    Note that unclosed single-quoted strings do not cause an error to be
-   raised. They are tokenized as ``ERRORTOKEN``, followed by the tokenization of
-   their contents.
+   raised. They are tokenized as :data:`~token.ERRORTOKEN`, followed by the
+   tokenization of their contents.

 .. _tokenize-cli:

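The ``ERRORTOKEN`` behavior described here is specific to the pure-Python tokenizer of this era; a hedged sketch that also tolerates newer interpreters, which raise an error for the same input instead (sample source invented for illustration):

```python
import io
import token
import tokenize

# An unclosed single-quoted string.  On Python 3.7 (the version this doc
# targets) it is tokenized as ERRORTOKEN followed by its contents; newer
# Pythons report an error instead, so both outcomes are accepted here.
try:
    toks = list(tokenize.generate_tokens(io.StringIO("x = 'abc\n").readline))
    names = [token.tok_name[t.type] for t in toks]
    saw_error = "ERRORTOKEN" in names
except (tokenize.TokenError, SyntaxError):
    saw_error = True
print(saw_error)
```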
@@ -260,7 +241,7 @@ the name of the token, and the final column is the value of the token (if any)
     4,11-4,12:          NEWLINE        '\n'
     5,0-5,0:            ENDMARKER      ''

-The exact token type names can be displayed using the ``-e`` option:
+The exact token type names can be displayed using the :option:`-e` option:

 .. code-block:: sh
