LibCST/libcst/_parser/parso/python/token.py
Zsolt Dollenstein c02de9b718
Implement a Python PEG parser in Rust (#566)
This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust and is turned off by default; set the `LIBCST_PARSER_TYPE` environment variable to `native` to enable it. The PR also adds CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms.
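
A minimal sketch of opting in from Python rather than from the shell. `enable_native_parser` is a hypothetical helper, not part of LibCST, and the sketch assumes the variable is read from the process environment; setting it before `libcst` is imported is the conservative choice.

```python
import os


def enable_native_parser() -> None:
    # Hypothetical helper: opt in to the Rust parser for this process.
    os.environ["LIBCST_PARSER_TYPE"] = "native"


enable_native_parser()
# Import libcst only after this point so the setting is visible to it.
```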

Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. Those will be addressed in a follow-up PR.

The new parser is implemented in the `native/` directory and is organized into two Rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by:
1. **tokenizing** the input UTF-8 string (bytes are not supported at the Rust layer; the Python wrapper converts them to UTF-8 strings)
2. running the **PEG parser** on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree
3. using the anchor tokens to **inflate** the syntax tree into a proper CST
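
The three steps above can be illustrated with a toy pipeline. This is only a sketch of the idea, not the Rust implementation: `Token`, `DeflatedName`, `Name`, `tokenize`, `parse`, `inflate`, and `roundtrip` are all hypothetical names, the "grammar" is just a sequence of whitespace-separated words, and the `parse` step is a stand-in for a real PEG parser.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str               # the token's own text
    whitespace_before: str  # whitespace that preceded it in the source


@dataclass
class DeflatedName:
    tok: Token  # anchor token captured by the parser


@dataclass
class Name:
    value: str
    leading_whitespace: str  # recovered from the anchor token


def tokenize(source: str) -> List[Token]:
    # Step 1: split the input into tokens, remembering surrounding whitespace.
    return [Token(m.group(2), m.group(1)) for m in re.finditer(r"(\s*)(\S+)", source)]


def parse(tokens: List[Token]) -> List[DeflatedName]:
    # Step 2: stand-in for the PEG parser; each node keeps its anchor token.
    return [DeflatedName(tok) for tok in tokens]


def inflate(nodes: List[DeflatedName]) -> List[Name]:
    # Step 3: use the anchor tokens to attach whitespace to the final nodes.
    return [Name(n.tok.text, n.tok.whitespace_before) for n in nodes]


def roundtrip(source: str) -> str:
    tree = inflate(parse(tokenize(source)))
    return "".join(n.leading_whitespace + n.value for n in tree)
```

Because every node carries the whitespace taken from its anchor token, the inflated tree can reproduce the original text exactly (for inputs without trailing whitespace, which this toy tokenizer drops).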

Co-authored-by: Benjamin Woodruff <github@benjam.info>
2021-12-21 18:14:39 +00:00


# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
try:
    from libcst_native import token_type as native_token_type

    TokenType = native_token_type.TokenType

    class PythonTokenTypes:
        STRING: TokenType = native_token_type.STRING
        NUMBER: TokenType = native_token_type.NUMBER
        NAME: TokenType = native_token_type.NAME
        NEWLINE: TokenType = native_token_type.NEWLINE
        INDENT: TokenType = native_token_type.INDENT
        DEDENT: TokenType = native_token_type.DEDENT
        ASYNC: TokenType = native_token_type.ASYNC
        AWAIT: TokenType = native_token_type.AWAIT
        FSTRING_STRING: TokenType = native_token_type.FSTRING_STRING
        FSTRING_START: TokenType = native_token_type.FSTRING_START
        FSTRING_END: TokenType = native_token_type.FSTRING_END
        OP: TokenType = native_token_type.OP
        ENDMARKER: TokenType = native_token_type.ENDMARKER

        # unused dummy tokens for backwards compat with the parso tokenizer
        ERRORTOKEN: TokenType = native_token_type.ERRORTOKEN
        ERROR_DEDENT: TokenType = native_token_type.ERROR_DEDENT

except ImportError:
    from libcst._parser.parso.python.py_token import (  # noqa F401
        PythonTokenTypes,
        TokenType,
    )