Commit graph

5 commits

Author SHA1 Message Date
zaicruvoir1rominet
ca1f81f049
Avoid raising bare Exception (#1168)
* Keep old exception messages (avoid breaking-changes for users relying on exception messages)

* Move ``get_expected_str`` out of _exceptions.py, where it does not belong, to its own file in _parser/_parsing_check.py
2025-06-07 01:53:44 -07:00
Zsolt Dollenstein
c91655fbba
fix copyright headers and add a script to check (#635) 2022-02-01 11:13:17 +00:00
Zsolt Dollenstein
c44ff0500b
Fix license headers (#560)
* Facebook -> Meta

* remove year from doc copyright
2021-12-28 11:55:18 +00:00
John Reese
10c3aa09a7
Upgrade to µsort 1.0.0rc1, and apply formatting changes (#565)
* Upgrade to usort==1.0.0rc1

* Apply sorting changes from usort 1.0.0rc1

* reapply codegen

Co-authored-by: Zsolt Dollenstein <zsol.zsol@gmail.com>
2021-12-21 14:55:04 -08:00
Zsolt Dollenstein
c02de9b718
Implement a Python PEG parser in Rust (#566)
This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust, but it's turned off by default through the `LIBCST_PARSER_TYPE` environment variable. Set it to `native` to enable. The PR also enables new CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms.

Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. That will be addressed as a follow-up PR.

The new parser is implemented in the `native/` directory, and is organized into two rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by
1. **tokenizing** the input utf-8 string (bytes are not supported at the Rust layer, they are converted to utf-8 strings by the python wrapper)
2. running the **PEG parser** on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree
3. using the anchor tokens to **inflate** the syntax tree into a proper CST

Co-authored-by: Benjamin Woodruff <github@benjam.info>
2021-12-21 18:14:39 +00:00