LibCST

mirrors/LibCST

Fork 0

mirror of https://github.com/Instagram/LibCST.git synced 2025-12-23 10:35:53 +00:00

Commit graph

Author	SHA1	Message	Date
Zsolt Dollenstein	2acc293347	Fix whitespace, fstring, walrus related parse errors (#939 , #938 , #937 , #936 , #935 , #934 , #933 , #932 , #931 ) * Allow walrus in slices See https://github.com/python/cpython/pull/23317 Raised in #930. * Fix parsing of nested f-string specifiers For an expression like `f"{one:{two:}{three}}"`, `three` is not in an f-string spec, and should be tokenized accordingly. This PR fixes the `format_spec_count` bookkeeping in the tokenizer, so it properly decrements it when a closing `}` is encountered but only if the `}` closes a format_spec. Reported in #930. * Fix tokenizing `0else` This is an obscure one. `_ if 0else _` failed to parse with some very weird errors. It turns out that the tokenizer tries to parse `0else` as a single number, but when it encounters `l` it realizes it can't be a single number and it backtracks. Unfortunately the backtracking logic was broken, and it failed to correctly backtrack one of the offsets used for whitespace parsing (the byte offset since the start of the line). This caused whitespace nodes to refer to incorrect parts of the input text, eventually resulting in the above behavior. This PR fixes the bookkeeping when the tokenizer backtracks. Reported in #930. * Allow no whitespace between lambda keyword and params in certain cases Python accepts code where `lambda` follows a ``, so this PR relaxes validation rules for Lambdas. Raised in #930. Allow any expression in comprehensions' evaluated expression This PR relaxes the accepted types for the `elt` field in `ListComp`, `SetComp`, and `GenExp`, as well as the `key` and `value` fields in `DictComp`. Fixes #500. * Allow no space around an ifexp in certain cases For example in `_ if _ else""if _ else _`. Raised in #930. Also fixes #854. * Allow no spaces after `as` in a contextmanager in certain cases Like in `with foo()as():pass` Raised in #930. * Allow no spaces around walrus in certain cases Like in `[_:=''for _ in _]` Raised in #930. * Allow no whitespace after lambda body in certain cases Like in `[lambda:()for _ in _]` Reported in #930.	2023-06-07 12:37:16 +01:00
Zsolt Dollenstein	c02de9b718	Implement a Python PEG parser in Rust (#566 ) This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust, but it's turned off by default through the `LIBCST_PARSER_TYPE` environment variable. Set it to `native` to enable. The PR also enables new CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms. Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. That will be addressed as a follow-up PR. The new parser is implemented in the `native/` directory, and is organized into two rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by 1. tokenizing the input utf-8 string (bytes are not supported at the Rust layer, they are converted to utf-8 strings by the python wrapper) 2. running the PEG parser on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree 3. using the anchor tokens to inflate the syntax tree into a proper CST Co-authored-by: Benjamin Woodruff <github@benjam.info>	2021-12-21 18:14:39 +00:00

Author

SHA1

Message

Date

Zsolt Dollenstein

2acc293347

Fix whitespace, fstring, walrus related parse errors (#939 , #938 , #937 , #936 , #935 , #934 , #933 , #932 , #931 )

* Allow walrus in slices

See https://github.com/python/cpython/pull/23317

Raised in #930.

* Fix parsing of nested f-string specifiers

For an expression like `f"{one:{two:}{three}}"`, `three` is not in an f-string spec, and should be tokenized accordingly.

This PR fixes the `format_spec_count` bookkeeping in the tokenizer, so it properly decrements it when a closing `}` is encountered but only if the `}` closes a format_spec.

Reported in #930.

* Fix tokenizing `0else`

This is an obscure one.

`_ if 0else _` failed to parse with some very weird errors. It turns out that the tokenizer tries to parse `0else` as a single number, but when it encounters `l` it realizes it can't be a single number and it backtracks.

Unfortunately the backtracking logic was broken, and it failed to correctly backtrack one of the offsets used for whitespace parsing (the byte offset since the start of the line). This caused whitespace nodes to refer to incorrect parts of the input text, eventually resulting in the above behavior.

This PR fixes the bookkeeping when the tokenizer backtracks.

Reported in #930.

* Allow no whitespace between lambda keyword and params in certain cases

Python accepts code where `lambda` follows a `*`, so this PR relaxes validation rules for Lambdas.

Raised in #930.

* Allow any expression in comprehensions' evaluated expression


This PR relaxes the accepted types for the `elt` field in `ListComp`, `SetComp`, and `GenExp`, as well as the `key` and `value` fields in `DictComp`.

Fixes #500.

* Allow no space around an ifexp in certain cases

For example in `_ if _ else""if _ else _`.

Raised in #930. Also fixes #854.

* Allow no spaces after `as` in a contextmanager in certain cases

Like in `with foo()as():pass`

Raised in #930.

* Allow no spaces around walrus in certain cases

Like in `[_:=''for _ in _]`

Raised in #930.

* Allow no whitespace after lambda body in certain cases

Like in `[lambda:()for _ in _]`

Reported in #930.

2023-06-07 12:37:16 +01:00

Zsolt Dollenstein

c02de9b718

Implement a Python PEG parser in Rust (#566 )

This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust, but it's turned off by default through the `LIBCST_PARSER_TYPE` environment variable. Set it to `native` to enable. The PR also enables new CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms.

Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. That will be addressed as a follow-up PR.

The new parser is implemented in the `native/` directory, and is organized into two rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by
1. **tokenizing** the input utf-8 string (bytes are not supported at the Rust layer, they are converted to utf-8 strings by the python wrapper)
2. running the **PEG parser** on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree
3. using the anchor tokens to **inflate** the syntax tree into a proper CST

Co-authored-by: Benjamin Woodruff <github@benjam.info>

2021-12-21 18:14:39 +00:00

2 commits