* Keep old exception messages (avoid breaking-changes for users relying on exception messages)
* Move ``get_expected_str`` out of _exceptions.py, where it does not belong, to its own file in _parser/_parsing_check.py
* Enumeration members are singletons, so copying them is a no-op
* Avoid generating an unnecessary `pass` statement
* Several trivial refactors
* Avoid building unnecessary intermediate lists, which are a slight waste of time and space
* Remove an unused import, an oversight from commit 8e6bf9e9
* `collections.abc.Mapping.get()` returns `None` by default when the key doesn't exist
* Just use unittest's `assertRaises` to specify the expected exception types instead of catching every possible `Exception`, which could suppress legitimate errors and hide bugs
* We know for sure that the body of `CSTTypedTransformerFunctions` won't be empty, so don't bother with complex formal completeness
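A minimal sketch of three of the points above (enum singletons, the `Mapping.get()` default, and `assertRaises`). The names `Color`, `parse_positive`, and `ParsePositiveTest` are made up for illustration and are not part of LibCST.

```python
import copy
import enum
import unittest


class Color(enum.Enum):
    RED = 1


# Enum members are singletons: copying hands back the very same object.
assert copy.copy(Color.RED) is Color.RED
assert copy.deepcopy(Color.RED) is Color.RED

# Mapping.get() already returns None for a missing key,
# so passing None as an explicit default is redundant.
settings = {"indent": 4}
assert settings.get("missing") is None


def parse_positive(text: str) -> int:
    """Hypothetical helper: parse a strictly positive integer."""
    value = int(text)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


class ParsePositiveTest(unittest.TestCase):
    def test_rejects_non_numbers(self) -> None:
        # Only ValueError is accepted here. Any other exception type
        # (for example a TypeError caused by a bug) still fails the test.
        with self.assertRaises(ValueError):
            parse_positive("not a number")
```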
This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust. It is off by default and is controlled by the `LIBCST_PARSER_TYPE` environment variable; set it to `native` to enable it. The PR also enables new CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms.
Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. Those will be addressed in a follow-up PR.
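A hedged sketch of opting in from Python. Only the `LIBCST_PARSER_TYPE` variable and its `native` value come from the description above; setting it before the import is a conservative assumption, and the `roundtrip` helper is made up for illustration.

```python
import os

# Opt in to the Rust parser. Setting the variable before LibCST is
# imported is the conservative choice (an assumption, not a documented rule).
os.environ["LIBCST_PARSER_TYPE"] = "native"


def roundtrip(source: str) -> str:
    """Hypothetical helper: parse source and render it back unchanged."""
    import libcst

    return libcst.parse_module(source).code


try:
    result = roundtrip("x = 1\n")
except ImportError:
    # LibCST, or its native extension, is not installed in this environment.
    result = None
```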
The new parser is implemented in the `native/` directory, and is organized into two Rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by
1. **tokenizing** the input UTF-8 string (bytes are not supported at the Rust layer; they are converted to UTF-8 strings by the Python wrapper)
2. running the **PEG parser** on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree
3. using the anchor tokens to **inflate** the syntax tree into a proper CST
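The three phases above can be illustrated with a toy, pure-Python sketch for a tiny "grammar" of comma-separated names. This is not the Rust implementation: every class and function name below is hypothetical, and a hand-written loop stands in for the PEG parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str       # the token itself, e.g. "a" or ","
    ws_before: str  # whitespace that preceded the token


def tokenize(source: str) -> List[Token]:
    # Phase 1: toy tokenizer that only understands words and commas.
    return [Token(m.group(2), m.group(1))
            for m in re.finditer(r"(\s*)(\w+|,)", source)]


@dataclass
class DeflatedElement:
    name: Token
    comma: Optional[Token]  # anchor token kept for the inflate phase


def parse(tokens: List[Token]) -> List[DeflatedElement]:
    # Phase 2: stand-in for the PEG parser. Grammar: name ("," name)* [","]
    elements = []
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if not name.text.isidentifier():
            raise SyntaxError(f"expected a name, got {name.text!r}")
        comma = tokens[i + 1] if i + 1 < len(tokens) else None
        if comma is not None and comma.text != ",":
            raise SyntaxError(f"expected ',', got {comma.text!r}")
        elements.append(DeflatedElement(name, comma))
        i += 2
    return elements


@dataclass
class Element:
    name: str
    whitespace_before_name: str
    has_comma: bool
    whitespace_before_comma: str


def inflate(deflated: List[DeflatedElement]) -> List[Element]:
    # Phase 3: use the anchor tokens to attach whitespace to each node.
    return [
        Element(
            name=d.name.text,
            whitespace_before_name=d.name.ws_before,
            has_comma=d.comma is not None,
            whitespace_before_comma=d.comma.ws_before if d.comma else "",
        )
        for d in deflated
    ]


def codegen(elements: List[Element]) -> str:
    # Render the inflated tree back to source, preserving whitespace.
    parts = []
    for e in elements:
        parts.append(e.whitespace_before_name + e.name)
        if e.has_comma:
            parts.append(e.whitespace_before_comma + ",")
    return "".join(parts)
```

Because the whitespace is carried on the tokens, `codegen(inflate(parse(tokenize(src))))` reproduces inputs such as `"a ,  b,c"` exactly.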
Co-authored-by: Benjamin Woodruff <github@benjam.info>
which ensures we won't have inconsistent black-vs-isort errors
going forward. We can always format by running `ufmt format .`
at the root, and check with `ufmt check .` in our CI actions.
Previous behavior treated it as identical to equal, making a kwarg; it should
instead be a positional arg. Includes several tests to make sure that
whitespace handling is correct.
Fixes #416
* Read install requirements from requirements.txt
* read extras_require from requirements-dev.txt
* add requirements-dev.txt to MANIFEST.in
* apply fixes for new version of Black and Flake8
* don't upgrade Pyre
* re-format
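A rough sketch of how a `setup.py` might read its dependency lists from the two requirements files. The `read_requirements` helper is hypothetical and simply skips blank lines and comment lines; the actual implementation may differ.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str) -> List[str]:
    """Hypothetical helper: return requirement lines, skipping blanks and comments."""
    lines = (line.strip() for line in Path(path).read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


# Intended use inside setup.py (shown as a comment so the sketch stays runnable):
# setup(
#     install_requires=read_requirements("requirements.txt"),
#     extras_require={"dev": read_requirements("requirements-dev.txt")},
# )
```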
## Summary
The pyre stub for the tokenizer module had a syntax error.
Fixing it removes other pyre errors.
## Test Plan
```
pyre check
```
Co-authored-by: Germán Méndez Bravo <kronuz@fb.com>
* Add Python 3.9 to tox envlist
* Require newer typing_extensions for 3.9
For simplicity, use the new version in all cases.
* Improve default-version selection to work on 3.9
While we're at it, improve the code to work with a likely 3.10 by
allowing multiple digits for the minor version.
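To illustrate the multi-digit point, here is a small sketch with a hypothetical `parse_major_minor` helper. It is not the actual LibCST code, only a demonstration of why `\d+` is needed instead of `\d`.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Hypothetical helper: extract (major, minor) from a version string."""
    # \d+ accepts multi-digit components, so "3.10" is read as (3, 10).
    # A single \d would stop after the first digit and read it as (3, 1).
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a recognizable version: {version!r}")
    return int(match.group(1)), int(match.group(2))
```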
Several of the Python 2 features are gated on these in addition to
version (like `with_statement`), and a refactoring tool like Bowler
commonly needs this information anyway.
If you have a program like "pass\\\n", this is technically a program without a trailing newline, since a line continuation is defined as a `\` followed by a newline. We were misdetecting this as having a trailing newline, which made it impossible to parse the continuation. Add some tests to verify this behavior and then fix the problem.
Note that this was found via hypothesis.
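The distinction can be sketched as follows. Both helpers are hypothetical and deliberately naive: they look only at the final characters, so they ignore cases such as a backslash that ends a comment. They are not the detection logic used by the parser.

```python
def ends_with_line_continuation(source: str) -> bool:
    """Hypothetical helper: True if the final newline is escaped by a backslash."""
    # Check "\r\n" first so a Windows newline is stripped as one unit.
    for newline in ("\r\n", "\n", "\r"):
        if source.endswith(newline):
            return source[: -len(newline)].endswith("\\")
    return False


def has_trailing_newline(source: str) -> bool:
    """Hypothetical helper: a final newline counts only if it is not escaped."""
    ends_in_newline = source.endswith(("\n", "\r"))
    return ends_in_newline and not ends_with_line_continuation(source)
```

With these definitions, `has_trailing_newline("pass\n")` is true, while `has_trailing_newline("pass\\\n")` is false because the final newline belongs to the continuation.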
This makes sure we always wrap elements in a SubscriptElement, even when there
is only one element. This makes things more regular while still being backwards
compatible with existing creation. The meat of this is in two halves, which can't
be split due to not wanting to break the build between commits. The first half
is just the changes to the parser and updates to tests. This includes a test to
be sure we can still render code that uses old construction types. The second half
is the changes to codegen, which made assumptions about `Subscript` and demonstrate
the need to make this change in the first place. This includes a fix to
`CSTNode.with_deep_changes` type to make it more correct and also more usable in
transforms without additional type assertions.
This is somewhat complicated by the fact that we need to not just allow
construction of nodes/matchers using `ExtSlice` still for backwards compatibility,
but we also need to be able to call `visit_ExtSlice` and `leave_ExtSlice` on
old visitors even though the new node is named `SubscriptElement`. The
construction/instance check/matching side of things will work since internally we
refer to everything as `SubscriptElement` and alias `ExtSlice` to this everywhere,
but for string-based function lookup, we need to get a little more clever and make
the default `visit_SubscriptElement` delegate onward to `visit_ExtSlice` so that
either form works.
This can all be removed again once we're past the deprecation period for ExtSlice.
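The delegation trick can be sketched with stand-in classes. Nothing below is LibCST's real implementation; `Visitor`, `on_visit`, and the two subclasses are simplified, hypothetical versions used only to show the string-based lookup and the fallback.

```python
from typing import List


class SubscriptElement:
    def __init__(self, value: str) -> None:
        self.value = value


# Deprecated alias: construction and isinstance checks keep working.
ExtSlice = SubscriptElement


class Visitor:
    def visit_ExtSlice(self, node: SubscriptElement) -> None:
        pass  # default: do nothing

    def visit_SubscriptElement(self, node: SubscriptElement) -> None:
        # Default implementation delegates to the old name, so visitors
        # that only override visit_ExtSlice are still called.
        self.visit_ExtSlice(node)

    def on_visit(self, node: SubscriptElement) -> None:
        # String-based lookup always uses the new class name.
        getattr(self, f"visit_{type(node).__name__}")(node)


class OldStyleVisitor(Visitor):
    def __init__(self) -> None:
        self.seen: List[str] = []

    def visit_ExtSlice(self, node: SubscriptElement) -> None:
        self.seen.append(node.value)


class NewStyleVisitor(Visitor):
    def __init__(self) -> None:
        self.seen: List[str] = []

    def visit_SubscriptElement(self, node: SubscriptElement) -> None:
        self.seen.append(node.value)


old = OldStyleVisitor()
old.on_visit(ExtSlice("a"))          # reaches visit_ExtSlice via delegation
new = NewStyleVisitor()
new.on_visit(SubscriptElement("b"))  # reaches visit_SubscriptElement directly
```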