* Keep old exception messages (avoid breaking-changes for users relying on exception messages)
* Move ``get_expected_str`` out of _exceptions.py, where it does not belong, to its own file in _parser/_parsing_check.py
From 3.14 onwards, we'll get `foo | bar` instead of `typing.Union[foo, bar]` as the annotation for union types (including optional). This PR prepares the codegen script for this.
Summary:
This PR removes the `typing_extensions` and `typing_inspect` dependencies as we can now rely on the built-in `typing` module since Python 3.9.
Test Plan:
existing tests
* Fix type of evaluated_value on string
This can return bytes if the string is a bytestring, e.g.:
In [1]: import libcst as cst
In [2]: cst.parse_expression('b"foo"').evaluated_value
Out[2]: b'foo'
* Fix type errors from changed signature
* Enumeration members are singletons. Copying on them would be no-op
* Avoid generating unnecessary `pass` statement
* Several trivial refactor
* Avoid building unnecessary intermediate lists, which are mere slight waste of time and space
* Remove unused import, an overlook from commit 8e6bf9e9
* `collections.abc.Mapping.get()` defaults to return `None` when key doesn't exist
* Just use unittest's `assertRaises` to specify expected exception types, instead of catching every possible `Exception`s, which could suppress legitimate errors and hide bugs
* We know for sure that the body of `CSTTypedTransformerFunctions` won't be empty, so don't bother with complex formal completeness
This massive PR implements an alternative Python parser that will allow LibCST to parse Python 3.10's new grammar features. The parser is implemented in Rust, but it's turned off by default through the `LIBCST_PARSER_TYPE` environment variable. Set it to `native` to enable. The PR also enables new CI steps that test just the Rust parser, as well as steps that produce binary wheels for a variety of CPython versions and platforms.
Note: this PR aims to be roughly feature-equivalent to the main branch, so it doesn't include new 3.10 syntax features. That will be addressed as a follow-up PR.
The new parser is implemented in the `native/` directory, and is organized into two rust crates: `libcst_derive` contains some macros to facilitate various features of CST nodes, and `libcst` contains the `parser` itself (including the Python grammar), a `tokenizer` implementation by @bgw, and a very basic representation of CST `nodes`. Parsing is done by
1. **tokenizing** the input utf-8 string (bytes are not supported at the Rust layer, they are converted to utf-8 strings by the python wrapper)
2. running the **PEG parser** on the tokenized input, which also captures certain anchor tokens in the resulting syntax tree
3. using the anchor tokens to **inflate** the syntax tree into a proper CST
Co-authored-by: Benjamin Woodruff <github@benjam.info>
which ensures we won't have inconsistent black-vs-isort errors
going forward. We can always format by running `ufmt format .`
at the root, and check with `ufmt check .` in our CI actions.
* Add flatten_sentinal
* Add FlattenSentinal to __all__
* Fix lint errors
* Fix type errors
* Update test to use leave_Return
* Update and run codegen
* Add empty test
* Update docs
* autofix
* Read install requirements from requirements.txt
* read extras_require from requirements-dev.txt
* add requirements-dev.txt to MANIFEST.in
* apply fixes for new version of Black and Flake8
* don't upgrade Pyre
* re-format
This is especially helpful for checking qualified names of nodes against one item
or a list of items that you wish to match against. I chose to create a new matcher
instead of widening the type of `MatchMetadata` to take in either a value or
a callable. I was originally going to do the former, but having a MatchIfTrue
and a MatchMetadataIfTrue felt more orthogonal and became easier to explain than
a single MatchIfTrue that could take two types of values.
Findall does what you expect given a matcher and a tree: It returns all nodes
that exist in that tree which match the given matcher. For convenience, findall
works with regular LibCST trees as well as MetadataWrappers. It is also provided
as a helper method on the matcher transforms.
This does a few things, since it was easier to combine than separate:
- Uses type aliases instead of fully-unrolled types, making it far easier for a human to read.
- Makes codegen approximately 3x faster, which has the side effect of halving our test run times.
- Consolidate the way we use type aliases by dropping the MetadataPredicate type for now, increasing consistency.
- Widen types for AllOf/OneOf matchers to allow for MatchIfTrue since this is supported under the hood.
This results in a generated matchers file that is 1/3 the size it was, more readable by a human, and most importantly, faster to codegen, parse, format and typecheck.
This adds the ability to match on metadata for a node as if it was an attribute on the node. It also opens up the ability to match on node attributes via metadata only. It allows for metadata match specifications with similar flexibility to standard matchers.
This makes sure we always wrap elements in a SubscriptElement, even when there
is only one element. This makes things more regular while still being backwards
compatible with existing creation. The meat of this is in two halves, which can't
be split due to not wanting to break the build between commits. The first half
is just the changes to the parser and updates to tests. This includes a test to
be sure we can still render code that uses old construction types. The second half
is changes to codegen which made assumptions about `Subscript` and demonstrates
the need to make this change in the first place. This includes a fix to
`CSTNode.with_deep_changes` type to make it more correct and also more usable in
transforms without additional type assertions.
This is somewhat complicated by the fact that we need to not just allow
construction of nodes/matchers using `ExtSlice` still for backwards compatibility,
but we also need to be able to call `visit_ExtSlice` and `leave_ExtSlice` on
old visitors even though the new node is named `SubscriptElement`. The
construction/instance check/matching side of things will work since internally we
refer to everything as `SubscriptElement` and alias `ExtSlice` to this everywhere,
but for string-based function lookup, we need to get a little more clever and make
the default `visit_SubscriptElement` delegate onward to `visit_ExtSlice` so that
either form works.
This can all be removed again once we're past the deprecation period for ExtSlice.
I noticed that the typed visitors codegen was creating messy types such as Union[SingleType]. We have a clean-up for Union[SingleType] snuck into gen_matcher_classes. So, lets remove that snuck-in clean-up from gen_matcher_classes and apply it globally to all codegen. This makes its purpose a lot more obvious, while helping decouple necessary codegen from prettifying/simplifying transforms. It also cleans up typed visitors, so that's a bonus.
This makes matcher equality and hash equivalent to the way LibCST nodes behave. Not only does this make us more consistent, but it also fixes a bug where matcher decorators could not be used with a matcher that initialized a sequence type as a list.
Add a RemoveFromParent() function as a convenience to returning RemovalSentinel.REMOVE.
Introduce a `deep_remove()` on CSTNode analogous to `deep_replace()` but for removing.
There are a lot of nodes that cannot be removed or converted to maybes, such as
most of the Op tokens. It would be a bit of a lie to codegen leave_* methods
that allow these nodes to be converted, only to throw a runtime error later. So,
upgrade the codegen to allow us to see whether certain nodes are used in conjunction
with a MaybeSentinel/None, or inside a Sequence, to inform ourselves as to when to
allow MaybeSentinel or RemovalSentinel.
We want to make sure that the generated function stubs stay in sync with
the node definitions. So, make a unit test that fails if codegen generates
a different file than the existing file, so that somebody modifying code
knows they need to re-run codegen.