This is useful when you are using `parse_expression` or `parse_statement`
to generate a tree from a source string that is meant to be later added
to an existing module. It lets you more easily configure both parser
functions to output a tree with the same defaults as the `Module` you
previously parsed.
`parse_statement` and `parse_expression` aren't guaranteed to round-trip
exactly, because they don't have a wrapping Module to encapsulate
miscellaneous spacing. So, update the tests to verify that the rendered code
is identical to the original code based on the AST. Also, while I'm at it,
bump up Hypothesis's maximums in order to stress LibCST more.
Hypothesis found that when we have a statement like `pass\r`, we detect that
`\r` is the default and parse the trailing newline as `Newline(None)`. However, when
we render the statement back out again, since we don't have a module, we construct
a default module which treats `Newline(None)` as a `\n`, not a `\r`. So, when we are
parsing statements or expressions, disable auto-inferring the default newline
and always use the default rendered newline (`\n`) so that rendering a
statement/expression back out behaves as expected.
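The fix above can be sketched in miniature. Note that `Newline`, `CodegenConfig`, and `default_newline` here are simplified stand-ins, not LibCST's actual internals:

```python
# Minimal sketch of the newline-default problem. A Newline with
# value=None means "use the module's default newline", but a bare
# statement has no module, so rendering must fall back to "\n".

class CodegenConfig:
    def __init__(self, default_newline="\n"):
        self.default_newline = default_newline

class Newline:
    """A newline token; value=None defers to the config's default."""
    def __init__(self, value=None):
        self.value = value

    def codegen(self, config):
        return self.value if self.value is not None else config.default_newline

# Parsing `pass\r` detects `\r` as the module default and stores
# Newline(None), but rendering a bare statement uses a fresh config:
assert Newline(None).codegen(CodegenConfig()) == "\n"      # not "\r"
assert Newline(None).codegen(CodegenConfig("\r")) == "\r"  # with a real module
```

Always rendering bare statements with `\n` keeps the round-trip stable regardless of what newline the original source happened to use.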
There are a lot of nodes that cannot be removed or converted to maybes, such as
most of the Op tokens. It would be a bit of a lie to codegen leave_* methods
that allow these nodes to be converted, only to throw a runtime error later. So,
upgrade the codegen to allow us to see whether certain nodes are used in conjunction
with a MaybeSentinel/None, or inside a Sequence, so we know when to
allow MaybeSentinel or RemovalSentinel.
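One way codegen can make this decision is by inspecting each node field's type annotation. The node classes and helper below are hypothetical illustrations, not LibCST's real node definitions:

```python
# Sketch: decide which child node types may be removed (they appear in
# a Sequence) or replaced with a sentinel (they appear as Optional) by
# scanning dataclass field annotations.
import collections.abc
import typing
from dataclasses import dataclass, fields
from typing import Optional, Sequence

@dataclass
class Comma:
    pass

@dataclass
class Element:
    comma: Optional[Comma]  # used with None -> MaybeSentinel is allowed

@dataclass
class TupleNode:
    elements: Sequence[Element]  # used in a Sequence -> removal is allowed

def allowed_sentinels(node_type):
    removable, maybe = set(), set()
    hints = typing.get_type_hints(node_type)
    for f in fields(node_type):
        hint = hints[f.name]
        origin, args = typing.get_origin(hint), typing.get_args(hint)
        if origin is collections.abc.Sequence:
            removable.update(args)
        elif origin is typing.Union and type(None) in args:
            maybe.update(a for a in args if a is not type(None))
    return removable, maybe

assert Element in allowed_sentinels(TupleNode)[0]   # removable
assert allowed_sentinels(Element)[1] == {Comma}     # maybe-able
```

A node type that never appears in either position (like most Op tokens) gets a `leave_*` stub whose return type permits neither sentinel, so the type checker catches the mistake instead of a runtime error.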
We want to make sure that the generated function stubs stay in sync with
the node definitions. So, make a unit test that fails if codegen generates
a different file than the existing file, so that somebody modifying code
knows they need to re-run codegen.
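The shape of such a freshness check is simple; the function and file names below are placeholders, not the actual codegen entry points:

```python
# Sketch: regenerate the stubs in memory and fail if they differ from
# the checked-in file, so stale codegen output can't slip through CI.
import pathlib
import tempfile

def generate_visitor_stubs() -> str:
    # Stand-in for the real codegen entry point.
    return "class CSTTransformer: ...\n"

def check_up_to_date(generated_path: pathlib.Path) -> None:
    expected = generate_visitor_stubs()
    actual = generated_path.read_text()
    if actual != expected:
        raise AssertionError(
            "Generated file is stale; re-run codegen and commit the result."
        )

# Demonstrate with a temporary stand-in for the checked-in file:
with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d) / "_typed_visitor.py"
    path.write_text(generate_visitor_stubs())
    check_up_to_date(path)  # up to date: no error
    path.write_text("# hand-edited drift\n")
    try:
        check_up_to_date(path)
        stale_detected = False
    except AssertionError:
        stale_detected = True
assert stale_detected
```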
This was found by Hypothesis, so let's fix it! It turns out we aren't recursively
evaluating whether an expression can be used without spaces against a word
operator. That means that complex expressions such as `not...^A` fail to parse
when we really should allow such expressions.
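The essence of the recursive check can be sketched like this; the tree encoding and both function names are hypothetical, not the parser's real data structures:

```python
# Sketch: whether an expression can sit flush against a word operator
# like `not` depends on its *leftmost leaf token*, found recursively,
# not just on the top-level node.

def leftmost_token(node):
    # A node is either a string token or a tuple of child nodes.
    while isinstance(node, tuple):
        node = node[0]
    return node

def safe_without_space(node):
    # No separating space is needed after a word operator unless the
    # expression's first character could fuse with the operator word.
    first = leftmost_token(node)[0]
    return not (first.isalnum() or first == "_")

# `not(x)` -> leftmost token is "(" -> no space needed
assert safe_without_space(("(", ("x",), ")"))
# `not x` -> leftmost token is "x" -> space required
assert not safe_without_space(("x",))
```

The bug was stopping at the top-level node; descending to the leftmost leaf lets complex nested expressions parse correctly.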
Because the parser conversion functions may not understand their current
position in the code, we instead need to construct a `ParserSyntaxError`
without the current line and column information, filling it in later.
Originally I did this by allowing ParserSyntaxError to be partially
initialized, but this wasn't very type-safe, and required some ugly
assertions.
Instead, I now construct and raise a PartialParserSyntaxError. The
BaseParser class is responsible for catching that partial error object
and constructing a full ParserSyntaxError.
For anything that's not an internal logic error, conversion functions
should raise a ParserSyntaxError.
Internal logic errors should probably use an AssertionError or an assert
statement, but that's not as important and is out of scope for this PR.
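The pattern described above can be sketched as follows. The class bodies and `convert_term`/`base_parser_step` helpers are simplified illustrations, not the actual parser code:

```python
# Sketch: conversion functions raise a position-less partial error;
# the base parser, which knows the position, catches it and rethrows
# a fully-populated ParserSyntaxError.

class ParserSyntaxError(Exception):
    def __init__(self, message, line, column):
        super().__init__(f"{message} @ {line}:{column}")
        self.message, self.line, self.column = message, line, column

class PartialParserSyntaxError(Exception):
    def __init__(self, message):
        self.message = message

def convert_term(token):
    # A conversion function doesn't know its position in the source.
    if token != "valid":
        raise PartialParserSyntaxError(f"Unexpected token {token!r}")
    return token

def base_parser_step(token, line, column):
    # The base parser backfills the position before re-raising.
    try:
        return convert_term(token)
    except PartialParserSyntaxError as e:
        raise ParserSyntaxError(e.message, line, column) from e

try:
    base_parser_step("bogus", line=3, column=7)
except ParserSyntaxError as e:
    assert (e.line, e.column) == (3, 7)
```

This keeps both exception types fully initialized at all times, avoiding the half-constructed error object and the assertions it required.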
- Make INDENT/DEDENT dummy tokens more readable instead of outputting an
empty string for them.
- Sort the "expected" list to make testing easier.
- Don't show the list of expected values if there are > 10
possibilities. The output becomes impossible to read at that point.
- Try to avoid using an empty line for the displayed contextual line of
source code. Instead, point at the line above it, or just don't output
any context.
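The sorting and truncation rules can be sketched in a small helper; `format_expected` and the cutoff constant are illustrative names, not the real implementation:

```python
# Sketch: sort expected tokens for deterministic test output, and drop
# the list entirely once it is too long to be readable.

MAX_EXPECTED = 10

def format_expected(encountered, expected):
    if len(expected) > MAX_EXPECTED:
        return f"Encountered {encountered!r}."
    names = ", ".join(sorted(expected))
    return f"Encountered {encountered!r}, but expected one of: {names}."

# Sorted output is stable across runs:
assert format_expected("!", {"INDENT", "DEDENT"}).endswith("DEDENT, INDENT.")
# With more than 10 possibilities, the list is omitted:
assert "expected" not in format_expected("!", {str(i) for i in range(11)})
```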
This removes the hard-coded logic about encountered/expected, and moves
it into a separate helper method.
Line and column can now be initialized lazily. We'll use this later to
raise `ParserSyntaxError`s inside of conversion functions, backfilling
the positions inside `_base_parser`, since conversion functions don't
always have access to position information.
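The lazy-initialization idea can be sketched like this; the class and method names are hypothetical, not the real `ParserSyntaxError` API:

```python
# Sketch: position fields start unset and are backfilled once by the
# base parser; a later backfill never overwrites an already-known
# position, since the innermost frame has the most precise location.

class LazyPosError(Exception):
    def __init__(self, message):
        self.message = message
        self.line = None    # backfilled later by the base parser
        self.column = None

    def backfill(self, line, column):
        if self.line is None:
            self.line, self.column = line, column
        return self

err = LazyPosError("bad token")
assert err.line is None          # position unknown at raise time
err.backfill(4, 12)
err.backfill(9, 9)               # second backfill is a no-op
assert (err.line, err.column) == (4, 12)
```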
Adds details about how various configuration options can be used, and
updates some details to reflect reality (e.g. we don't currently infer
the Python version from your execution environment).
We only support parsing code as 3.7 right now.
This raises a descriptive error message if we receive an unsupported
version number, instead of silently trying to use it with the tokenizer
and failing in potentially strange ways.
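A sketch of the shape of that check, with placeholder names (the real validation lives inside the parser entry points):

```python
# Sketch: fail fast with a clear message instead of handing an unknown
# version string to the tokenizer.

SUPPORTED_VERSIONS = {"3.7"}

def validate_version(version: str) -> str:
    if version not in SUPPORTED_VERSIONS:
        supported = ", ".join(sorted(SUPPORTED_VERSIONS))
        raise ValueError(
            f"LibCST cannot parse code as Python {version}; "
            f"supported versions are: {supported}"
        )
    return version

assert validate_version("3.7") == "3.7"
try:
    validate_version("2.7")
except ValueError as e:
    assert "supported versions" in str(e)
```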