The majority of this PR is tediously passing `end_lineno` and `end_col_offset` everywhere. Here are non-trivial points:
* It is not possible to reconstruct end positions in AST "on the fly", some information is lost after an AST node is constructed, so we need two more attributes for every AST node `end_lineno` and `end_col_offset`.
* I add end position information to both CST and AST. Although it may be technically possible to avoid adding end positions to CST, the code becomes more cumbersome and less efficient.
* Since the end position is not known for non-leaf CST nodes while the next token is added, this requires a bit of extra care (see `_PyNode_FinalizeEndPos`). Unless I made some mistake, the algorithm should be linear.
* For statements, I "trim" the end position of suites to not include the terminal newlines and dedent (this seems to be what people would expect), for example in
```python
class C:
pass
pass
```
the end line and end column for the class definition is (2, 8).
* For `end_col_offset` I use the common Python convention for indexing, for example for `pass` the `end_col_offset` is 4 (not 3), so that `[0:4]` gives one the source code that corresponds to the node.
* I added a helper function `ast.get_source_segment()`, to get source text segment corresponding to a given AST node. It is also useful for testing.
An (inevitable) downside of this PR is that AST now takes almost 25% more memory. I think however it is probably justified by the benefits.
The lineno and col_offset attributes of AST nodes for list comprehensions,
generator expressions and tuples are now point to the opening parenthesis or
square brace. For tuples without parenthesis they point to the position
of the first item.
When iterating using asdl_seq_LEN(), use 'Py_ssize_t' type instead of
'int' for the iterator variable, to avoid downcast on 64-bit platforms.
_Py_asdl_int_seq_new() now also ensures that the index is greater than
or equal to 0.
Followup to 90fc8980bb.
<!--
Thanks for your contribution!
Please read this comment in its entirety. It's quite important.
# Pull Request title
It should be in the following format:
```
bpo-NNNN: Summary of the changes made
```
Where: bpo-NNNN refers to the issue number in the https://bugs.python.org.
Most PRs will require an issue number. Trivial changes, like fixing a typo, do not need an issue.
# Backport Pull Request title
If this is a backport PR (PR made against branches other than `master`),
please ensure that the PR title is in the following format:
```
[X.Y] <title from the original PR> (GH-NNNN)
```
Where: [X.Y] is the branch name, e.g. [3.6].
GH-NNNN refers to the PR number from `master`.
-->
Remove the docstring attribute of AST types and restore docstring
expression as a first stmt in their body.
Co-authored-by: INADA Naoki <methane@users.noreply.github.com>
* Add Py_UNREACHABLE() as an alias to abort().
* Use Py_UNREACHABLE() instead of assert(0)
* Convert more unreachable code to use Py_UNREACHABLE()
* Document Py_UNREACHABLE() and a few other macros.
'invalid character in identifier' now is raised instead of
'f-string: empty expression not allowed' if a subexpression contains
only whitespaces and they are not accepted by Python parser.
* bpo-29463: Add docstring field to some AST nodes.
ClassDef, ModuleDef, FunctionDef, and AsyncFunctionDef has docstring
field for now. It was first statement of there body.
* fix document. thanks travis!
* doc fixes
Issue #28691: Fix warn_invalid_escape_sequence(): handle correctly
DeprecationWarning raised as an exception. First clear the current exception to
replace the DeprecationWarning exception with a SyntaxError exception.
Unit test written by Serhiy Storchaka.