Commit graph

54 commits

Author SHA1 Message Date
Pablo Galindo Salgado
e5cf31d3c2
[3.9] bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177) (#30215)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>.
(cherry picked from commit e9898bf153)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-12-20 17:18:13 +00:00
Victor Stinner
93a540d74c
bpo-45866: pegen strips directory of "generated from" header (GH-29777) (GH-29792) (GH-29797)
"make regen-all" now produces the same output when run from a
directory other than the source tree: when building Python out of the
source tree.

(cherry picked from commit 253b7a0a9f)
(cherry picked from commit b6defde2af)
2021-11-26 17:23:41 +01:00
Łukasz Langa
88f4ec88e2
[3.9] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) (#29071)
There are two errors that this commit fixes:

* The parser was not correctly computing the offset and the string
  source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
  exception happened in rules involving optionals ('?', [...]) as we
  always make them return valid results by using the comma operator. We
  need to check first if we don't have an error before continuing..
(cherry picked from commit a106343f63)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

NOTE: unlike the cherry-picked original, this commit points at a crazy location
due to a bug in the tokenizer that required a big refactor in 3.10 to fix.
We are leaving as-is for 3.9.
2021-10-20 18:51:13 +02:00
Łukasz Langa
4e4d35d332
[3.9] bpo-44947: Refine the syntax error for trailing commas in import statements (GH-27814) (GH-27817)
(cherry picked from commit b2f68b1900)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-08-18 23:03:59 +02:00
Lysandros Nikolaou
3ce35bfbbe
[3.9] bpo-44385: Remove unused grammar rules (GH-26655) (GH-26659)
(cherry picked from commit e7b4644607)
2021-06-10 15:52:49 -07:00
Pablo Galindo
d4a9264ab8
[3.9] bpo-44168: Fix error message in the parser for keyword arguments for invalid expressions (GH-26210) (GH-26250)
(cherry picked from commit 33c0c90dea)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-05-19 19:26:59 +01:00
Lysandros Nikolaou
9a608ac17c
[3.9] bpo-40631: Disallow single parenthesized star target (GH-24027) (GH-24068)
(cherry picked from commit 2ea320dddd)

Automerge-Triggered-By: GH:pablogsal
2021-01-02 16:59:39 -08:00
Pablo Galindo
87c87b5bd6
[3.9] bpo-42381: Allow walrus in set literals and set comprehensions (GH-23332) (GH-23333)
Currently walruses are not allowerd in set literals and set comprehensions:

>>> {y := 4, 4**2, 3**3}
  File "<stdin>", line 1
    {y := 4, 4**2, 3**3}
       ^
SyntaxError: invalid syntax

but they should be allowed as well per PEP 572.
(cherry picked from commit b0aba1fcdc)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-11-18 23:44:30 +00:00
Lysandros Nikolaou
2b800ef809
bpo-42374: Allow unparenthesized walrus in genexps (GH-23319) (GH-23329)
This fixes a regression that was introduced by the new parser.

(cherry picked from commit cb3e5ed071)
2020-11-17 01:38:58 +02:00
Lysandros Nikolaou
cfcb952e30
[3.9] bpo-42218: Correctly handle errors in left-recursive rules (GH-23065) (GH-23066)
Left-recursive rules need to check for errors explicitly, since
even if the rule returns NULL, the parsing might continue and lead
to long-distance failures.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
(cherry picked from commit 02cdfc93f8)

Automerge-Triggered-By: GH:lysnikolaou
2020-10-31 12:06:03 -07:00
Pablo Galindo
ddcd57e3ea
[3.9] bpo-42214: Fix check for NOTEQUAL token in the PEG parser for the barry_as_flufl rule (GH-23048) (GH-23051)
(cherry picked from commit 06f8c3328d)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-10-31 00:40:42 +00:00
Lysandros Nikolaou
24a7c298d4
[3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) (GH-23011)
* Implement running the parser a second time for the errors messages

The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.

(cherry picked from commit bca7014032)
2020-10-28 02:14:15 +02:00
Lysandros Nikolaou
c4b58cea47
[3.9] bpo-41659: Disallow curly brace directly after primary (GH-22996) (#23006)
(cherry picked from commit 15acc4eaba)
2020-10-28 00:38:42 +02:00
Batuhan Taskaya
42157b9eaa
[3.9] bpo-41979: Accept star-unpacking on with-item targets (GH-22611) (GH-22612)
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>

Automerge-Triggered-By: @pablogsal
2020-10-09 03:31:07 -07:00
Pablo Galindo
be17295280
[3.9] bpo-41697: Correctly handle KeywordOrStarred when parsing arguments in the parser (GH-22077) (GH-22079)
(cherry picked from commit 315a61f7a9)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-09-03 16:35:17 +01:00
Pablo Galindo
8de34cdb95
[3.9] bpo-41690: Use a loop to collect args in the parser instead of recursion (GH-22053) (GH-22067)
This program can segfault the parser by stack overflow:

```
import ast

code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```

the reason is that the rule for arguments has a simple recursion when collecting args:

args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }.
(cherry picked from commit 4a97b1517a)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-09-02 21:30:51 +01:00
Pablo Galindo
54f115dd53
[3.9] bpo-41215: Don't use NULL by default in the PEG parser keyword list (GH-21355) (GH-21356)
(cherry picked from commit 39e76c0fb0)

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

Automerge-Triggered-By: @lysnikolaou
2020-07-06 12:29:59 -07:00
Pablo Galindo
102ca529ef
[3.9] bpo-40769: Allow extra surrounding parentheses for invalid annotated assignment rule (GH-20387) (GH-21186)
(cherry picked from commit c8f29ad986)
2020-06-28 00:40:41 +01:00
Lysandros Nikolaou
d01a3e76ee
[3.9] bpo-41119: Output correct error message for list/tuple followed by colon (GH-21160) (GH-21172)
(cherry picked from commit 4b85e60601)
2020-06-27 00:14:12 +01:00
Lysandros Nikolaou
71bb921829
[3.9] bpo-41060: Avoid SEGFAULT when calling GET_INVALID_TARGET in the grammar (GH-21020) (GH-21024)
`GET_INVALID_TARGET` might unexpectedly return `NULL`, which if not
caught will cause a SEGFAULT. Therefore, this commit introduces a new
inline function `RAISE_SYNTAX_ERROR_INVALID_TARGET` that always
checks for `GET_INVALID_TARGET` returning NULL and can be used in
the grammar, replacing the long C ternary operation used till now.

(cherry picked from commit 6c4e0bd974)

Automerge-Triggered-By: @pablogsal
2020-06-20 19:47:22 -07:00
Lysandros Nikolaou
a5442b26f4
[3.9] bpo-40334: Produce better error messages on invalid targets (GH-20106) (GH-20973)
* bpo-40334: Produce better error messages on invalid targets (GH-20106)

The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
  statements
- `... is an illegal 'with' target` for invalid targets in
  with statements

Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
(cherry picked from commit 01ece63d42)
2020-06-19 01:03:58 +01:00
Pablo Galindo
3782497cc2
[3.9] bpo-40939: Fix test_keyword for the old parser (GH-20814) 2020-06-11 19:29:13 +01:00
Miss Islington (bot)
d55ed7b107
Raise specialised syntax error for invalid lambda parameters (GH-20776)
(cherry picked from commit c6483c9896)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-10 06:24:41 -07:00
Miss Islington (bot)
8df4f3942f
bpo-40903: Handle multiple '=' in invalid assignment rules in the PEG parser (GH-20697)
Automerge-Triggered-By: @pablogsal
(cherry picked from commit 9f495908c5)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-08 02:22:06 -07:00
Miss Islington (bot)
31084be618
bpo-40750: Do not expand the new parser debug flags if Py_BUILD_CORE is not defined (GH-20393)
(cherry picked from commit deb4355a37)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-05-25 12:37:56 -07:00
Miss Islington (bot)
82da2c3eb4
bpo-40750: Support -d flag in the new parser (GH-20340)
(cherry picked from commit 800a35c623)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-05-25 10:58:03 -07:00
Miss Islington (bot)
55c8923524
bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153)
The error message, generated for a non-parenthesized generator expression
in function calls, was still the generic `invalid syntax`, when the generator expression wasn't appearing as the first argument in the call. With this patch, even on input like `f(a, b, c for c in d, e)`, the correct error message gets produced.
(cherry picked from commit ae14583302)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
2020-05-21 18:14:55 -07:00
Miss Islington (bot)
d00aaf306a
bpo-40715: Reject dict unpacking on dict comprehensions (GH-20292)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
(cherry picked from commit b8a65ec1d3)

Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
2020-05-21 15:58:16 -07:00
Pablo Galindo
275d7e1080
[3.9] bpo-40176: Improve error messages for trailing comma on from import (GH-20294) (GH-20302)
(cherry picked from commit 72e0aa2)

Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
2020-05-21 22:04:54 +01:00
Pablo Galindo
ced4e5c227
Regenerate the parser (#20195) 2020-05-18 23:47:51 +02:00
Lysandros Nikolaou
75b863aa97
bpo-40334: Reproduce error message for type comments on bare '*' in the new parser (GH-20151) 2020-05-18 20:14:47 +01:00
Lysandros Nikolaou
7b7a21bc4f
bpo-40661: Fix segfault when parsing invalid input (GH-20165)
Fix segfaults when parsing very complex invalid input, like `import äˆ ð£„¯ð¢·žð±‹á”€ð””ð‘©±å®ä±¬ð©¾\nð—¶½`.

Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2020-05-18 18:32:03 +01:00
Lysandros Nikolaou
2c8cd06afe
bpo-40334: Improvements to error-handling code in the PEG parser (GH-20003)
The following improvements are implemented in this commit:
- `p->error_indicator` is set, in case malloc or realloc fail.
- Avoid memory leaks in the case that realloc fails.
- Call `PyErr_NoMemory()` instead of `PyErr_Format()`, because it requires no memory.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-05-17 04:19:23 +01:00
Pablo Galindo
16ab07063c
bpo-40334: Correctly identify invalid target in assignment errors (GH-20076)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
2020-05-15 02:04:52 +01:00
Lysandros Nikolaou
ce21cfca7b
bpo-40618: Disallow invalid targets in augassign and except clauses (GH-20083)
This commit fixes the new parser to disallow invalid targets in the
following scenarios:
- Augmented assignments must only accept a single target (Name,
  Attribute or Subscript), but no tuples or lists.
- `except` clauses should only accept a single `Name` as a target.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-05-14 21:13:50 +01:00
Lysandros Nikolaou
a15c9b3a05
bpo-40334: Always show the caret on SyntaxErrors (GH-20050)
This commit fixes SyntaxError locations when the caret is not displayed,
by doing the following:

- `col_number` always gets set to the location of the offending
  node/expr. When no caret is to be displayed, this gets achieved
  by setting the object holding the error line to None.

- Introduce a new function `_PyPegen_raise_error_known_location`,
  which can be called, when an arbitrary `lineno`/`col_offset`
  needs to be passed. This function then gets used in the grammar
  (through some new macros and inline functions) so that SyntaxError
  locations of the new parser match that of the old.
2020-05-13 20:36:27 +01:00
Shantanu
27c0d9b54a
bpo-40334: produce specialized errors for invalid del targets (GH-19911) 2020-05-11 14:53:58 -07:00
Pablo Galindo
ac7a92cc0a
bpo-40334: Avoid collisions between parser variables and grammar variables (GH-19987)
This is for the C generator:
- Disallow rule and variable names starting with `_`
- Rename most local variable names generated by the parser to start with `_`

Exceptions:
- Renaming `p` to `_p` will be a separate PR
- There are still some names that might clash, e.g.
  - anything starting with `Py`
  - C reserved words (`if` etc.)
  - Macros like `EXTRA` and `CHECK`
2020-05-09 21:34:50 -07:00
Pablo Galindo
db9163ceef
bpo-40555: Check for p->error_indicator in loop rules after the main loop is done (GH-19986) 2020-05-08 03:38:44 +01:00
Lysandros Nikolaou
4638c64295
bpo-40334: Error message for invalid default args in function call (GH-19973)
When parsing something like `f(g()=2)`, where the name of a default arg
is not a NAME, but an arbitrary expression, a specialised error message
is emitted.
2020-05-07 11:44:06 +01:00
Pablo Galindo
470aac4d8e
bpo-40334: Generate comments in the parser code to improve debugging (GH-19966) 2020-05-06 23:14:43 +01:00
Pablo Galindo
99db2a1db7
bpo-40334: Allow trailing comma in parenthesised context managers (GH-19964) 2020-05-06 22:54:34 +01:00
Lysandros Nikolaou
999ec9ab6a
bpo-40334: Add type to the assignment rule in the grammar file (GH-19963) 2020-05-06 19:11:04 +01:00
Lysandros Nikolaou
e10e7c771b
bpo-40334: Spacialized error message for invalid args after bare '*' (GH-19865)
When parsing things like `def f(*): pass` the old parser used to output `SyntaxError: named arguments must follow bare *`, which the new parser wasn't able to do.
2020-05-04 11:58:31 +01:00
Shantanu
603d354626
bpo-40493: fix function type comment parsing (GH-19894)
The grammar for func_type_input rejected things like `(*t1) ->t2`. This fixes that.

Automerge-Triggered-By: @gvanrossum
2020-05-03 22:08:14 -07:00
Guido van Rossum
3941d9700b
bpo-40334: Refactor lambda_parameters similar to parameters (GH-19830) 2020-05-01 17:42:03 +01:00
Pablo Galindo
d955241469
bpo-40334: Correct return value of func_type_comment (GH-19833) 2020-05-01 08:32:09 -07:00
Batuhan Taskaya
76c1b4d5c5
bpo-40334: Improve column offsets for thrown syntax errors by Pegen (GH-19782) 2020-05-01 14:13:43 +01:00
Pablo Galindo
b796b3fb48
bpo-40334: Simplify type handling in the PEG c_generator (GH-19818) 2020-05-01 12:32:26 +01:00
Lysandros Nikolaou
3e0a6f37df
bpo-40334: Add support for feature_version in new PEG parser (GH-19827)
`ast.parse` and `compile` support a `feature_version` parameter that
tells the parser to parse the input string, as if it were written in
an older Python version.
The `feature_version` is propagated to the tokenizer, which uses it
to handle the three different stages of support for `async` and
`await`. Additionally, it disallows the following at parser level:
- The '@' operator in < 3.5
- Async functions in < 3.5
- Async comprehensions in < 3.6
- Underscores in numeric literals in < 3.6
- Await expression in < 3.5
- Variable annotations in < 3.6
- Async for-loops in < 3.5
- Async with-statements in < 3.5
- F-strings in < 3.6

Closes we-like-parsers/cpython#124.
2020-04-30 20:27:52 -07:00