Commit graph

53 commits

Author SHA1 Message Date
Mark Shannon
2498c22fa0
GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)
* Implement C recursion protection with limit pointers

* Remove calls to PyOS_CheckStack

* Add stack protection to parser

* Make tests more robust to low stacks

* Improve error messages for stack overflow
2025-02-19 11:44:57 +00:00
efimov-mikhail
1f9025a4e7
gh-124889: Remove redundant artificial rules in PEG parser (#124893)
Cache in C PEG-generator reworked:
we save artificial rules in cache by Node string representation as a key instead of Node object itself.
As a result total count of artificial rules in parsers.c is lowered from 283 to 170.
More natural number ordering is used for the names of artificial rules.

Auxiliary method CCallMakerVisitor._generate_artificial_rule_call is added.
Its purpose is abstracting work with artificial rules cache.

Explicit using of "is_repeat1" kwarg is added to visit_Repeat0 and visit_Repeat1 methods.
Its slightly improve code readabitily.
2024-10-03 13:58:56 +01:00
Lysandros Nikolaou
348184845a
gh-120956: Avoid comparison of int to Py_ssize_t in parser (#120959) 2024-06-24 18:13:02 +02:00
yf-yang
ace2045ea6
Fix types in pegen parser generator (GH-120720) 2024-06-19 16:12:40 +02:00
David Rubin
9b280ab0ab
gh-116988: Remove duplicates of annotated_rhs in the Grammar (#117004) 2024-04-24 18:16:06 +01:00
Brett Cannon
56e59a49ae
GH-111807: Lower the parser stack depth under WASI debug builds (#112225) 2023-11-20 13:27:33 +00:00
Dennis Sweeney
86617518c4
gh-108179: Add error message for parser stack overflows (#108256) 2023-08-22 08:41:50 +01:00
Pablo Galindo Salgado
1ef61cf71a
gh-102856: Initial implementation of PEP 701 (#102855)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
2023-04-19 11:18:16 -05:00
Pablo Galindo Salgado
f533f216e6
gh-102416: Do not memoize incorrectly loop rules in the parser (#102467) 2023-03-06 14:41:53 +01:00
Pablo Galindo Salgado
1de4395f62
gh-101046: Fix a potential memory leak in the parser when raising MemoryError (#101051) 2023-01-16 18:45:37 +00:00
Victor Stinner
5115a16831
gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106)
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
  parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
2022-05-24 22:35:08 +02:00
Christian Heimes
137fd3d88a
gh-90473: Decrease recursion limit and skip tests on WASI (GH-92803) 2022-05-19 12:43:16 +02:00
Pablo Galindo Salgado
390459de6d
Allow the parser to avoid nested processing of invalid rules (GH-31252) 2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado
dd6c35761a
bpo-46110: Restore commit e9898bf153
This restores commit e9898bf153 .
2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado
9d35dedc5e
Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153 temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
2022-01-03 18:29:18 +00:00
Pablo Galindo Salgado
e9898bf153
bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
2021-12-20 15:43:26 +00:00
Victor Stinner
253b7a0a9f
bpo-45866: pegen strips directory of "generated from" header (GH-29777)
"make regen-all" now produces the same output when run from a
directory other than the source tree: when building Python out of the
source tree.
2021-11-26 11:50:34 +01:00
Pablo Galindo Salgado
a106343f63
bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993)
There are two errors that this commit fixes:

* The parser was not correctly computing the offset and the string
  source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
  exception happened in rules involving optionals ('?', [...]) as we
  always make them return valid results by using the comma operator. We
  need to check first if we don't have an error before continuing.
2021-10-19 21:24:12 +02:00
Christian Clauss
682aecfdeb
Fix typos in the Tools directory (GH-28769)
Like #28744 but for the Tools directory.

[skip issue] Opening a related issue is pending python/psf-infra-meta#130

Automerge-Triggered-By: GH:pablogsal
2021-10-06 10:55:16 -07:00
Pablo Galindo Salgado
b01fd533fe
Extract visitors from the grammar nodes and call makers in the peg generator (GH-28172)
Simplify the peg generator logic by extracting as much visitors as possible to disentangle the flow and separate concerns.
2021-09-05 14:58:52 +01:00
Pablo Galindo Salgado
953d27261e
Update pegen to use the latest upstream developments (GH-27586) 2021-08-12 17:37:30 +01:00
Akira Nonaka
aef1b58dc8
bpo-44345: Fix 'generated by' comment in parser.c (GH-26615) 2021-06-09 16:38:53 +02:00
Pablo Galindo
c878a97968
bpo-44180: Fix edge cases in invalid assigment rules in the parser (GH-26283)
The invalid assignment rules are very delicate since the parser can
easily raise an invalid assignment when a keyword argument is provided.
As they are very deep into the grammar tree, is very difficult to
specify in which contexts these rules can be used and in which don't.
For that, we need to use a different version of the rule that doesn't do
error checking in those situations where we don't want the rule to raise
(keyword arguments and generator expressions).

We also need to check if we are in left-recursive rule, as those can try
to eagerly advance the parser even if the parse will fail at the end of
the expression. Failing to do this allows the parser to start parsing a
call as a tuple and incorrectly identify a keyword argument as an
invalid assignment, before it realizes that it was not a tuple after all.
2021-05-21 18:34:54 +01:00
Pablo Galindo
b280248be8
bpo-43822: Improve syntax errors for missing commas (GH-25377) 2021-04-15 21:38:45 +01:00
Victor Stinner
6af528b4ab
bpo-43244: Fix test_peg_generators on Windows (GH-24913)
Don't redefine Py_DebugFlag, it's already defined in pydebug.h which
is included by Python.h
2021-03-18 09:54:13 +01:00
Jozef Grajciar
c994ffe695
bpo-11717: fix ssize_t redefinition error when targeting 32bit Windows app (GH-24479) 2021-03-01 11:18:33 +00:00
Pablo Galindo
58fb156edd
bpo-42997: Improve error message for missing : before suites (GH-24292)
* Add to the peg generator a new directive ('&&') that allows to expect
  a token and hard fail the parsing if the token is not found. This
  allows to quickly emmit syntax errors for missing tokens.

* Use the new grammar element to hard-fail if the ':' is missing before
  suites.
2021-02-02 19:54:22 +00:00
Lysandros Nikolaou
02cdfc93f8
bpo-42218: Correctly handle errors in left-recursive rules (GH-23065)
Left-recursive rules need to check for errors explicitly, since
even if the rule returns NULL, the parsing might continue and lead
to long-distance failures.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-10-31 20:31:41 +02:00
Lysandros Nikolaou
bca7014032
bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
* Implement running the parser a second time for the errors messages

The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
2020-10-27 00:42:04 +02:00
Pablo Galindo
a5634c4067
bpo-41746: Add type information to asdl_seq objects (GH-22223)
* Add new capability to the PEG parser to type variable assignments. For instance:
```
       | a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```

* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
2020-09-16 19:42:00 +01:00
Pablo Galindo
1ac0cbca36
bpo-41215: Don't use NULL by default in the PEG parser keyword list (GH-21355)
Automerge-Triggered-By: @lysnikolaou
2020-07-06 12:31:16 -07:00
Pablo Galindo
78319e373d
Include soft keywords in keyword.py (GH-20877) 2020-06-15 03:55:15 +01:00
Pablo Galindo
404b23b85b
Fix lookahead of soft keywords in the PEG parser (GH-20436)
Automerge-Triggered-By: @gvanrossum
2020-05-26 16:15:52 -07:00
Guido van Rossum
b45af1a569
Add soft keywords (GH-20370)
These are like keywords but they only work in context; they are not reserved except when there is an exact match.

This would enable things like match statements without reserving `match` (which would be bad for the `re.match()` function and probably lots of other places).

Automerge-Triggered-By: @gvanrossum
2020-05-26 10:58:44 -07:00
Pablo Galindo
deb4355a37
bpo-40750: Do not expand the new parser debug flags if Py_BUILD_CORE is not defined (GH-20393) 2020-05-25 20:17:12 +01:00
Pablo Galindo
800a35c623
bpo-40750: Support -d flag in the new parser (GH-20340) 2020-05-25 18:38:45 +01:00
Batuhan Taskaya
cba5031510
bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367) 2020-05-24 23:20:18 +01:00
Pablo Galindo
b831129123
Fix debug output in PEG parser generator (GH-20308) 2020-05-22 02:48:09 +01:00
Pablo Galindo
d10fef35c6
Fix typing problems reported by mypy in pegen (GH-20297) 2020-05-21 21:39:44 +01:00
Batuhan Taskaya
f50516e6a9
bpo-40334: Correctly generate C parser when assigned var is None (GH-20296)
When there are 2 negative lookaheads in the same rule, let's say `!"(" blabla "," !")"`, there will the 2 `FunctionCall`'s where assigned value is None. Currently when the `add_var` is called
the first one will be ignored but when the second lookahead's var is sent to dedupe it
will be returned as `None_1` and this won't be ignored by the declaration generator in the `visit_Alt`. This patch adds an explicit check to `add_var` to distinguish whether if there is a variable or not.
2020-05-21 20:57:52 +01:00
Lysandros Nikolaou
7b7a21bc4f
bpo-40661: Fix segfault when parsing invalid input (GH-20165)
Fix segfaults when parsing very complex invalid input, like `import äˆ ð£„¯ð¢·žð±‹á”€ð””ð‘©±å®ä±¬ð©¾\nð—¶½`.

Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2020-05-18 18:32:03 +01:00
Lysandros Nikolaou
2c8cd06afe
bpo-40334: Improvements to error-handling code in the PEG parser (GH-20003)
The following improvements are implemented in this commit:
- `p->error_indicator` is set, in case malloc or realloc fail.
- Avoid memory leaks in the case that realloc fails.
- Call `PyErr_NoMemory()` instead of `PyErr_Format()`, because it requires no memory.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-05-17 04:19:23 +01:00
Pablo Galindo
ac7a92cc0a
bpo-40334: Avoid collisions between parser variables and grammar variables (GH-19987)
This is for the C generator:
- Disallow rule and variable names starting with `_`
- Rename most local variable names generated by the parser to start with `_`

Exceptions:
- Renaming `p` to `_p` will be a separate PR
- There are still some names that might clash, e.g.
  - anything starting with `Py`
  - C reserved words (`if` etc.)
  - Macros like `EXTRA` and `CHECK`
2020-05-09 21:34:50 -07:00
Pablo Galindo
db9163ceef
bpo-40555: Check for p->error_indicator in loop rules after the main loop is done (GH-19986) 2020-05-08 03:38:44 +01:00
Pablo Galindo
470aac4d8e
bpo-40334: Generate comments in the parser code to improve debugging (GH-19966) 2020-05-06 23:14:43 +01:00
Anthony Shaw
c95e691c90
Clean up unused imports for the peg generator module (GH-19891) 2020-05-04 03:03:05 +01:00
Pablo Galindo
7ba08ff7b4
bpo-40334: use the TOKENS file when checking dangling rules (GH-19849) 2020-05-01 23:14:12 +01:00
Pablo Galindo
b796b3fb48
bpo-40334: Simplify type handling in the PEG c_generator (GH-19818) 2020-05-01 12:32:26 +01:00
Pablo Galindo
4db245ee9d
bpo-40334: refactor and cleanup for the PEG generators (GH-19775) 2020-04-29 10:42:21 +01:00
Pablo Galindo
5b9f4988c9
bpo-40334: Refactor peg_generator to receive a Tokens file when building c code (GH-19745) 2020-04-28 13:11:55 +01:00