* [3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
(cherry picked from commit 6279eb8c07)
(cherry picked from commit a75953b347)
(cherry picked from commit 0c33e5baed)
(cherry picked from commit 8b528cacbb)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Correctly pre-check for int-to-str conversion (#96537)
Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DOS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)
The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.
The justification for the current check. The C code check is:
```c
max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
```
In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:
$$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$
From this it follows that
$$\frac{M}{3L} < \frac{s-1}{10}$$
hence that
$$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$
So
$$2^{L(s-1)} > 10^M.$$
But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.
<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* bpo-46503: Prevent an assert from firing. Also fix one nearby tiny PEP-7 nit.
* Added blurb.
(cherry picked from commit 0daf72194b)
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
"make regen-all" now produces the same output when run from a
directory other than the source tree: when building Python out of the
source tree.
(cherry picked from commit 253b7a0a9f)
(cherry picked from commit b6defde2af)
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]) as we
always make them return valid results by using the comma operator. We
need to check first if we don't have an error before continuing..
(cherry picked from commit a106343f63)
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
NOTE: unlike the cherry-picked original, this commit points at a crazy location
due to a bug in the tokenizer that required a big refactor in 3.10 to fix.
We are leaving as-is for 3.9.
They support now splitting escape sequences between input chunks.
Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
(cherry picked from commit c96d1546b1)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
When compiling an AST object with a direct / indirect reference
cycles, on the conversion phase because of exceeding amount of
calls, a segfault was raised. This patch adds recursion guards to
places for preventing user inputs to not to crash AST but instead
raise a RecursionError..
(cherry picked from commit f3491242e4)
Co-authored-by: Batuhan Taskaya <batuhan@python.org>
Currently walruses are not allowerd in set literals and set comprehensions:
>>> {y := 4, 4**2, 3**3}
File "<stdin>", line 1
{y := 4, 4**2, 3**3}
^
SyntaxError: invalid syntax
but they should be allowed as well per PEP 572.
(cherry picked from commit b0aba1fcdc)
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Signed-off-by: Christian Heimes <christian@python.org>
Automerge-Triggered-By: GH:tiran
(cherry picked from commit 07f2adedf0)
Co-authored-by: Christian Heimes <christian@python.org>
Left-recursive rules need to check for errors explicitly, since
even if the rule returns NULL, the parsing might continue and lead
to long-distance failures.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
(cherry picked from commit 02cdfc93f8)
Automerge-Triggered-By: GH:lysnikolaou
* Implement running the parser a second time for the errors messages
The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
(cherry picked from commit bca7014032)
Partially revert commit ac46eb4ad6:
"bpo-38113: Update the Python-ast.c generator to PEP384 (gh-15957)".
Using a module state per module instance is causing subtle practical
problems.
For example, the Mercurial project replaces the __import__() function
to implement lazy import, whereas Python expected that "import _ast"
always return a fully initialized _ast module.
Add _PyAST_Fini() to clear the state at exit.
The _ast module has no state (set _astmodule.m_size to 0). Remove
astmodule_traverse(), astmodule_clear() and astmodule_free()
functions..
(cherry picked from commit e5fbe0cbd4)
Co-authored-by: Victor Stinner <vstinner@python.org>
This program can segfault the parser by stack overflow:
```
import ast
code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```
the reason is that the rule for arguments has a simple recursion when collecting args:
args[expr_ty]:
[...]
| a=named_expression b=[',' c=args { c }] {
[...] }.
(cherry picked from commit 4a97b1517a)
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-41194: Convert _ast extension to PEP 489 (GH-21293)
Convert the _ast extension module to PEP 489 "Multiphase
initialization". Replace the global _ast state with a module state.
(cherry picked from commit b1cc6ba73a)
* bpo-41204: Fix compiler warning in ast_type_init() (GH-21307)
(cherry picked from commit 1f76453173)
incr cannot be larger than INT_MAX: downcast to int explicitly.
(cherry picked from commit bde48fd811)
Co-authored-by: Victor Stinner <vstinner@python.org>
This consolidates the handling of my_fgets return values, so that interrupts are always handled, even if they come after EOF.
I believe PyOS_StdioReadline is still buggy in that I/O errors will not result in a proper Python exception being set. However, that is a separate issue.
(cherry picked from commit a74eea238f)
Co-authored-by: Benjamin Peterson <benjamin@python.org>
This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST can be felt much later like in the garbage collector or the compiler, making debugging them much more difficult..
(cherry picked from commit 1332226b32)
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
GCC says
```
../cpython/Parser/string_parser.c: In function ‘fstring_find_expr’:
../cpython/Parser/string_parser.c:404:93: warning: ‘cols’ may be used uninitialized in this function [-Wmaybe-uninitialized]
404 | p2->starting_col_offset = p->tok->first_lineno == p->tok->lineno ? t->col_offset + cols : cols;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
../cpython/Parser/string_parser.c:384:16: note: ‘cols’ was declared here
384 | int lines, cols;
| ^~~~
../cpython/Parser/string_parser.c:403:45: warning: ‘lines’ may be used uninitialized in this function [-Wmaybe-uninitialized]
403 | p2->starting_lineno = t->lineno + lines - 1;
| ~~~~~~~~~~~~~~~~~~^~~
../cpython/Parser/string_parser.c:384:9: note: ‘lines’ was declared here
384 | int lines, cols;
| ^~~~~
```
and, indeed, if `PyBytes_AsString` somehow fails, lines & cols will not be initialized.
(cherry picked from commit 2ad7e9c011)
Co-authored-by: Benjamin Peterson <benjamin@python.org>
* bpo-41194: Pass module state in Python-ast.c (GH-21284)
Rework asdl_c.py to pass the module state to functions in
Python-ast.c, instead of using astmodulestate_global.
Handle also PyState_AddModule() failure in init_types().
(cherry picked from commit 74419f0c64)
* bpo-41194: The _ast module cannot be loaded more than once (GH-21290)
Fix a crash in the _ast module: it can no longer be loaded more than
once. It now uses a global state rather than a module state.
* Move _ast module state: use a global state instead.
* Set _astmodule.m_size to -1, so the extension cannot be loaded more
than once.
(cherry picked from commit 91e1bc18bd)
This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed..
(cherry picked from commit 1f0f4abb11)
Prefix the error message with `fstring: `, when parsing an f-string expression throws a `SyntaxError`.
(cherry picked from commit 2e0a920e9e)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
`GET_INVALID_TARGET` might unexpectedly return `NULL`, which if not
caught will cause a SEGFAULT. Therefore, this commit introduces a new
inline function `RAISE_SYNTAX_ERROR_INVALID_TARGET` that always
checks for `GET_INVALID_TARGET` returning NULL and can be used in
the grammar, replacing the long C ternary operation used till now.
(cherry picked from commit 6c4e0bd974)
Automerge-Triggered-By: @pablogsal