Commit graph

1085 commits

Author SHA1 Message Date
Miss Islington (bot)
f2d7fa8839
gh-96678: Fix UB of null pointer arithmetic (GH-96782)
Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 81e36f350b)

Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
2022-09-13 08:03:40 -07:00
Miss Islington (bot)
ffafa9b91d
gh-96268: Fix loading invalid UTF-8 (GH-96270)
This makes tokenizer.c:valid_utf8 match stringlib/codecs.h:decode_utf8.

It also fixes an off-by-one error introduced in 3.10 for the line number when the tokenizer reports bad UTF8.
(cherry picked from commit 8bc356a7dd)

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
2022-09-07 14:49:17 -07:00
Miss Islington (bot)
bb0dab5c48
gh-96611: Fix error message for invalid UTF-8 in mid-multiline string (GH-96623)
(cherry picked from commit 05692c67c5)

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
2022-09-06 16:40:17 -07:00
Gregory P. Smith
f8b71da9aa
[3.11] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96500)
Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports https://github.com/python/cpython/pull/96499 aka 511ca94520

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
2022-09-02 09:48:57 -07:00
Shantanu
7fc8221794
[3.11] gh-94996: Disallow lambda pos only params with feature_version < (3, 8) (GH-95934) (GH-95936)
(cherry picked from commit a965db37f2)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

Automerge-Triggered-By: GH:lysnikolaou
2022-08-12 12:41:09 -07:00
Miss Islington (bot)
4abf84602f
gh-94996: Disallow parsing pos only params with feature_version < (3, 8) (GH-94997)
(cherry picked from commit b5e3ea2862)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2022-08-12 10:53:09 -07:00
Miss Islington (bot)
1221e8c400
gh-95876: Fix format string in pegen error location code (GH-95877)
(cherry picked from commit b4c857d0fd)

Co-authored-by: Christian Heimes <christian@python.org>
2022-08-11 02:19:20 -07:00
Miss Islington (bot)
d3cc99bdce
gh-95355: Check tokens[0] after allocating memory (GH-95356)
GH-95355

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit b946f529ef)

Co-authored-by: Honglin Zhu <zhuhonglin.zhl@alibaba-inc.com>
2022-07-28 03:29:50 -07:00
Miss Islington (bot)
86eb500068
[3.11] gh-95185: Check recursion depth in the AST constructor (GH-95186) (GH-95208)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 0047447294)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-07-26 12:19:22 +02:00
Miss Islington (bot)
7733aa048e
gh-94949: Disallow parsing parenthesised ctx mgr with old feature_version (GH-94950)
* gh-94949: Disallow parsing parenthesised ctx manager with old feature_version

* 📜🤖 Added by blurb_it.

* Allow it with feature_version=(3, 9) as well

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
(cherry picked from commit 0daba82221)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2022-07-18 14:57:45 -07:00
Miss Islington (bot)
7dc236d116
gh-94947: Disallow parsing walrus with feature_version < (3, 8) (GH-94948)
* gh-94947: Disallow parsing walrus with feature_version < (3, 8)

* oops, commit the parser

* 📜🤖 Added by blurb_it.

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
(cherry picked from commit ae0be5a53b)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2022-07-18 02:46:21 -07:00
Miss Islington (bot)
e121cb5814
gh-94869: Fix the location in some expressions for multi-line f-string ast nodes (GH-94895)
(cherry picked from commit 2e9da8e352)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-07-16 12:16:51 -07:00
Miss Islington (bot)
d49c99f10d
gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin (GH-94386)
* gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

* nitty nit

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 36fcde61ba)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-07-05 10:09:51 -07:00
Miss Islington (bot)
442dd8ffa5
gh-94192: Fix error for dictionary literals with invalid expression as value. (GH-94304)
* Fix error for dictionary literals with invalid expression as value.

* Remove trailing whitespace
(cherry picked from commit 8c237a7a71)

Co-authored-by: wookie184 <wookie1840@gmail.com>
2022-06-26 12:07:02 -07:00
Pablo Galindo Salgado
65ed8b47ee
[3.11] gh-92858: Improve error message for some suites with syntax error before ':' (GH-92894) (#94180)
(cherry picked from commit 2fc83ac3af)

Co-authored-by: wookie184 <wookie1840@gmail.com>

Co-authored-by: wookie184 <wookie1840@gmail.com>
2022-06-23 18:38:06 +01:00
Miss Islington (bot)
f9d0240db8
gh-93671: Avoid exponential backtracking in deeply nested sequence patterns in match statements (GH-93680)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 53a8b17895)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-06-10 09:21:04 -07:00
Miss Islington (bot)
376d53771d
gh-93418: Fix an assert when an f-string expression is followed by an '=', but no closing brace. (gh-93419) (gh-93422)
(cherry picked from commit ee70c70aa9)

Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>

Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
2022-06-01 21:04:27 -04:00
Miss Islington (bot)
b425d887aa
gh-92597: Ensure that AST nodes without explicit end positions can be compiled (GH-93359)
(cherry picked from commit 705eaec28f)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-05-31 16:26:16 -07:00
Miss Islington (bot)
7afccd34a6
gh-90473: Decrease recursion limit and skip tests on WASI (GH-92803)
(cherry picked from commit 137fd3d88a)

Co-authored-by: Christian Heimes <christian@python.org>
2022-05-19 08:05:52 -07:00
Victor Stinner
d716a0dfe2
Use static inline function Py_EnterRecursiveCall() (#91988)
Currently, calling Py_EnterRecursiveCall() and
Py_LeaveRecursiveCall() may use a function call or a static inline
function call, depending if the internal pycore_ceval.h header file
is included or not. Use a different name for the static inline
function to ensure that the static inline function is always used in
Python internals for best performance. Similar approach than
PyThreadState_GET() (function call) and _PyThreadState_GET() (static
inline function).

* Rename _Py_EnterRecursiveCall() to _Py_EnterRecursiveCallTstate()
* Rename _Py_LeaveRecursiveCall() to _Py_LeaveRecursiveCallTstate()
* pycore_ceval.h: Rename Py_EnterRecursiveCall() to
  _Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() and
  _Py_LeaveRecursiveCall()
2022-05-04 13:30:23 +02:00
Serhiy Storchaka
3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Serhiy Storchaka
43a8bf1ea4
gh-87999: Change warning type for numeric literal followed by keyword (GH-91980)
The warning emitted by the Python parser for a numeric literal
immediately followed by keyword has been changed from deprecation
warning to syntax warning.
2022-04-27 20:15:14 +03:00
Matthieu Dartiailh
aa0f056a00
bpo-47212: Improve error messages for un-parenthesized generator expressions (GH-32302) 2022-04-05 14:47:13 +01:00
Christian Heimes
3df0e63aab
bpo-46315: Use fopencookie only on Emscripten 3.x and newer (GH-32266) 2022-04-02 23:11:38 +02:00
Hugo van Kemenade
6881ea936e
bpo-47126: Update to canonical PEP URLs specified by PEP 676 (GH-32124) 2022-03-30 12:00:27 +01:00
Maciej Górski
7b44ade018
bpo-47129: Add more informative messages to f-string syntax errors (32127)
* Add more informative messages to f-string syntax errors

* 📜🤖 Added by blurb_it.

* Fix whitespaces

* Change error message

* Remove the 'else' statement (as sugested in review)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
2022-03-28 17:08:36 -04:00
Matthew Rahtz
e8e737bcf6
bpo-43224: Implement PEP 646 grammar changes (GH-31018)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-03-26 09:55:35 -07:00
Pablo Galindo Salgado
26cca8067b
bpo-47117: Don't crash if we fail to decode characters when the tokenizer buffers are uninitialized (GH-32129)
Automerge-Triggered-By: GH:pablogsal
2022-03-26 09:29:02 -07:00
Christian Heimes
9b889b5bda
bpo-46315: Use fopencookie() to avoid dup() in _PyTokenizer_FindEncodingFilename (GH-32033)
WASI does not have dup() and Emscripten's emulation is slow.
2022-03-22 17:08:51 +01:00
Pablo Galindo Salgado
7d810b6a4e
bpo-46838: Syntax error improvements for function definitions (GH-31590) 2022-03-22 11:38:41 +00:00
Oleg Iarygin
13b0412223
bpo-46920: Remove code that has explainers why it was disabled (GH-31813) 2022-03-14 17:04:22 +01:00
Oleg Iarygin
a52f82baf2
bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) 2022-03-14 17:03:21 +01:00
Serhiy Storchaka
090e5c4b94
bpo-46820: Fix a SyntaxError in a numeric literal followed by "not in" (GH-31479)
Fix parsing a numeric literal immediately (without spaces) followed by
"not in" keywords, like in "1not in x". Now the parser only emits
a warning, not a syntax error.
2022-02-22 09:51:51 +02:00
Eric V. Smith
ffd9f8ff84
bpo-46762: Fix an assert failure in f-strings where > or < is the last character if the f-string is missing a trailing right brace. (#31365) 2022-02-16 05:54:09 -05:00
Pablo Galindo Salgado
e19059ecd8
Don't print rejected tokens when using the debug flags in the parser (GH-31258) 2022-02-10 14:38:27 +00:00
Pablo Galindo Salgado
390459de6d
Allow the parser to avoid nested processing of invalid rules (GH-31252) 2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado
b71dc71905
bpo-46707: Avoid potential exponential backtracking in some syntax errors (GH-31241) 2022-02-10 03:37:17 +00:00
Eric Snow
81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Pablo Galindo Salgado
69e10976b2
bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) 2022-02-08 11:54:37 +00:00
Paul m. p. P
89b13042fc
bpo-14916: use specified tokenizer fd for file input (GH-31006)
@pablogsal, sorry i failed to rebase to main, so i recreated https://github.com/python/cpython/pull/22190#issuecomment-1024633392

> PyRun_InteractiveOne\*() functions allow to explicitily set fd instead of stdin.
but stdin was hardcoded in readline call.

> This patch does not fix target file for prompt unlike original bpo one : prompt fd is unrelated to tokenizer source which could be read only. It is more of a bugfix regarding the docs :  actual documentation say "prompt the user" so one would expect prompt to go on stdout not a file for both PyRun_InteractiveOne\*() and PyRun_InteractiveLoop\*().

Automerge-Triggered-By: GH:pablogsal
2022-02-01 14:33:52 -08:00
Pablo Galindo Salgado
a0efc0c196
bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters (GH-30130) 2022-01-25 22:12:14 +00:00
Eric V. Smith
0daf72194b
bpo-46503: Prevent an assert from firing when parsing some invalid \N sequences in f-strings. (GH-30865)
* bpo-46503: Prevent an assert from firing.  Also fix one nearby tiny PEP-7 nit.

* Added blurb.
2022-01-24 21:53:27 -05:00
Pablo Galindo Salgado
650720a0cf
Fix the caret position in some syntax errors in interactive mode (GH-30718) 2022-01-20 15:34:13 +00:00
Pablo Galindo Salgado
8c2fd09f36
bpo-46339: Include clarification on assert in 'get_error_line_from_tokenizer_buffers' (#30545) 2022-01-18 11:13:00 +00:00
Pablo Galindo Salgado
cedec19be8
bpo-46339: Fix crash in the parser when computing error text for multi-line f-strings (GH-30529)
Automerge-Triggered-By: GH:pablogsal
2022-01-11 08:30:39 -08:00
Pablo Galindo Salgado
6fa8b2ceee
bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) 2022-01-08 00:23:40 +00:00
Batuhan Taskaya
d382f7ee0b
bpo-46289: Make conversion of FormattedValue not optional on ASDL (GH-30467)
Automerge-Triggered-By: GH:isidentical
2022-01-07 13:05:28 -08:00
Pablo Galindo Salgado
70f415fb8b
bpo-46240: Correct the error for unclosed parentheses when the tokenizer is not finished (GH-30378) 2022-01-04 10:41:22 +00:00
Pablo Galindo Salgado
dd6c35761a
bpo-46110: Restore commit e9898bf153
This restores commit e9898bf153 .
2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado
9d35dedc5e
Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153 temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
2022-01-03 18:29:18 +00:00