Commit graph

264 commits

Author SHA1 Message Date
Serhiy Storchaka
73b3040f59
[3.11] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) (GH-134341)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
(cherry picked from commit 9f69a58623)
(cherry picked from commit 6279eb8c07)
(cherry picked from commit a75953b347)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-06-02 17:52:52 +02:00
Miss Islington (bot)
af168df78f
[3.11] gh-109848: Make test_rot13_func in test_codecs independent (GH-109850) (GH-110505)
(cherry picked from commit b987fdb19b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-10-07 13:24:27 +00:00
Miss Islington (bot)
b070d73ff2
[3.11] gh-50644: Forbid pickling of codecs streams (GH-109180) (GH-109232)
Attempts to pickle or create a shallow or deep copy of codecs streams
now raise a TypeError.

Previously, copying failed with a RecursionError, while pickling
produced wrong results that eventually caused unpickling to fail with
a RecursionError.
(cherry picked from commit d6892c2b92)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-09-10 17:41:19 +00:00
Serhiy Storchaka
6cd08a566f
[3.11] gh-106300: Improve assertRaises(Exception) usages in tests (GH-106302). (GH-106545)
(cherry picked from commit 6e6a4cd523)

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
2023-07-08 08:22:33 +00:00
Miss Islington (bot)
a6f6c3a3d6
[3.11] gh-98433: Fix quadratic time idna decoding. (GH-99092) (#99222)
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722564)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
2022-11-07 18:57:10 -08:00
Miss Islington (bot)
c835b97c5f
gh-51511: Note that codecs.open()'s encoding parameter affects automatic conversion to binary mode (GH-94370)
(cherry picked from commit d9407b174c)

Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
2022-10-21 16:27:41 -07:00
Serhiy Storchaka
3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Nikita Sobolev
6c83c8e6b5
bpo-46198: rename duplicate tests and remove unused code (GH-30297) 2022-03-10 08:20:11 -08:00
Victor Stinner
ccbe8045fa
bpo-46659: Fix the MBCS codec alias on Windows (GH-31218) 2022-02-22 22:04:07 +01:00
Victor Stinner
04dd60e50c
bpo-46659: Update the test on the mbcs codec alias (GH-31168)
encodings registers the _alias_mbcs() codec search function before
the search_function() codec search function. Previously, the
_alias_mbcs() was never used.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
2022-02-06 21:50:09 +01:00
Victor Stinner
ea1a54506b
bpo-46303: Move fileutils.h private functions to internal C API (GH-30484)
Move almost all private functions of Include/cpython/fileutils.h to
the internal C API Include/internal/pycore_fileutils.h.

Only keep _Py_fopen_obj() in Include/cpython/fileutils.h, since it's
used by _testcapi which must not use the internal C API.

Move EncodeLocaleEx() and DecodeLocaleEx() functions from _testcapi
to _testinternalcapi, since the C API moved to the internal C API.
2022-01-11 11:56:16 +01:00
Christian Heimes
e73283a20f
bpo-45668: Fix PGO tests without test extensions (GH-29315) 2021-11-01 11:14:53 +01:00
Serhiy Storchaka
39aa98346d
bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.raw_unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 20:04:19 +03:00
Serhiy Storchaka
c96d1546b1
bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
They support now splitting escape sequences between input chunks.

Add the third parameter "final" in codecs.unicode_escape_decode().
It is True by default to match the former behavior.
2021-10-14 13:17:00 +03:00
Victor Stinner
19ba2122ac
bpo-37330: open() no longer accept 'U' in file mode (GH-28118)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2021-09-02 12:58:00 +02:00
Max Bernstein
3635388f52
bpo-42065: Fix incorrectly formatted _codecs.charmap_decode error message (GH-19940) 2020-10-17 23:38:21 +03:00
Hai Shi
c9f696cb96
bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Hai Shi
c5b049b91c
bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters (GH-22219) 2020-10-14 17:43:31 +02:00
Hai Shi
3f342376ab
bpo-39337: Add a test case for normalizing of codec names (GH-19069) 2020-10-08 21:20:57 +02:00
Hai Shi
d332e7b816
bpo-41842: Add codecs.unregister() function (GH-22360)
Add codecs.unregister() and PyCodec_Unregister() functions
to unregister a codec search function.
2020-09-28 23:41:11 +02:00
Victor Stinner
0ee0b2938c
bpo-41521: Replace whitelist/blacklist with allowlist/denylist (GH-21823)
Rename 5 test method names in test_codecs and test_typing.
2020-08-11 15:28:43 +02:00
Hai Shi
4660597b51
bpo-40275: Use new test.support helper submodules in tests (GH-21448) 2020-08-03 18:49:18 +02:00
Victor Stinner
942f7a2dea
bpo-39674: Revert "bpo-37330: open() no longer accept 'U' in file mode (GH-16959)" (GH-18767)
This reverts commit e471e72977.

The mode will be removed from Python 3.10.
2020-03-04 18:50:22 +01:00
Chris A
2565edec2c
bpo-38971: Open file in codecs.open() closes if exception raised. (GH-17666)
Open issue in the BPO indicated a desire to make the implementation of
codecs.open() at parity with io.open(), which implements a try/except to
assure file stream gets closed before an exception is raised.
2020-03-02 08:39:50 +02:00
Berker Peksag
ba22e8f174
bpo-30566: Fix IndexError when using punycode codec (GH-18632)
Trying to decode an invalid string with the punycode codec
shoud raise UnicodeError.
2020-02-25 06:19:03 +03:00
Pablo Galindo
293dd23477
Remove binding of captured exceptions when not used to reduce the chances of creating cycles (GH-17246)
Capturing exceptions into names can lead to reference cycles though the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles.

See for example GH-13135
2019-11-19 21:34:03 +00:00
Victor Stinner
e471e72977
bpo-37330: open() no longer accept 'U' in file mode (GH-16959)
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
2019-10-28 15:40:08 +01:00
Zeth
b3b48c81f0 bpo-37876: Tests for ROT-13 codec (GH-15314)
The Rot-13 codec is for educational use but does not have unit tests,
dragging down test coverage. This adds a few very simple tests.
2019-09-09 07:50:36 -07:00
Steve Dower
7ebdda0dbe
bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) 2019-08-21 16:22:33 -07:00
Victor Stinner
8f4ef3b019
Remove unused imports in tests (GH-14518) 2019-07-01 18:28:25 +02:00
Serhiy Storchaka
894263ba80
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
* The UTF-8 incremental decoders fails now fast if encounter
  a sequence that can't be handled by the error handler.
* The UTF-16 incremental decoders with the surrogatepass error
  handler decodes now a lone low surrogate with final=False.
2019-06-25 11:54:18 +03:00
Victor Stinner
ca612a9728
bpo-36778: Remove outdated comment from CodePageTest (GH-13807)
CP65001Test has been removed.
2019-06-04 17:09:10 +02:00
Ammar Askar
a6ec1ce1ac bpo-33361: Fix bug with seeking in StreamRecoders (GH-8278) 2019-05-31 22:44:00 +03:00
Jelle Zijlstra
b3be407288 bpo-33482: fix codecs.StreamRecoder.writelines (GH-6779)
A very simple fix. I found this while writing typeshed stubs for StreamRecoder.


https://bugs.python.org/issue33482
2019-05-22 08:18:26 -07:00
Victor Stinner
d267ac20c3
bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230) 2019-05-10 03:19:54 +02:00
Paul Monson
62dfd7d6fe bpo-35920: Windows 10 ARM32 platform support (GH-11774) 2019-04-25 18:36:45 +00:00
Serhiy Storchaka
7a465cb5ee
bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)
The bug occurred when the encoded surrogate character is passed
to the incremental decoder in two chunks.
2019-03-30 08:23:38 +02:00
Serhiy Storchaka
c1e2c288f4
bpo-36312: Fix decoders for some code pages. (GH-12369) 2019-03-20 21:45:18 +02:00
Inada Naoki
6a16b18224
bpo-36297: remove "unicode_internal" codec (GH-12342) 2019-03-18 15:44:11 +09:00
Serhiy Storchaka
5b10b98247
bpo-22831: Use "with" to avoid possible fd leaks in tests (part 2). (GH-10929) 2019-03-05 10:06:26 +02:00
Serhiy Storchaka
4013c17911
bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) 2018-12-03 10:36:45 +02:00
Victor Stinner
bde9d6bbb4
bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759)
Fix memory leak in PyUnicode_EncodeLocale() and
PyUnicode_EncodeFSDefault() on error handling.

Changes:

* Fix unicode_encode_locale() error handling
* Fix test_codecs.LocaleCodecTest
2018-11-28 10:26:20 +01:00
Victor Stinner
3d4226a832
bpo-34523: Support surrogatepass in locale codecs (GH-8995)
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
  the _Py_error_handler enum instead of "int surrogateescape" to pass
  the error handler. These functions now return -3 if the error
  handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
  workaround anymore.
2018-08-29 22:21:32 +02:00
Zackery Spytz
e349bf2358 bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741)
The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).
2018-08-19 07:43:38 +03:00
Victor Stinner
91106cd9ff
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Serhiy Storchaka
219c2de5ad
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (#4499)
characters/bytes for non-negative n.  This makes it compatible with
read() methods of other file-like objects.
2017-11-29 01:30:00 +02:00
Serhiy Storchaka
56cb465cc9 bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058)
and in codecs.escape_decode() when decode an escaped non-ascii byte.
2017-10-20 17:08:15 +03:00
Berker Peksag
7b4bcd2004 Issue #25270: Merge from 3.5 2016-09-16 17:32:06 +03:00
Berker Peksag
4a72a7b6c4 Issue #25270: Prevent codecs.escape_encode() from raising SystemError when an empty bytestring is passed 2016-09-16 17:31:06 +03:00
R David Murray
110b6fecbb #27364: Deprecate invalid escape strings in str/byutes.
Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
2016-09-08 15:34:08 -04:00