Commit graph

411 commits

Author SHA1 Message Date
Stan Ulbrych
8a598fb623
[3.12] gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130125)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database.  That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Co-authored-by: Greg Price <gnprice@gmail.com>
(cherry picked from commit 3402e133ef)
2025-02-17 14:07:59 +01:00
Miss Islington (bot)
e44045924d
[3.12] gh-127903: Fix a crash on debug builds when calling Objects/unicodeobject::_copy_characters (GH-127876) (#128459)
gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters`` (GH-127876)
(cherry picked from commit 46cb6340d7)

Co-authored-by: Alexander Shadchin <shadchin@yandex-team.com>
2025-01-03 21:21:08 +02:00
Serhiy Storchaka
82257374b9
[3.12] gh-53203: Improve tests for strptime() (GH-125090) (GH-125093)
Run them with different locales and different date and time.

Add the @run_with_locales() decorator to run the test with multiple
locales.

Improve the run_with_locale() context manager/decorator -- it now
catches only expected exceptions and reports the test as skipped if no
appropriate locale is available.
(cherry picked from commit 19984fe024)
2024-10-08 09:47:37 +00:00
Serhiy Storchaka
ae1ea41cf3
[3.12] gh-104231: Add more tests for str(), repr(), ascii(), and bytes() (GH-112551) (GH-112555)
(cherry picked from commit 2223899adc)
2023-12-01 10:16:47 +02:00
Miss Islington (bot)
0eb6d87304
[3.12] gh-80527: Change support.requires_legacy_unicode_capi() (GH-108438) (#108446)
gh-80527: Change support.requires_legacy_unicode_capi() (GH-108438)

The decorator now requires to be called with parenthesis:

    @support.requires_legacy_unicode_capi()

instead of:

    @support.requires_legacy_unicode_capi

The implementation now only imports _testcapi when the decorator is
called, so "import test.support" no longer imports the _testcapi
extension.
(cherry picked from commit 995f4c48e1)

Co-authored-by: Victor Stinner <vstinner@python.org>
2023-08-25 18:18:24 +02:00
Nikita Sobolev
c6dac12861
Remove wrong comment about repr in test_unicode (#100495) 2022-12-24 06:48:43 -08:00
Nikita Sobolev
7ca45e5ddd
gh-94808: improve test coverage of number formatting (#99472) 2022-12-23 18:03:31 -06:00
Nikita Sobolev
745545b5bb
gh-99482: remove jython compatibility parts from stdlib and tests (#99484) 2022-12-23 14:17:24 -06:00
Serhiy Storchaka
06d4e02c3b
gh-78453: Move Unicode C API tests from test_unicode to test_capi.test_unicode (GH-99431) 2022-11-14 15:32:02 +02:00
Nikita Sobolev
d329f859b9
gh-99430: Remove duplicated tests for old-styled classes (#99432)
python 1 & 2 were a loong time ago.
2022-11-13 10:30:00 -08:00
Nikita Sobolev
b1783bc124
gh-94808: Improve coverage of unicode_find and unicode_rfind (#98648) 2022-10-25 16:37:53 -07:00
Nikita Sobolev
b7dd2cad18
gh-94808: Cover str.rsplit for UCS1, UCS2 or UCS4 (#98228) 2022-10-15 11:40:22 -07:00
Nikita Sobolev
ccab67ba79
gh-97982: Factorize PyUnicode_Count() and unicode_count() code (#98025)
Add unicode_count_impl() to factorize PyUnicode_Count()
and unicode_count() code.
2022-10-12 18:27:53 +02:00
Jelle Zijlstra
a54a69989e
gh-94808: Fix regex on exotic platforms (#98036)
The test failed on a buildbot because the pointer was only 7 hex characters. To be safe,
I bumped it down to 3: 4 in case we have 32-bit platforms, and 3 in case the pointer is very small.
2022-10-07 15:39:53 -07:00
Nikita Sobolev
72c166add8
gh-94808: Cover %p in PyUnicode_FromFormat (#96677)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-10-07 09:53:42 -07:00
Nikita Sobolev
e63d7dae90
gh-94808: Cover PyUnicode_Count in CAPI (#96929) 2022-10-06 17:20:22 +02:00
Serhiy Storchaka
62f06508e7
gh-95781: More strict format string checking in PyUnicode_FromFormatV() (GH-95784)
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
2022-08-08 19:21:07 +03:00
Christian Heimes
5442561c1a
gh-93575: Use correct way to calculate PyUnicode struct sizes (GH-93602)
* gh-93575: Use correct way to calculate PyUnicode struct sizes

* Add comment to keep test_sys and test_unicode in sync

* Fix case code < 256
2022-06-08 20:18:08 +02:00
Dennis Sweeney
19a4252459
gh-92536: Update unicode struct size to ensure MemoryError is raised (GH-92867) 2022-05-17 10:12:21 -04:00
Kumar Aditya
8c54c3dacc
gh-91576: Speed up iteration of strings (#91574) 2022-04-18 07:18:27 -07:00
Dennis Sweeney
b748a36696
Use assertEqual, not assertEquals, in test_unicode (GH-31718)
Fixes a DeprecationWarning
2022-03-07 02:32:51 -05:00
Mark Shannon
03c2a36b2b
bpo-46903: Handle str-subclasses in virtual instance dictionaries. (GH-31658) 2022-03-04 11:31:29 +00:00
Kumar Aditya
83d544b929
bpo-40066: [Enum] skip failing doc test (GH-30637) 2022-01-17 07:18:13 -08:00
Victor Stinner
42a64c03ec
Revert "bpo-40066: [Enum] update str() and format() output (GH-30582)" (GH-30632)
This reverts commit acf7403f9b.
2022-01-17 13:58:40 +01:00
Ethan Furman
acf7403f9b
bpo-40066: [Enum] update str() and format() output (GH-30582)
Undo rejected PEP-663 changes:

- restore `repr()` to its 3.10 status
- restore `str()` to its 3.10 status

New changes:

- `IntEnum` and `IntFlag` now leave `__str__` as the original `int.__str__` so that str() and format() return the same result
- zero-valued flags without a name have a slightly changed repr(), e.g. `repr(Color(0)) == '<Color: 0>'`
- update `dir()` for mixed-in types to return all the methods and attributes of the mixed-in type
- added `_numeric_repr_` to `Flag` to control display of unnamed values
- enums without doc strings have a more comprehensive doc string added
- `ReprEnum` added -- inheriting from this makes it so only `__repr__` is replaced, not `__str__` nor `__format__`; `IntEnum`, `IntFlag`, and `StrEnum` all inherit from `ReprEnum`
2022-01-15 22:41:43 -08:00
Christian Heimes
e73283a20f
bpo-45668: Fix PGO tests without test extensions (GH-29315) 2021-11-01 11:14:53 +01:00
Nikita Sobolev
a2ce538e16
bpo-44891: Tests id preserving on * 1 for str and bytes (GH-27745)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-08-13 12:36:22 +02:00
Irit Katriel
4aeee0b47b
bpo-28146: Fix a confusing error message in str.format() (GH-24213)
Automerge-Triggered-By: GH:pitrou
2021-05-13 13:55:55 -07:00
Inada Naoki
9ad8f109ac
bpo-44029: Remove Py_UNICODE APIs (GH-25881)
Remove deprecated `Py_UNICODE` APIs: `PyUnicode_Encode`,
`PyUnicode_EncodeUTF7`, `PyUnicode_EncodeUTF8`,
`PyUnicode_EncodeUTF16`, `PyUnicode_EncodeUTF32`,
`PyUnicode_EncodeLatin1`, `PyUnicode_EncodeMBCS`,
`PyUnicode_EncodeDecimal`, `PyUnicode_EncodeRawUnicodeEscape`,
`PyUnicode_EncodeCharmap`, `PyUnicode_EncodeUnicodeEscape`,
`PyUnicode_TransformDecimalToASCII`, `PyUnicode_TranslateCharmap`,
`PyUnicodeEncodeError_Create`, `PyUnicodeTranslateError_Create`.

See :pep:`393` and :pep:`624` for reference.
2021-05-07 15:58:29 +09:00
Ethan Furman
a02cb474f9
bpo-38659: [Enum] add _simple_enum decorator (GH-25497)
add:

* `_simple_enum` decorator to transform a normal class into an enum
* `_test_simple_enum` function to compare
* `_old_convert_` to enable checking `_convert_` generated enums

`_simple_enum` takes a normal class and converts it into an enum:

    @simple_enum(Enum)
    class Color:
        RED = 1
        GREEN = 2
        BLUE = 3

`_old_convert_` works much like` _convert_` does, using the original logic:

    # in a test file
    import socket, enum
    CheckedAddressFamily = enum._old_convert_(
            enum.IntEnum, 'AddressFamily', 'socket',
            lambda C: C.isupper() and C.startswith('AF_'),
            source=_socket,
            )

`_test_simple_enum` takes a traditional enum and a simple enum and
compares the two:

    # in the REPL or the same module as Color
    class CheckedColor(Enum):
        RED = 1
        GREEN = 2
        BLUE = 3

    _test_simple_enum(CheckedColor, Color)

    _test_simple_enum(CheckedAddressFamily, socket.AddressFamily)

Any important differences will raise a TypeError
2021-04-21 10:20:44 -07:00
Ethan Furman
503cdc7c12
Revert "bpo-38659: [Enum] add _simple_enum decorator (GH-25285)" (GH-25476)
This reverts commit dbac8f40e8.
2021-04-19 19:12:24 -07:00
Ethan Furman
dbac8f40e8
bpo-38659: [Enum] add _simple_enum decorator (GH-25285)
add:

_simple_enum decorator to transform a normal class into an enum
_test_simple_enum function to compare
_old_convert_ to enable checking _convert_ generated enums
_simple_enum takes a normal class and converts it into an enum:

@simple_enum(Enum)
class Color:
    RED = 1
    GREEN = 2
    BLUE = 3

_old_convert_ works much like _convert_ does, using the original logic:

# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
        enum.IntEnum, 'AddressFamily', 'socket',
        lambda C: C.isupper() and C.startswith('AF_'),
        source=_socket,
        )

test_simple_enum takes a traditional enum and a simple enum and
compares the two:

# in the REPL or the same module as Color
class CheckedColor(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

_test_simple_enum(CheckedColor, Color)

_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)

Any important differences will raise a TypeError
2021-04-19 18:04:53 -07:00
Ethan Furman
b775106d94
bpo-40066: Enum: modify repr() and str() (GH-22392)
* Enum: streamline repr() and str(); improve docs

- repr() is now ``enum_class.member_name``
- stdlib global enums are ``module_name.member_name``
- str() is now ``member_name``
- add HOW-TO section for ``Enum``
- change main documentation to be an API reference
2021-03-30 21:17:26 -07:00
Zackery Spytz
8aabfa8550
bpo-43405: Fix DeprecationWarnings in test_unicode (GH-24754)
DeprecationWarnings were being raised in the test_encode_decimal()
and test_transform_decimal() methods after 91a639a094.
2021-03-07 15:12:35 +09:00
Inada Naoki
91a639a094
bpo-36346: Emit DeprecationWarning for PyArg_Parse() with 'u' or 'Z'. (GH-20927)
Emit DeprecationWarning when PyArg_Parse*() is called with 'u', 'Z' format.

See PEP 623.
2021-02-22 22:11:48 +09:00
Serhiy Storchaka
cf19cc3b92
bpo-27772: Make preceding width with 0 valid in string format. (GH-11270)
Previously it was an error with confusing error message.
2021-01-25 11:56:33 +02:00
Ronald Oussoren
41761933c1
bpo-41100: Support macOS 11 and Apple Silicon (GH-22855)
Co-authored-by:  Lawrence D’Anna <lawrence_danna@apple.com>

* Add support for macOS 11 and Apple Silicon (aka arm64)
   
  As a side effect of this work use the system copy of libffi on macOS, and remove the vendored copy

* Support building on recent versions of macOS while deploying to older versions

  This allows building installers on macOS 11 while still supporting macOS 10.9.
2020-11-08 10:05:27 +01:00
Hai Shi
c9f696cb96
bpo-41919, test_codecs: Move codecs.register calls to setUp() (GH-22513)
* Move the codecs' (un)register operation to testcases.
* Remove _codecs._forget_codec() and _PyCodec_Forget()
2020-10-16 10:34:15 +02:00
Serhiy Storchaka
4c8f09d7ce
bpo-36346: Make using the legacy Unicode C API optional (GH-21437)
Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0
makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
2020-07-10 23:26:06 +03:00
Hai Shi
deb016224c
bpo-40275: Use new test.support helper submodules in tests (GH-21317) 2020-07-06 14:29:49 +02:00
Inada Naoki
038dd0f79d
bpo-36346: Raise DeprecationWarning when creating legacy Unicode (GH-20933) 2020-06-30 15:26:56 +09:00
Serhiy Storchaka
f9bab74d5b
bpo-41055: Remove outdated tests for the tp_print slot. (GH-21006) 2020-06-21 11:11:17 +03:00
Serhiy Storchaka
5650e76f63
bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing non-BMP characters on Windows. (GH-20053) 2020-05-12 16:18:00 +03:00
Inada Naoki
3a8c56295d
Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer()" (GH-18985)
* Revert "bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)"

This reverts commit c7ad974d34.

* Update unicodeobject.h
2020-03-14 15:59:27 +09:00
Inada Naoki
c7ad974d34
bpo-39087: Add _PyUnicode_GetUTF8Buffer() (GH-17659)
Co-authored-by: Victor Stinner <vstinner@python.org>
2020-03-14 12:43:18 +09:00
Benjamin Peterson
51796e5d26
Update some www.unicode.org URLs to use HTTPS. (GH-18912) 2020-03-10 21:10:59 -07:00
Serhiy Storchaka
1f21eaa15e
bpo-15999: Clean up of handling boolean arguments. (GH-15610)
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
2019-09-01 12:16:51 +03:00
Greg Price
6bccbe7dfb bpo-36502: Correct documentation of str.isspace() (GH-15019)
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.

Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.

Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so.  The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.

Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
2019-08-14 13:05:19 +02:00
Hai Shi
5623ac87bb bpo-37476: Adding tests for asutf8 and asutf8andsize (GH-14531) 2019-07-20 15:56:23 +08:00
Victor Stinner
22eb689cf3
bpo-37388: Development mode check encoding and errors (GH-14341)
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().

By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
2019-06-26 00:51:05 +02:00