We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.
With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.
Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.
Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.
Also add a thorough test that the implementation agrees with this definition.
Co-authored-by: Greg Price <gnprice@gmail.com>
(cherry picked from commit 3402e133ef)
gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters`` (GH-127876)
(cherry picked from commit 46cb6340d7)
Co-authored-by: Alexander Shadchin <shadchin@yandex-team.com>
Run them with different locales and different date and time.
Add the @run_with_locales() decorator to run the test with multiple
locales.
Improve the run_with_locale() context manager/decorator -- it now
catches only expected exceptions and reports the test as skipped if no
appropriate locale is available.
(cherry picked from commit 19984fe024)
gh-80527: Change support.requires_legacy_unicode_capi() (GH-108438)
The decorator now requires to be called with parenthesis:
@support.requires_legacy_unicode_capi()
instead of:
@support.requires_legacy_unicode_capi
The implementation now only imports _testcapi when the decorator is
called, so "import test.support" no longer imports the _testcapi
extension.
(cherry picked from commit 995f4c48e1)
Co-authored-by: Victor Stinner <vstinner@python.org>
The test failed on a buildbot because the pointer was only 7 hex characters. To be safe,
I bumped it down to 3: 4 in case we have 32-bit platforms, and 3 in case the pointer is very small.
An unrecognized format character in PyUnicode_FromFormat() and
PyUnicode_FromFormatV() now sets a SystemError.
In previous versions it caused all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
Undo rejected PEP-663 changes:
- restore `repr()` to its 3.10 status
- restore `str()` to its 3.10 status
New changes:
- `IntEnum` and `IntFlag` now leave `__str__` as the original `int.__str__` so that str() and format() return the same result
- zero-valued flags without a name have a slightly changed repr(), e.g. `repr(Color(0)) == '<Color: 0>'`
- update `dir()` for mixed-in types to return all the methods and attributes of the mixed-in type
- added `_numeric_repr_` to `Flag` to control display of unnamed values
- enums without doc strings have a more comprehensive doc string added
- `ReprEnum` added -- inheriting from this makes it so only `__repr__` is replaced, not `__str__` nor `__format__`; `IntEnum`, `IntFlag`, and `StrEnum` all inherit from `ReprEnum`
add:
* `_simple_enum` decorator to transform a normal class into an enum
* `_test_simple_enum` function to compare
* `_old_convert_` to enable checking `_convert_` generated enums
`_simple_enum` takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
`_old_convert_` works much like` _convert_` does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
`_test_simple_enum` takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
add:
_simple_enum decorator to transform a normal class into an enum
_test_simple_enum function to compare
_old_convert_ to enable checking _convert_ generated enums
_simple_enum takes a normal class and converts it into an enum:
@simple_enum(Enum)
class Color:
RED = 1
GREEN = 2
BLUE = 3
_old_convert_ works much like _convert_ does, using the original logic:
# in a test file
import socket, enum
CheckedAddressFamily = enum._old_convert_(
enum.IntEnum, 'AddressFamily', 'socket',
lambda C: C.isupper() and C.startswith('AF_'),
source=_socket,
)
test_simple_enum takes a traditional enum and a simple enum and
compares the two:
# in the REPL or the same module as Color
class CheckedColor(Enum):
RED = 1
GREEN = 2
BLUE = 3
_test_simple_enum(CheckedColor, Color)
_test_simple_enum(CheckedAddressFamily, socket.AddressFamily)
Any important differences will raise a TypeError
* Enum: streamline repr() and str(); improve docs
- repr() is now ``enum_class.member_name``
- stdlib global enums are ``module_name.member_name``
- str() is now ``member_name``
- add HOW-TO section for ``Enum``
- change main documentation to be an API reference
Co-authored-by: Lawrence D’Anna <lawrence_danna@apple.com>
* Add support for macOS 11 and Apple Silicon (aka arm64)
As a side effect of this work use the system copy of libffi on macOS, and remove the vendored copy
* Support building on recent versions of macOS while deploying to older versions
This allows building installers on macOS 11 while still supporting macOS 10.9.
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.