Mirror of https://github.com/python/cpython.git (synced 2025-07-07 19:35:27 +00:00)
gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (#92900)
* gh-67022: Document bytes/str inconsistency in email.header.decode_header()

This function's possible return types have been surprising and error-prone for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]`, of length >=1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This means that any user of this function must be prepared to accept either `bytes` or `str` for the first member of the 2-tuples it returns. That is very surprising behavior in Python 3.x, particularly given that the second member of the tuple is supposed to represent the charset/encoding of the first member.

This patch documents the behavior of this function and adds test cases to demonstrate it.

As discussed in bpo-22833, this cannot be changed in a backwards-compatible way, and some users of this function depend precisely on the existing behavior.

Add warnings about obsolescence of the `email.header.decode_header` and `email.header.make_header` functions. Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested in https://github.com/python/cpython/pull/92900#discussion_r1112472177
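A minimal sketch of what the mixed return type means for callers, assuming they want a single `str` regardless of which form comes back. The helper name `decode_header_to_str` and its `fallback` parameter are hypothetical and not part of this patch; decoding charset-less bytes as ASCII with replacement characters is an assumption made for illustration only.

```python
from email.header import decode_header


def decode_header_to_str(value, fallback="ascii"):
    """Hypothetical helper: join the parts from decode_header() into one str."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words, and unencoded runs next to them, arrive as bytes.
            # charset is None for the unencoded runs, so use the fallback.
            parts.append(chunk.decode(charset or fallback, errors="replace"))
        else:
            # A header without any encoded words arrives as a single str.
            parts.append(chunk)
    return "".join(parts)


print(decode_header_to_str("=?iso-8859-1?q?p=F6stal?="))
print(decode_header_to_str("unencoded_string"))
print(decode_header_to_str("bar =?utf-8?B?ZsOzbw==?="))
```

The `isinstance` check is the part every caller has to write because of the inconsistency described above. A charset name that Python does not recognize would raise `LookupError` in this sketch, so real code may want to handle that case as well.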
parent 54e29ea4eb
commit 60181f4ed0
3 changed files with 54 additions and 9 deletions
@@ -178,16 +178,36 @@ The :mod:`email.header` module also provides the following convenient functions.
    Decode a message header value without converting the character set. The header
    value is in *header*.
 
-   This function returns a list of ``(decoded_string, charset)`` pairs containing
-   each of the decoded parts of the header. *charset* is ``None`` for non-encoded
-   parts of the header, otherwise a lower case string containing the name of the
-   character set specified in the encoded string.
+   For historical reasons, this function may return either:
 
-   Here's an example::
+   1. A list of pairs containing each of the decoded parts of the header,
+      ``(decoded_bytes, charset)``, where *decoded_bytes* is always an instance of
+      :class:`bytes`, and *charset* is either:
+
+      - A lower case string containing the name of the character set specified.
+
+      - ``None`` for non-encoded parts of the header.
+
+   2. A list of length 1 containing a pair ``(string, None)``, where
+      *string* is always an instance of :class:`str`.
+
+   An :exc:`email.errors.HeaderParseError` may be raised when certain decoding
+   errors occur (e.g. a base64 decoding exception).
+
+   Here are examples:
 
       >>> from email.header import decode_header
       >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
       [(b'p\xf6stal', 'iso-8859-1')]
+      >>> decode_header('unencoded_string')
+      [('unencoded_string', None)]
+      >>> decode_header('bar =?utf-8?B?ZsOzbw==?=')
+      [(b'bar ', None), (b'f\xc3\xb3o', 'utf-8')]
+
+   .. note::
+
+      This function exists for backwards compatibility only. For
+      new code, we recommend using :class:`email.headerregistry.HeaderRegistry`.
 
 
 .. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')
@@ -203,3 +223,7 @@ The :mod:`email.header` module also provides the following convenient functions.
    :class:`Header` instance. Optional *maxlinelen*, *header_name*, and
    *continuation_ws* are as in the :class:`Header` constructor.
 
+   .. note::
+
+      This function exists for backwards compatibility only, and is
+      not recommended for use in new code.
@@ -59,16 +59,22 @@ _max_append = email.quoprimime._max_append
 def decode_header(header):
     """Decode a message header value without converting charset.
 
-    Returns a list of (string, charset) pairs containing each of the decoded
-    parts of the header. Charset is None for non-encoded parts of the header,
-    otherwise a lower-case string containing the name of the character set
-    specified in the encoded string.
+    For historical reasons, this function may return either:
+
+    1. A list of length 1 containing a pair (str, None).
+    2. A list of (bytes, charset) pairs containing each of the decoded
+       parts of the header. Charset is None for non-encoded parts of the header,
+       otherwise a lower-case string containing the name of the character set
+       specified in the encoded string.
 
     header may be a string that may or may not contain RFC2047 encoded words,
     or it may be a Header object.
 
     An email.errors.HeaderParseError may be raised when certain decoding error
     occurs (e.g. a base64 decoding exception).
+
+    This function exists for backwards compatibility only. For new code, we
+    recommend using email.headerregistry.HeaderRegistry instead.
     """
     # If it is a Header object, we can just return the encoded chunks.
     if hasattr(header, '_chunks'):
@@ -161,6 +167,9 @@ def make_header(decoded_seq, maxlinelen=None, header_name=None,
     This function takes one of those sequence of pairs and returns a Header
     instance. Optional maxlinelen, header_name, and continuation_ws are as in
     the Header constructor.
+
+    This function exists for backwards compatibility only, and is not
+    recommended for use in new code.
     """
     h = Header(maxlinelen=maxlinelen, header_name=header_name,
                continuation_ws=continuation_ws)
@@ -2568,6 +2568,18 @@ Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz foo bar =?mac-iceland?q?r=8Aksm?=
         self.assertEqual(str(make_header(decode_header(s))),
                          '"Müller T" <T.Mueller@xxx.com>')
 
+    def test_unencoded_ascii(self):
+        # bpo-22833/gh-67022: returns [(str, None)] rather than [(bytes, None)]
+        s = 'header without encoded words'
+        self.assertEqual(decode_header(s),
+            [('header without encoded words', None)])
+
+    def test_unencoded_utf8(self):
+        # bpo-22833/gh-67022: returns [(str, None)] rather than [(bytes, None)]
+        s = 'header with unexpected non ASCII caract\xe8res'
+        self.assertEqual(decode_header(s),
+            [('header with unexpected non ASCII caract\xe8res', None)])
+
 
 # Test the MIMEMessage class
 class TestMIMEMessage(TestEmailBase):
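The notes added by this patch recommend `email.headerregistry.HeaderRegistry` for new code. A minimal sketch of that alternative follows, assuming a default-constructed registry and a `Subject` header chosen purely for illustration (neither appears in the patch). The header object it returns is a `str` subclass, so there is no bytes/str branching on the caller's side.

```python
from email.headerregistry import HeaderRegistry

# A registry with the default header-name-to-class mapping.
registry = HeaderRegistry()

# Calling the registry parses the raw value, decoding any RFC 2047
# encoded words along the way.
subject = registry("Subject", "=?iso-8859-1?q?p=F6stal?=")

print(subject.name)     # the header name passed in
print(str(subject))     # the decoded value as text
print(subject.defects)  # parse problems found, if any
```

Parse problems are collected in the `defects` attribute of the header object, which gives callers a place to check for malformed input.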