mirror of
https://github.com/python/cpython.git
synced 2025-08-04 00:48:58 +00:00
[3.14] gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) (#135548)
gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900)
* gh-67022: Document bytes/str inconsistency in email.header.decode_header()
This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:
1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1
This means that any user of this function must be prepared to accept either
`bytes` or `str` for the first member of the 2-tuples it returns, which is a
very surprising behavior in Python 3.x, particularly given that the second
member of the tuple is supposed to represent the charset/encoding of the
first member.
This patch documents the behavior of this function, and adds test cases
to demonstrate it.
As discussed in bpo-22833, this cannot be changed in a backwards-compatible
way, and some users of this function depend precisely on the existing
behavior.
Add warnings about obsolescence of 'email.header.decode_header' and 'email.header.make_header' functions.
Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in https://github.com/python/cpython/pull/92900#discussion_r1112472177
(cherry picked from commit 60181f4ed0
)
Co-authored-by: Dan Lenski <dlenski@gmail.com>
This commit is contained in:
parent
7fd8857b11
commit
1c4f01a160
3 changed files with 54 additions and 9 deletions
|
@ -59,16 +59,22 @@ _max_append = email.quoprimime._max_append
|
|||
def decode_header(header):
|
||||
"""Decode a message header value without converting charset.
|
||||
|
||||
Returns a list of (string, charset) pairs containing each of the decoded
|
||||
parts of the header. Charset is None for non-encoded parts of the header,
|
||||
otherwise a lower-case string containing the name of the character set
|
||||
specified in the encoded string.
|
||||
For historical reasons, this function may return either:
|
||||
|
||||
1. A list of length 1 containing a pair (str, None).
|
||||
2. A list of (bytes, charset) pairs containing each of the decoded
|
||||
parts of the header. Charset is None for non-encoded parts of the header,
|
||||
otherwise a lower-case string containing the name of the character set
|
||||
specified in the encoded string.
|
||||
|
||||
header may be a string that may or may not contain RFC2047 encoded words,
|
||||
or it may be a Header object.
|
||||
|
||||
An email.errors.HeaderParseError may be raised when certain decoding error
|
||||
occurs (e.g. a base64 decoding exception).
|
||||
|
||||
This function exists for backwards compatibility only. For new code, we
|
||||
recommend using email.headerregistry.HeaderRegistry instead.
|
||||
"""
|
||||
# If it is a Header object, we can just return the encoded chunks.
|
||||
if hasattr(header, '_chunks'):
|
||||
|
@ -161,6 +167,9 @@ def make_header(decoded_seq, maxlinelen=None, header_name=None,
|
|||
This function takes one of those sequence of pairs and returns a Header
|
||||
instance. Optional maxlinelen, header_name, and continuation_ws are as in
|
||||
the Header constructor.
|
||||
|
||||
This function exists for backwards compatibility only, and is not
|
||||
recommended for use in new code.
|
||||
"""
|
||||
h = Header(maxlinelen=maxlinelen, header_name=header_name,
|
||||
continuation_ws=continuation_ws)
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue