[3.14] gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) (#135548)

gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) * gh-67022: Document bytes/str inconsistency in email.header.decode_header() This function's possible return types have been surprising and error-prone for the entirety of its Python 3.x history. It can return either: 1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1 2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1 This means that any user of this function must be prepared to accept either `bytes` or `str` for the first member of the 2-tuples it returns, which is a very surprising behavior in Python 3.x, particularly given that the second member of the tuple is supposed to represent the charset/encoding of the first member. This patch documents the behavior of this function, and adds test cases to demonstrate it. As discussed in bpo-22833, this cannot be changed in a backwards-compatible way, and some users of this function depend precisely on the existing behavior. Add warnings about obsolescence of 'email.header.decode_header' and 'email.header.make_header' functions. Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested in https://github.com/python/cpython/pull/92900#discussion_r1112472177 (cherry picked from commit 60181f4ed0) Co-authored-by: Dan Lenski <dlenski@gmail.com>
2025-08-04 00:48:58 +00:00 · 2025-06-15 22:02:16 +02:00 · 2025-06-15 22:02:16 +02:00 · 1c4f01a160
commit 1c4f01a160
parent 7fd8857b11
3 changed files with 54 additions and 9 deletions
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@ -59,16 +59,22 @@ _max_append = email.quoprimime._max_append
 def decode_header(header):
    """Decode a message header value without converting charset.

-    Returns a list of (string, charset) pairs containing each of the decoded
-    parts of the header.  Charset is None for non-encoded parts of the header,
-    otherwise a lower-case string containing the name of the character set
-    specified in the encoded string.
+    For historical reasons, this function may return either:
+
+    1. A list of length 1 containing a pair (str, None).
+    2. A list of (bytes, charset) pairs containing each of the decoded
+       parts of the header.  Charset is None for non-encoded parts of the header,
+       otherwise a lower-case string containing the name of the character set
+       specified in the encoded string.

    header may be a string that may or may not contain RFC2047 encoded words,
    or it may be a Header object.

    An email.errors.HeaderParseError may be raised when certain decoding error
    occurs (e.g. a base64 decoding exception).
+
+    This function exists for backwards compatibility only. For new code, we
+    recommend using email.headerregistry.HeaderRegistry instead.
    """
    # If it is a Header object, we can just return the encoded chunks.
    if hasattr(header, '_chunks'):
@ -161,6 +167,9 @@ def make_header(decoded_seq, maxlinelen=None, header_name=None,
    This function takes one of those sequence of pairs and returns a Header
    instance.  Optional maxlinelen, header_name, and continuation_ws are as in
    the Header constructor.
+
+    This function exists for backwards compatibility only, and is not
+    recommended for use in new code.
    """
    h = Header(maxlinelen=maxlinelen, header_name=header_name,
               continuation_ws=continuation_ws)