[3.12] gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) (#122599)

* gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

  - Encode header parts that contain newlines

    Per RFC 2047:

    > [...] these encoding schemes allow the
    > encoding of arbitrary octet values, mail readers that implement this
    > decoding should also ensure that display of the decoded data on the
    > recipient's terminal will not cause unwanted side-effects

    It seems that the "quoted-word" scheme is a valid way to include
    a newline character in a header value, just like we already allow
    undecodable bytes or control characters.
    They do need to be properly quoted when serialized to text, though.

  - Verify that email headers are well-formed

    This should fail for custom fold() implementations that aren't careful
    about newlines.

  Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
  (cherry picked from commit 0976339818)

* Document changes as made in 3.12.5
2025-09-26 10:19:53 +00:00 · 2024-08-06 19:07:19 +02:00 · 2024-08-06 19:07:19 +02:00 · 4766d1200f
commit 4766d1200f
parent 01db0e404d
10 changed files with 168 additions and 4 deletions
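
The encoding decision this change adds to `_refold_parse_tree` can be sketched in isolation. The following is a minimal illustration, not the library code: `wants_encoding` is a hypothetical helper, the `SPECIALS` value is an assumed stand-in for the module-level set, and the `if not want_encoding:` guard from the real function is left out for brevity.

```python
# Assumed stand-in for the SPECIALS set in Lib/email/_header_value_parser.py.
SPECIALS = set(r'()<>@,:;.\"[]')
# The two sets introduced by this change.
NLSET = {'\n', '\r'}
SPECIALSNL = SPECIALS | NLSET


def wants_encoding(tstr, token_type):
    """Hypothetical helper mirroring the new branch in _refold_parse_tree."""
    if token_type == 'ptext':
        # ptext: encode on any special character or newline.
        return not SPECIALSNL.isdisjoint(tstr)
    # Any other token type: encode only when a newline is present.
    return not NLSET.isdisjoint(tstr)


if __name__ == '__main__':
    for text, kind in [('plain text', 'ptext'), ('a@b', 'ptext'),
                       ('a@b', 'atom'), ('x\r\ny', 'atom')]:
        print(repr(text), kind, wants_encoding(text, kind))
```

`set.isdisjoint` accepts any iterable, so the string is checked character by character without first building `set(tstr)` as the removed line did.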
--- a/Lib/email/_header_value_parser.py
+++ b/Lib/email/_header_value_parser.py
@@ -92,6 +92,8 @@ TOKEN_ENDS = TSPECIALS | WSP
 ASPECIALS = TSPECIALS | set("*'%")
 ATTRIBUTE_ENDS = ASPECIALS | WSP
 EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')
+NLSET = {'\n', '\r'}
+SPECIALSNL = SPECIALS | NLSET

 def quote_string(value):
    return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'
@@ -2802,9 +2804,13 @@ def _refold_parse_tree(parse_tree, *, policy):
            wrap_as_ew_blocked -= 1
            continue
        tstr = str(part)
-        if part.token_type == 'ptext' and set(tstr) & SPECIALS:
-            # Encode if tstr contains special characters.
-            want_encoding = True
+        if not want_encoding:
+            if part.token_type == 'ptext':
+                # Encode if tstr contains special characters.
+                want_encoding = not SPECIALSNL.isdisjoint(tstr)
+            else:
+                # Encode if tstr contains newlines.
+                want_encoding = not NLSET.isdisjoint(tstr)
        try:
            tstr.encode(encoding)
            charset = encoding