[3.12] gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) (#122599)

* gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

- Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

- Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 0976339818)

* Document changes as made in 3.12.5
This commit is contained in:
Petr Viktorin 2024-08-06 19:07:19 +02:00 committed by GitHub
parent 01db0e404d
commit 4766d1200f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 168 additions and 4 deletions

View file

@ -92,6 +92,8 @@ TOKEN_ENDS = TSPECIALS | WSP
ASPECIALS = TSPECIALS | set("*'%")
ATTRIBUTE_ENDS = ASPECIALS | WSP
EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')
NLSET = {'\n', '\r'}
SPECIALSNL = SPECIALS | NLSET
def quote_string(value):
return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'
@ -2802,9 +2804,13 @@ def _refold_parse_tree(parse_tree, *, policy):
wrap_as_ew_blocked -= 1
continue
tstr = str(part)
if part.token_type == 'ptext' and set(tstr) & SPECIALS:
# Encode if tstr contains special characters.
want_encoding = True
if not want_encoding:
if part.token_type == 'ptext':
# Encode if tstr contains special characters.
want_encoding = not SPECIALSNL.isdisjoint(tstr)
else:
# Encode if tstr contains newlines.
want_encoding = not NLSET.isdisjoint(tstr)
try:
tstr.encode(encoding)
charset = encoding