Commit graph

800 commits

Author SHA1 Message Date
Dan Lenski
60181f4ed0
gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (#92900)
* gh-67022: Document bytes/str inconsistency in email.header.decode_header()

This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This means that any user of this function must be prepared to accept either
`bytes` or `str` for the first member of the 2-tuples it returns, which is a
very surprising behavior in Python 3.x, particularly given that the second
member of the tuple is supposed to represent the charset/encoding of the
first member.

This patch documents the behavior of this function, and adds test cases
to demonstrate it.

As discussed in bpo-22833, this cannot be changed in a backwards-compatible
way, and some users of this function depend precisely on the existing
behavior.

Add warnings about obsolescence of 'email.header.decode_header' and 'email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in https://github.com/python/cpython/pull/92900#discussion_r1112472177
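
A minimal sketch of what callers have to do to cope with both return shapes described above. The helper name `decode_header_to_str` and the ASCII fallback for parts without a charset are assumptions made for illustration, not part of the patch.

```python
from email.header import decode_header


def decode_header_to_str(raw: str) -> str:
    """Join the parts returned by decode_header() into a single str.

    Hypothetical helper: it accepts both shapes, (bytes, charset-or-None)
    and (str, None).
    """
    pieces = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # Assumption: parts without a charset are treated as ASCII.
            # An unknown charset name would raise LookupError here.
            pieces.append(value.decode(charset or "ascii", errors="replace"))
        else:
            # A header without encoded-words comes back as one (str, None) pair.
            pieces.append(value)
    return "".join(pieces)


print(type(decode_header("Hello")[0][0]))                  # str
print(type(decode_header("=?utf-8?q?caf=C3=A9?=")[0][0]))  # bytes
print(decode_header_to_str("=?utf-8?q?caf=C3=A9?="))
```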
2025-06-15 15:29:38 -04:00
Alexander Shadchin
c23eec2960
Docs: fix docstring of email.message.Message.add_header (#134355) 2025-06-10 12:35:37 +02:00
Jiucheng(Oliver)
bcb6b45cb8
gh-134151 Fix TypeError in email.utils.decode_params when sorting RFC 2231 continuations (#134687)
- Fix sorting logic in `email.utils.decode_params` to handle None values.
- Update tests for RFC 2231 continuation sorting.
2025-06-08 09:13:21 +02:00
Sergey Miryanov
d9cad074d5
gh-134155: fix AttributeError in email._header_value_parser.get_address (#134194)
Append the defect to defects instead of to the parse tree.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
2025-06-05 13:28:11 -04:00
R. David Murray
a32ea45699
gh-134152: Fix UnboundLocalError in email._header_value_parser _get_ptext_to_endchars (#134233)
Fix an UnboundLocalError that can occur when parsing certain delimited constructs in headers (domain literals, quoted strings, comments). After the fix, _get_ptext_to_endchars returns an empty string if there is no content after the opening delimiter. The calling code is responsible for handling the lack of the trailing delimiter, which it already does; the edge case here was a header ending immediately after the opening delimiter.
2025-05-25 18:09:32 -04:00
Serhiy Storchaka
84a08f8629
gh-133306: Use \z instead of \Z in regular expressions in the stdlib (GH-133337) 2025-05-03 17:58:49 +03:00
Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి)
c432d0147b
gh-127794: Validate email header names according to RFC 5322 (#127820)
`email.message.Message` objects now validate header names specified via `__setitem__`
or `add_header` according to RFC 5322, §2.2 [1].

In particular, callers should expect a ValueError to be raised for invalid header names.

[1]: https://datatracker.ietf.org/doc/html/rfc5322#section-2.2
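
A hedged sketch of how calling code might guard against the new ValueError. The helper `try_set_header` is hypothetical, and on Python versions that predate this change the invalid name may simply be accepted, so no particular output is claimed for the second call.

```python
from email.message import EmailMessage


def try_set_header(name: str, value: str = "x") -> bool:
    """Return True if the header was stored, False if ValueError was raised."""
    msg = EmailMessage()
    try:
        msg[name] = value
    except ValueError:
        # Raised for invalid header names once this change is present.
        return False
    return True


print(try_set_header("X-Custom"))     # printable ASCII, no colon
print(try_set_header("Bad Header:"))  # contains a space and a colon
```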

---------

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
2025-03-30 12:29:29 +00:00
Mike Edmunds
295b53df2a
gh-121284: Fix email address header folding with parsed encoded-word (GH-122754)
Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.

[Better fix from @bitdancer.]

---------

Co-authored-by: R David Murray <rdmurray@bitdance.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
2025-03-18 12:07:17 +01:00
Mike Edmunds
5aaf416858
gh-80222: Fix email address header folding with long quoted-string (#122753)
Email generators using email.policy.default could incorrectly omit the
quote ('"') characters from a quoted-string during header refolding,
leading to invalid address headers and enabling header spoofing. This
change restores the quote characters on a bare-quoted-string as the
header is refolded, and escapes backslash and quote chars in the string.
2025-01-18 19:50:52 -05:00
RanKKI
a62ba52f14
gh-98188: Fix EmailMessage.get_payload to decode data when CTE value has extra text (#127547)
Up to this point message handling has been very strict with regards to content encoding values: mixed case was accepted, but trailing blanks or other text would cause decoding failure, even if the first token was a valid encoding.  By Postel's Rule we should go ahead and decode as long as we can recognize that first token.  We have not thought of any security or backward compatibility concerns with this fix.

This fix does introduce a new technique/pattern to the Message code: we look to see if the header has a 'cte' attribute, and if so we use that.  This effectively promotes the header API exposed by HeaderRegistry to an API that any header parser "should" support.  This seems like a reasonable thing to do.  It is not, however, a requirement, as the string value of the header is still used if there is no cte attribute.

The full fix (ignore any trailing blanks or blank-separated trailing text) applies only to the non-compat32 API.  compat32 is only fixed to the extent that it now ignores trailing spaces.  Note that the HeaderRegistry parsing still records a HeaderDefect if there is extra text.
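
The 'cte' attribute mentioned above can be inspected directly on headers built by HeaderRegistry. This is a rough sketch; `effective_cte` is a hypothetical helper that mirrors the described "use cte if present, else the string value" pattern, not the actual Message code.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

clean = registry("Content-Transfer-Encoding", "Base64")
noisy = registry("Content-Transfer-Encoding", "base64 garbage")


def effective_cte(header) -> str:
    """Prefer the parsed 'cte' attribute, else fall back to the string value."""
    return getattr(header, "cte", str(header)).lower()


print(effective_cte(clean))   # first token, lower-cased
print(effective_cte(noisy))   # trailing text is not part of cte
print(len(noisy.defects))     # the extra text is still recorded as a defect
print(effective_cte("7BIT"))  # plain str has no 'cte' attribute
```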

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-01-05 20:32:16 -05:00
RanKKI
ed81971e6b
gh-124452: Fix header mismatches when folding/unfolding with email message (#125919)
The header-folder of the new email API has a long standing known buglet where
if the first token is longer than max_line_length, it puts that token on the next
line.  It turns out there is also a *parsing* bug when parsing such a header:
the space prefixing that first, non-empty line gets preserved and tacked on to
the start of the header value, which is not the expected behavior per the RFCs.
The bug arises from the fact that the parser assumed that there would be at
least one token on the line with the header, which is going to be true for
probably every email producer other than the python email library with its
folding buglet.  Clearly, though, this is a case that needs to be handled
correctly.  The fix is simple: strip the blanks off the start of the whole
value, not just the first physical line of the value.

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-11-16 18:01:52 -05:00
Hugo van Kemenade
91f4908798
gh-126133: Only use start year in PSF copyright, remove end years (#126236) 2024-11-12 15:59:19 +02:00
Damien
91ff700de2
gh-122989: Replace duplicate “self.policy.linesep” with “linesep” (#123002)
`linesep` is already defined as `self.policy.linesep`.  It appears that a previous refactor was not completed.
2024-09-04 02:30:25 -04:00
Petr Viktorin
0976339818
gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)
## Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.


## Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.


Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-07-31 00:19:48 +02:00
Serhiy Storchaka
1a0c7b9ba4
gh-121905: Consistently use "floating-point" instead of "floating point" (GH-121907) 2024-07-19 08:06:02 +00:00
Matthieu Caneill
cecaceea31
gh-120930: Remove extra blank occurring in wrapped encoded words in email headers (GH-121747) 2024-07-18 14:48:05 +02:00
Geoffrey Thomas
ef172521a9
Remove almost all unpaired backticks in docstrings (#119231)
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form

    The variable `foo' should do xyz

to

    The variable 'foo' should do xyz

and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).

No functional change is intended here other than in human-readable
docstrings.
2024-05-22 12:35:18 -04:00
Serhiy Storchaka
858b9e85fc
gh-118643: Fix AttributeError in the email module (GH-119099)
Fix regression introduced in gh-100884: AttributeError when re-folding a long
address list.

Also fix more cases of incorrect encoding of the address separator in the
address list missed in gh-100884.
2024-05-22 10:17:46 +00:00
Toshio Kuratomi
a6fdb31b67
gh-92081: Fix for email.generator.Generator with whitespace between encoded words. (#92281)
* Fix for email.generator.Generator with whitespace between encoded words.

email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines.  The
current generator will create an encoded word for each line.  If the end
of the line happens to correspond with the end of a real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.

A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.

The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words.  This
fix places the space inside of the second encoded word.

A second problem happens with continuation lines.  A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character.  When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line.  This is because normal
words are folded on syntactic breaks but encoded words are not.

The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.

Test cases are from #92081

* Rename a variable so it's not confused with the final variable.
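
The decoder-side behaviour this fix relies on can be observed with the legacy header API. In this small sketch, `roundtrip` is a hypothetical helper; the second input carries the space inside the second encoded word (as `_` in Q encoding), which is the placement the fix chooses.

```python
from email.header import decode_header, make_header


def roundtrip(raw: str) -> str:
    """Decode an RFC 2047 header value to a plain str."""
    return str(make_header(decode_header(raw)))


# Whitespace between two adjacent encoded words is dropped by the decoder.
print(roundtrip("=?utf-8?q?hello?= =?utf-8?q?world?="))   # helloworld

# A space encoded inside the second word survives the round trip.
print(roundtrip("=?utf-8?q?hello?= =?utf-8?q?_world?="))  # hello world
```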
2024-05-20 19:10:47 +00:00
Hugo van Kemenade
c68acb1384
gh-118798: Remove deprecated isdst parameter from email.utils.localtime (#118799) 2024-05-09 03:17:02 -06:00
wim glenn
fed8d73fde
gh-118455: Fix mangle_from_ default value in email.policy.Policy.__doc__ (#118456)
* Fix mangle_from_ default value in email.policy.Policy.__doc__

The docstring says it defaults to True, but it actually defaults
to False. Only the Compat32 subclass overrides that.
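
A quick check of the actual defaults, using the ready-made policy instances rather than the abstract `Policy` base class:

```python
import email.policy

# The modern policies inherit the base default.
print(email.policy.default.mangle_from_)   # False

# Only the Compat32 policy overrides it.
print(email.policy.compat32.mangle_from_)  # True
```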

---------

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
2024-05-05 09:18:04 +03:00
Serhiy Storchaka
deaecb88fa
gh-80361: Fix TypeError in email.Message.get_payload() (GH-117994)
It was raised when the charset is rfc2231 encoded, e.g.:

   Content-Type: text/plain; charset*=ansi-x3.4-1968''utf-8
2024-04-17 19:31:26 +03:00
Ivan Savin
1aa8bbe62f
bpo-40944: Fix IndexError when parse emails with truncated Message-ID, address, routes, etc (GH-20790)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-04-17 10:14:22 +00:00
Serhiy Storchaka
aec1dac4ef
gh-117313: Fix re-folding email messages containing non-standard line separators (GH-117369)
Only treat '\n', '\r' and '\r\n' as line separators in re-folding the email
messages.  Preserve control characters '\v', '\f', '\x1c', '\x1d' and '\x1e'
and Unicode line separators '\x85', '\u2028' and '\u2029' as is.
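
To illustrate the distinction: `str.splitlines()` treats all of the characters listed above as line boundaries, while a split restricted to the three conventional separators leaves the others in place. The `NEWLINES` pattern below is an assumption made for the example and is not claimed to be the expression used in the stdlib.

```python
import re

text = "one\x0btwo\u2028three\r\nfour"

# str.splitlines() also breaks on '\v' and U+2028.
print(text.splitlines())

# Illustrative pattern: only '\r\n', '\r' and '\n' count as line separators.
NEWLINES = re.compile(r"\r\n|\r|\n")
print(NEWLINES.split(text))
```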
2024-04-17 13:00:25 +03:00
Serhiy Storchaka
f74e51229c
gh-86650: Fix IndexError when parse emails with invalid Message-ID (GH-117934)
In particular, one-off addresses generated by Microsoft Outlook:
https://learn.microsoft.com/en-us/office/client-developer/outlook/mapi/one-off-addresses

Co-authored-by: fsc-eriker <72394365+fsc-eriker@users.noreply.github.com>
2024-04-17 10:44:41 +03:00
tsufeki
8cc9adbfdd
gh-75171: Fix parsing invalid email address headers starting or ending with a dot (GH-15600)
Co-authored-by: Tim Bell <timothybell@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-04-17 10:39:15 +03:00
Serhiy Storchaka
f97f25ef5d
gh-76511: Fix email.Message.as_string() for non-ASCII message with ASCII charset (GH-116125) 2024-03-05 17:49:01 +02:00
Thomas Weißschuh
09fab93c3d
gh-100884: email/_header_value_parser: don't encode list separators (GH-100885)
ListSeparator should not be encoded. This could happen when a long line
pushes its separator to the next line, which would have been encoded.
2024-02-17 10:13:46 +00:00
Shantanu
2124a3ddcc
gh-109653: Improve import time of importlib.metadata / email.utils (#114664)
My criterion for delayed imports is that they're only worth it if the
majority of users of the module would benefit from it, otherwise you're
just moving latency around unpredictably.

mktime_tz is not used anywhere in the standard library and grep.app
indicates it's not got much use in the ecosystem either.

Distribution.files is not nearly as widely used as other
importlib.metadata APIs, so we defer the csv import.

Before:
```
λ hyperfine -w 8 './python -c "import importlib.metadata"'
Benchmark 1: ./python -c "import importlib.metadata"
  Time (mean ± σ):      65.1 ms ±   0.5 ms    [User: 55.3 ms, System: 9.8 ms]
  Range (min … max):    64.4 ms …  66.4 ms    44 runs
```

After:
```
λ hyperfine -w 8 './python -c "import importlib.metadata"'
Benchmark 1: ./python -c "import importlib.metadata"
  Time (mean ± σ):      62.0 ms ±   0.3 ms    [User: 52.5 ms, System: 9.6 ms]
  Range (min … max):    61.3 ms …  62.8 ms    46 runs
```

for about a 3ms saving with warm disk cache, maybe 7-11ms with cold disk
cache.
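
The deferred-import pattern discussed here can be sketched as follows. The function `rows_from_csv_text` is a hypothetical stand-in and is not code from `importlib.metadata`; it only shows the import being moved from module level into the one function that needs it.

```python
def rows_from_csv_text(text: str) -> list[list[str]]:
    """Parse CSV text; the csv module is imported on first use only."""
    # Deferred import: callers that never reach this function never pay
    # the import cost of 'csv' at module import time.
    import csv

    return list(csv.reader(text.splitlines()))


print(rows_from_csv_text("a,b\n1,2"))
```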
2024-01-29 01:30:22 -08:00
Rito Takeuchi
504334c7be
gh-77749: Fix inconsistent behavior of non-ASCII handling in EmailPolicy.fold() (GH-6986)
It now always encodes non-ASCII characters in headers if utf8 is false.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-01-26 15:19:41 +00:00
Serhiy Storchaka
e9d5b6ea2d
gh-113594: Fix UnicodeEncodeError in TokenList.fold() (GH-113730)
It occurred when trying to re-encode an unknown-8bit part combined with a non-unknown-8bit part.
2024-01-10 14:54:36 +02:00
Victor Stinner
4a153a1d3b
[CVE-2023-27043] gh-102988: Reject malformed addresses in email.parseaddr() (#111116)
Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.
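
A hedged usage sketch for the new keyword. `parse_strict` and `is_parse_failure` are hypothetical helpers; the TypeError fallback is there only because interpreters without this change do not accept `strict`, and the pair of empty strings is how `parseaddr()` signals a failed parse.

```python
from email.utils import parseaddr


def parse_strict(addr: str):
    """Call parseaddr() in strict mode where the keyword is available."""
    try:
        return parseaddr(addr, strict=True)
    except TypeError:
        # Older Python: the 'strict' keyword does not exist yet.
        return parseaddr(addr)


def is_parse_failure(result) -> bool:
    """parseaddr() reports a failed parse as a pair of empty strings."""
    return result == ("", "")


result = parse_strict("Alice <alice@example.org>")
print(result, is_parse_failure(result))
```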

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
2023-12-15 16:10:40 +01:00
Sidney Markowitz
27a5fd8cb8
gh-94606: Fix error when message with Unicode surrogate not surrogateescaped string (GH-94641)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2023-12-11 18:21:18 +02:00
Alex Waygood
aa3f419acb
gh-109653: Improve the import time of email.utils (#109824) 2023-10-12 15:03:20 -07:00
htsedebenham
c65592c4d6
gh-106186: Don't report MultipartInvariantViolationDefect for valid multipart emails when parsing header only (#107016) 2023-07-23 12:25:18 +02:00
Gregory P. Smith
a31dea1feb
gh-106669: Revert "gh-102988: Detect email address parsing errors ... (#105127)" (#106733)
This reverts commit 18dfbd0357.
Adds a regression test from the issue.

See https://github.com/python/cpython/issues/106669.
2023-07-20 20:30:52 -07:00
CF Bolz-Tereick
7e6ce48872
gh-106628: email parsing speedup (gh-106629) 2023-07-13 15:12:56 +09:00
Thomas Dwyer
18dfbd0357
gh-102988: Detect email address parsing errors and return empty tuple to indicate the parsing error (old API) (#105127)
Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.

---------

Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-07-10 23:00:55 +00:00
JosephSBoyle
70e2a42647
gh-102542 Remove unused bytes object and bytes slicing (#106433)
Remove unused bytes object and bytes slicing

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
2023-07-05 09:17:37 -07:00
Paul Ganssle
0b7fd8ffc5
GH-103857: Deprecate utcnow and utcfromtimestamp (#103858)
Using `datetime.datetime.utcnow()` and `datetime.datetime.utcfromtimestamp()` will now raise a `DeprecationWarning`.

We also have removed our internal uses of these functions and documented the change.
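
For code that still calls the deprecated functions, the timezone-aware equivalents look roughly like this (variable names are illustrative):

```python
from datetime import datetime, timezone

# Instead of datetime.utcnow(): an aware datetime in UTC.
now = datetime.now(timezone.utc)

# Instead of datetime.utcfromtimestamp(0): pass the timezone explicitly.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

print(now.tzinfo, epoch.isoformat())
```

Unlike the deprecated calls, both results carry `tzinfo`, so they cannot be mistaken for local time.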
2023-04-27 11:32:30 -06:00
JosephSBoyle
04ea04807d
gh-102498 Clean up unused variables and imports in the email module (#102482)
* Clean up unused variables and imports in the email module

* Remove extra newline char

* Remove superfluous dict+unpacking syntax

* Remove unused 'msg' var

---------

Co-authored-by: Barry Warsaw <barry@python.org>
2023-04-24 19:19:28 +00:00
Alan Williams
5e6661bce9
gh-72346: Added isdst deprecation warning to email.utils.localtime (GH-91450) 2023-03-19 19:20:20 -05:00
JosephSBoyle
b097925858
gh-102507 Remove invisible pagebreak characters (#102531)
Co-authored-by: AlexWaygood <alex.waygood@gmail.com>
2023-03-08 13:58:14 +00:00
Bob Kline
49cae39ef0
gh-101021: Document binary parameters as bytes (#101024) 2023-01-14 11:01:27 -08:00
Nikita Sobolev
6746135b07
gh-100792: Make email.message.Message.__contains__ twice as fast (#100793) 2023-01-07 13:26:05 -08:00
Nick Drozd
024ac542d7
bpo-45975: Simplify some while-loops with walrus operator (GH-29347) 2022-11-26 14:33:25 -08:00
Gary Donovan
5d4d83130c
Fix typo on inline comment for email.generator (GH-98210)
Trivial change to comment - no issue or new entry necessary
2022-11-25 10:03:20 -08:00
Serhiy Storchaka
ea5ed0ba51
gh-95087: Fix IndexError in parsing invalid date in the email module (GH-95201)
Co-authored-by: wouter bolsterlee <wouter@bolsterl.ee>
2022-07-25 09:17:25 +03:00
oda-gitso
71abeb0895
gh-93010: InvalidHeaderError used but nonexistent (#93015)
* fix issue 93010

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
2022-05-23 09:10:18 -07:00
slateny
8f29318079
gh-77630: Change Charset to charset (GH-92439) 2022-05-08 17:35:32 +03:00