Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.
---------
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Using `datetime.datetime.utcnow()` and `datetime.datetime.utcfromtimestamp()` will now raise a `DeprecationWarning`.
We also have removed our internal uses of these functions and documented the change.
I am re-submitting an older PR which was abandoned but is still relevant, #10783 by @timb07.
The issue being solved () is still relevant. The original PR #10783 was closed as
the final request changes were not applied and since abandoned.
In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.
For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.
In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.
Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.
This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.
Automerge-Triggered-By: GH:warsaw
For me as a non native English speaker, the sentence with its embedded clause was very hard to understand.
modified: Lib/email/utils.py
Automerge-Triggered-By: @csabella
* Check intersection of two sets explicitly
Comparing ``len(a) > ``len(a - b)`` is essentially looking for an
intersection between the two sets. If set ``b`` does not intersect ``a``
then ``len(a - b)`` will be equal to ``len(a)``. This logic is more
clearly expressed as ``a & b``.
* Change while/pop to a for-loop
Copying the list, then repeatedly popping the first element was
unnecessarily slow. I also cleaned up a couple other inefficiencies.
There's no need to unpack a tuple, then re-pack and append it. The list
can be created with the first element instead of empty. Secondly, the
``endswith`` method returns a bool, so there's no need for an if-
statement to set ``encoding`` to True or False.
* Use set.intersection to check for intersections
``a.intersection(b)`` method is more clear of purpose than ``not
a.isdisjoint(b)`` and avoids an unnecessary set construction that ``a &
set(b)`` performs.
* Use not isdisjoint instead of intersection
While it reads slightly worse, the isdisjoint method will stop when it
finds a counterexample and returns a bool, rather than looping over the
entire iterable and constructing a new set.
While there is not real bug in this case, using re.IGNORECASE without re.ASCII
leads unexpected behavior.
Instead of adding re.ASCII, this commit removes re.IGNORECASE flag because
it's easier and simpler.
This commit removes dead copy of the pattern in email.util module too.
While the pattern is same, it is compiled separately because it had different flags.
This adds EmailMessage and, MIMEPart subclasses of Message
with new API methods, and a ContentManager class used by
the new methods. Also a new policy setting, content_manager.
Patch was reviewed by Stephen J. Turnbull and Serhiy Storchaka,
and reflects their feedback.
I will ideally add some examples of using the new API to the
documentation before the final release.
The new _has_surrogates code was suggested by Serhiy Storchaka. See
the issue for timings, but it is far faster than any other alternative,
and also removes the load time that we previously incurred from compiling
the complex regex this replaces.
Without this function people would be tempted to use the other date functions
in email.utils to compute an aware localtime, and those functions are not as
good for that purpose as this code. The code is Alexander Belopolsy's from
his proposed patch for issue 9527, with a fix (and additional tests) by Brian
K. Jones.
When the new policies are used (and only when the new policies are explicitly
used) headers turn into objects that have attributes based on their parsed
values, and can be set using objects that encapsulate the values, as well as
set directly from unicode strings. The folding algorithm then takes care of
encoding unicode where needed, and folding according to the highest level
syntactic objects.
With this patch only date and time headers are parsed as anything other than
unstructured, but that is all the helper methods in the existing API handle.
I do plan to add more parsers, and complete the set specified in the RFC
before the package becomes stable.
This patch primarily does two things: (1) it adds some internal-interface
methods to Policy that allow for Policy to control the parsing and folding of
headers in such a way that we can construct a backward compatibility policy
that is 100% compatible with the 3.2 API, while allowing a new policy to
implement the email6 API. (2) it adds that backward compatibility policy and
refactors the test suite so that the only differences between the 3.2
test_email.py file and the 3.3 test_email.py file is some small changes in
test framework and the addition of tests for bugs fixed that apply to the 3.2
API.
There are some additional teaks, such as moving just the code needed for the
compatibility policy into _policybase, so that the library code can import
only _policybase. That way the new code that will be added for email6
will only get imported when a non-compatibility policy is imported.
Code contributed by Matt Giuca. quote() now encodes the input
before quoting, unquote() decodes after unquoting. There are
new arguments to change the encoding and errors settings.
There are also new APIs to skip the encode/decode steps.
[un]quote_plus() are also affected.
It consists of code from urllib, urllib2, urlparse, and robotparser.
The old modules have all been removed. The new package has five
submodules: urllib.parse, urllib.request, urllib.response,
urllib.error, and urllib.robotparser. The urllib.request.urlopen()
function uses the url opener from urllib2.
Note that the unittests have not been renamed for the
beta, but they will be renamed in the future.
Joint work with Senthil Kumaran.
MIMEApplication() requires a bytes object for its _data, so fix the tests.
We no longer need utils._identity() or utils._bdecode(). The former isn't
used anywhere AFAICT (where's "make test's" lint? <wink>) and the latter is a
kludge that is eliminated by base64.b64encode().
Current status: 5F/5E
This should restore the email package in the py3k branch to exactly what's in
the sandbox.
This wipes out 1-2 fixes made post-copy, which I'll re-apply shortly.
Completely get rid of StringIO.py and cStringIO.c.
I had to fix a few tests and modules beyond what Christian did, and
invent a few conventions. E.g. in elementtree, I chose to
write/return Unicode strings whe no encoding is given, but bytes when
an explicit encoding is given. Also mimetools was made to always
assume binary files.