#10686: recode non-ASCII headers to 'unknown-8bit' instead of ?s.

This applies only when generating strings from non-RFC-compliant binary
input; it makes the existing recoding behavior more consistent (i.e.,
no data is now lost when recoding).
R. David Murray 2011-01-07 23:25:30 +00:00
parent 6f0022d84a
commit 9253214fd9
9 changed files with 109 additions and 62 deletions

@@ -79,8 +79,8 @@ Here are the public methods of the :class:`Generator` class, imported from the
 Messages parsed with a Bytes parser that have a
 :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to a
-use a 7bit Content-Transfer-Encoding. Any other non-ASCII bytes in the
-message structure will be converted to '?' characters.
+use a 7bit Content-Transfer-Encoding. Non-ASCII bytes in the headers
+will be :rfc:`2047` encoded with a charset of `unknown-8bit`.
 .. versionchanged:: 3.2
 Added support for re-encoding 8bit message bodies, and the *linesep*
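
The new Generator wording in the hunk above can be illustrated with a minimal sketch. It assumes a current Python 3 and the default ``compat32`` policy; the sample message and the names ``raw``, ``buf`` and ``text`` are hypothetical and are not part of the commit.

```python
import email
from io import StringIO
from email.generator import Generator

# Hypothetical input: a Subject header containing a raw 8-bit byte,
# which the RFCs do not allow in an unencoded header.
raw = b"Subject: caf\xe9\n\nbody\n"
msg = email.message_from_bytes(raw)

# Flatten the parsed message to a string with the (str) Generator.
buf = StringIO()
Generator(buf).flatten(msg)
text = buf.getvalue()

# The header should now be an RFC 2047 encoded word that names the
# unknown-8bit charset, rather than having the byte replaced by '?'.
print(text)
```

The original byte stays recoverable from the encoded word, which is the "no data is lost" point made in the commit message.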

@@ -130,8 +130,14 @@ Here is the :class:`Header` class description:
 .. method:: __str__()
-A helper for :class:`str`'s :func:`encode` method. Returns the header as
-a Unicode string.
+Returns an approximation of the :class:`Header` as a string, using an
+unlimited line length. All pieces are converted to unicode using the
+specified encoding and joined together appropriately. Any pieces with a
+charset of `unknown-8bit` are decoded as `ASCII` using the `replace`
+error handler.
+.. versionchanged:: 3.2
+Added handling for the `unknown-8bit` charset.
 .. method:: __eq__(other)
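
As a rough sketch of the ``__str__`` behavior documented in the hunk above: the example assumes a current Python 3, and the sample bytes and the names ``raw_value``, ``escaped`` and ``approx`` are made up for illustration. The value is passed in as a surrogate-escaped string, the form in which undecodable header bytes are carried after parsing.

```python
from email.header import Header

# Hypothetical header value containing one byte in an unknown 8-bit encoding.
raw_value = b"caf\xe9"

# Carry the undecodable byte as a lone surrogate, as a bytes parser would.
escaped = raw_value.decode("ascii", "surrogateescape")
h = Header(escaped, charset="unknown-8bit")

# str() decodes unknown-8bit pieces as ASCII with the 'replace' handler,
# so the non-ASCII byte comes out as U+FFFD (the replacement character).
approx = str(h)
print(repr(approx))
```

The result is only an approximation: the original byte cannot be recovered from ``approx``, although it is still held inside the ``Header`` object.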

@@ -169,9 +169,10 @@ Here are the methods of the :class:`Message` class:
 Note that in all cases, any envelope header present in the message is not
 included in the mapping interface.
-In a model generated from bytes, any header values that (in contravention
-of the RFCs) contain non-ASCII bytes will have those bytes transformed
-into '?' characters when the values are retrieved through this interface.
+In a model generated from bytes, any header values that (in contravention of
+the RFCs) contain non-ASCII bytes will, when retrieved through this
+interface, be represented as :class:`~email.header.Header` objects with
+a charset of `unknown-8bit`.
 .. method:: __len__()
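
The mapping-interface change in the hunk above can be checked with a short sketch. It assumes a current Python 3 with the default ``compat32`` policy (other policies may return a different type); the sample message and the names ``value`` and ``is_header`` are illustrative only.

```python
import email
from email.header import Header

# Hypothetical message whose Subject holds a raw, non-RFC-compliant 8-bit byte.
raw = b"Subject: caf\xe9\n\nbody\n"
msg = email.message_from_bytes(raw)

# Retrieving the value through the mapping interface is expected to give a
# Header object (charset unknown-8bit) instead of a str with '?' characters.
value = msg["Subject"]
is_header = isinstance(value, Header)

print(type(value).__name__, repr(str(value)))
```

Code that previously assumed ``msg[name]`` always returns ``str`` may want to call ``str()`` on the result, or check for ``Header``, before using string methods on it.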