158 commits

Author SHA1 Message Date
Barry Warsaw
4111804548 test_body_encoding(): a new test for Charset.body_encode(), especially
one that tests the obscure bug reported in SF # 625509.
2002-10-21 05:43:58 +00:00
Barry Warsaw
34aa44538d test_body_encoding(): a new test 2002-10-21 05:31:08 +00:00
Barry Warsaw
3d57589f0f body_encode(): Fixed typo reported by Chris Lawrence, closing SF bug
#625509.  This isn't a huge problem because at the moment there are no
built-in charsets for which header_encoding is QP but body_encoding is
not.
2002-10-21 05:29:53 +00:00
Barry Warsaw
67f8f2fe2a append(): Fixing the test for convertibility after consultation with
Ben.  If s is a byte string, make sure it can be converted to unicode
with the input codec, and from unicode with the output codec, or raise
a UnicodeError exception early.  Skip this test (and the unicode->byte
string conversion) when the charset is our faux 8bit raw charset.
2002-10-14 16:52:41 +00:00
Barry Warsaw
a74771c0b9 Two new tests for splitting (or not splitting) 8-bit header data. 2002-10-14 15:26:17 +00:00
Barry Warsaw
1a6ea3398e Bump the __version__ 2002-10-14 15:24:18 +00:00
Barry Warsaw
5e3bcff651 __init__(): Fix an invariant, that the charset item in a chunk tuple
must be a Charset instance, not a string.  The bug here was that
self._charset wasn't being converted to a Charset instance so later
.append() calls which used the default charset would break.

_split(): If the charset of the chunk is '8bit', return the chunk
unchanged.  We can't safely split it, so this is the avenue of least
harm.
2002-10-14 15:13:17 +00:00
Barry Warsaw
6c2bc46355 _split_header(): If we have a header which is a byte string containing
8-bit data, we cannot split it safely, so return the original string
unchanged.

_is8bitstring(): Helper function which returns True when we have a
byte string that contains non-ascii characters (i.e. mysterious 8-bit
data).
2002-10-14 15:09:30 +00:00
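
The check that _is8bitstring() performs can be pictured with a short sketch. This is written for modern Python 3 and is not the 2002 implementation; the name is_8bit_bytes is hypothetical, and it assumes that "byte string" corresponds to the bytes type.

```python
def is_8bit_bytes(value):
    """Return True only for a bytes object that holds a non-ASCII byte."""
    # Text (str) objects are never treated as raw 8-bit data in this sketch.
    if not isinstance(value, bytes):
        return False
    # Iterating over bytes yields integers; anything above 0x7F is non-ASCII.
    return any(byte > 0x7F for byte in value)
```

A header value for which this returns True would be left unsplit, since the bytes cannot be interpreted safely without knowing their charset.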
Barry Warsaw
7cd724049f CHARSETS: Add faux '8bit' encoding for representing raw 8-bit data for
which we know nothing else.
2002-10-14 15:06:55 +00:00
Barry Warsaw
0c358258c9 _encode_chunks(), encode(): Don't modify self._chunks. As Ben says:
Also, it fixes a really egregious error in Header.encode() (really
    in Header._encode_chunks()) that could cause a header to grow and
    grow each time encode() was called if output_codec was different
    from input_codec.

Also, fix a typo.
2002-10-13 04:06:28 +00:00
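
The property this fix restores is that encode() has no side effects on the header. A minimal check against the email.header API in current Python 3 (not the 2002 code) might look like the following; the sample text and charset are arbitrary choices for illustration.

```python
from email.header import Header

# A header whose text needs RFC 2047 encoding in iso-8859-1.
header = Header("Gr\u00fc\u00dfe aus Berlin", charset="iso-8859-1")

# Encoding twice must give the same result; before the fix described
# above, repeated calls could make the encoded header grow.
first = header.encode()
second = header.encode()
```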
Barry Warsaw
ab9439fdd4 Update the urls and other information about the add-on Japanese,
Korean, and Chinese codecs.
2002-10-13 04:00:45 +00:00
Barry Warsaw
c986e54733 Bump version number to 2.4.2 to pick up the latest minor bug fixes. 2002-10-10 15:19:46 +00:00
Barry Warsaw
dc8087b26e New tests to verify that charsets are case insensitive, and that by
default get_body_encoding() cannot be SHORTEST.
2002-10-10 15:14:22 +00:00
Barry Warsaw
ee07cb1d70 get_content_charset(): RFC 2046 §4.1.2 says charsets are not case
sensitive.  Coerce the argument to lower case.
2002-10-10 15:13:26 +00:00
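
The lower-casing behaviour is still visible in the email package shipped with current Python 3. The sketch below uses a made-up message with an upper-case charset parameter.

```python
from email import message_from_string

# Hypothetical message whose charset parameter is written in upper case.
raw = 'Content-Type: text/plain; charset="ISO-8859-1"\n\nhello\n'
msg = message_from_string(raw)

# get_content_charset() coerces the value to lower case.
charset = msg.get_content_charset()
```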
Barry Warsaw
14fc464ec9 __init__(): RFC 2046 §4.1.2 says charsets are not case sensitive.
Coerce the argument to lower case.  Also, since body encodings can't
be SHORTEST, default the CHARSETS failobj's second item to BASE64.
2002-10-10 15:11:20 +00:00
Barry Warsaw
08c82b8086 openfile(): Go back to opening the files in text mode. This undoes
the change in revision 1.11 (test_email.py) in response to SF bug
#609988.  We now think that was the wrong fix and that WinZip was the
real culprit there.
2002-10-07 17:27:55 +00:00
Barry Warsaw
487fe6ac39 _parsebody(): Use get_content_type() instead of the deprecated
get_type().  Also, one of the regular expressions is constant so might
as well make it a module global.  And, when splitting up digests,
handle lineseps that are longer than 1 character in length
(e.g. \r\n).
2002-10-07 17:27:35 +00:00
Barry Warsaw
1d475d3452 Bump the version to 2.4.1 (not 2.5 as previously mentioned) to sync it
with the standalone mimelib package.
2002-10-07 17:20:25 +00:00
Barry Warsaw
0ac885e821 test__all__(): Fix the import list. 2002-10-01 17:57:06 +00:00
Barry Warsaw
2d7fab1a45 Docstring consistency with the updated .tex files. 2002-10-01 00:52:27 +00:00
Barry Warsaw
1f84ff1d40 _structure(): Swap fp and level arguments. 2002-10-01 00:51:47 +00:00
Barry Warsaw
0ebc5c96c5 Docstring consistency with the updated .tex files. 2002-10-01 00:44:13 +00:00
Barry Warsaw
12272a2f22 Docstring consistency with the updated .tex files. 2002-10-01 00:05:24 +00:00
Barry Warsaw
48330687f3 Docstring consistency with the updated .tex files. 2002-09-30 23:07:35 +00:00
Barry Warsaw
0031982c21 Docstring consistency with the updated .tex files. 2002-09-30 22:15:00 +00:00
Barry Warsaw
03a7559654 Docstring consistency with the updated .tex files. 2002-09-30 21:29:10 +00:00
Barry Warsaw
fd2e8f7ea6 Docstring consistency with the updated .tex files. 2002-09-30 21:24:00 +00:00
Barry Warsaw
419b284b7c __all__: Updated 2002-09-30 20:41:33 +00:00
Barry Warsaw
057b8428d0 Docstring consistency with the updated .tex files. 2002-09-30 20:07:22 +00:00
Barry Warsaw
42d1d3edc0 __contains__(): Change the second argument to `name' for consistency.
I seriously doubt this will break any deployed code.

Docstring consistency with the updated .tex files.
2002-09-30 18:17:35 +00:00
Barry Warsaw
174aa49a88 With help from Martin v. Loewis, clarification is added for the
semantics of header chunks using byte and Unicode strings.
Specifically,

append(): When the given string is a byte string, charset (whether
specified explicitly in the argument list or implicitly via the
constructor default) is the encoding of the byte string, and a
UnicodeError will be raised if the string cannot be decoded with that
charset.  If s is a Unicode string, then charset is a hint specifying
the character set of the characters in the string.  In this case, when
producing an RFC 2822 compliant header using RFC 2047 rules, the
Unicode string will be encoded using the following charsets in order:
us-ascii, the charset hint, utf-8.

__init__(): Use the global USASCII Charset instance when the charset
argument is None.  Also, clarification in the docstring.

Also, use True/False where appropriate.
2002-09-30 15:51:31 +00:00
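
The fallback order described for Unicode strings (us-ascii, then the charset hint, then utf-8) can be sketched as a small helper. This is an illustration in Python 3, not the Header.append() implementation; the function name pick_output_charset is hypothetical, and it assumes the hint names a codec that Python knows.

```python
def pick_output_charset(text, hint):
    """Return the first of us-ascii, hint, utf-8 that can encode text."""
    for candidate in ("us-ascii", hint, "utf-8"):
        try:
            text.encode(candidate)
        except UnicodeError:
            # This charset cannot represent the text; try the next one.
            continue
        return candidate
    raise UnicodeError("text cannot be encoded with any candidate charset")
```

Plain ASCII text never needs the hint, text covered by the hint uses it, and everything else falls through to utf-8.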
Barry Warsaw
d20b66537c The ansi_x3.4_1968 encoding is an alias for ascii, but isn't known in
Python 2.1.3.  However it's required by the email test suite, so poke
it into the encodings aliases if it's missing.  This is apparently the
approved API for doing so.

Now we can remove the hexversion shortcircuits in the test suite.
2002-09-30 15:23:17 +00:00
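
Poking an alias into the encodings table, as this commit describes, might look roughly like the sketch below. Current Python versions already ship this alias, so setdefault() is used to avoid overwriting it; the commit does not show the exact code it used, so treat this as one plausible form.

```python
import codecs
import encodings.aliases

# Register the alias only if the running interpreter lacks it.
encodings.aliases.aliases.setdefault("ansi_x3.4_1968", "ascii")

# The alias should now resolve to the ascii codec.
codec_name = codecs.lookup("ansi_x3.4_1968").name
```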
Barry Warsaw
d63071b05f Make the tests pass under Python 2.1 but only by cheating. Python 2.1
doesn't know about the ansi-x3.4-1968 charset so skip two tests that
rely on that (msg_32.txt and msg_33.txt).
2002-09-28 21:22:52 +00:00
Barry Warsaw
eecdc742f5 Add a test for SHORTEST encoding of utf-8 headers, and also update
some of the test values which change because of this.
2002-09-28 21:04:19 +00:00
Barry Warsaw
c202d93e0e Use True/False everywhere, and other code cleanups. 2002-09-28 21:02:51 +00:00
Barry Warsaw
f776e6922c Code cleanup and add docstrings. 2002-09-28 20:52:26 +00:00
Barry Warsaw
5bdb2bee37 Use True/False everywhere, and other code cleanups. 2002-09-28 20:49:57 +00:00
Barry Warsaw
e03e8f09eb Use True/False everywhere. 2002-09-28 20:44:58 +00:00
Barry Warsaw
4ece778bbc is_multipart(): Use isinstance() instead of type equality. 2002-09-28 20:41:39 +00:00
Barry Warsaw
c494549566 Docstring and code cleanups, e.g. use True/False everywhere. 2002-09-28 20:40:25 +00:00
Barry Warsaw
bba6b0243e __init__(): Minor code cleanup. 2002-09-28 20:27:28 +00:00
Barry Warsaw
5f253279d6 Add a pychecker suppression. 2002-09-28 20:25:15 +00:00
Barry Warsaw
56835dd961 Use True/False everywhere. 2002-09-28 18:04:55 +00:00
Barry Warsaw
5932c9bedd Added a feature suggested by Martin v Loewis, where a new header
encoding flag SHORTEST means to return the shortest encoding between
base64 and qp.  This is used for the header_enc for utf-8.  SHORTEST
isn't legal for body_enc.

Also some code cleanup:

- use True/False everywhere
- use == instead of `is' in a few places
- added _unicode() and make consistent the "is unicode" checks
- update docstrings
2002-09-28 17:47:56 +00:00
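
The effect of SHORTEST can be observed through email.charset in current Python 3, where utf-8 still uses it as its header encoding. The sample strings below are arbitrary: one is pure ASCII, where quoted-printable is shorter, and one is CJK text, where base64 is shorter.

```python
from email.charset import Charset, SHORTEST

utf8 = Charset("utf-8")
uses_shortest = utf8.header_encoding == SHORTEST

# ASCII text costs one character per byte in quoted-printable,
# so the "q" form is chosen.
ascii_header = utf8.header_encode("hello world")

# Each CJK character is three bytes, which quoted-printable expands to
# nine characters, so the base64 "b" form is chosen instead.
cjk_header = utf8.header_encode("\u65e5\u672c\u8a9e\u65e5\u672c\u8a9e")
```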
Barry Warsaw
09f7424f3a test_unicode_error(): Comment this test out, since we still have
controversy.
2002-09-26 17:21:53 +00:00
Barry Warsaw
9c74569ec9 Fixing some RFC 2231 related issues as reported in the Spambayes
project, and with assistance from Oleg Broytmann.  Specifically,
added some new tests to make sure we handle RFC 2231 encoded
parameters correctly.  Two new data files were added which contain RFC
2231 encoded parameters.
2002-09-26 17:21:02 +00:00
Barry Warsaw
15aefa94d0 Fixing some RFC 2231 related issues as reported in the Spambayes
project, and with assistance from Oleg Broytmann.  Specifically,

get_param(), get_params(): Document that these methods may return
parameter values that are either strings, or 3-tuples in the case of
RFC 2231 encoded parameters.  The application should be prepared to
deal with such return values.

get_boundary(): Be prepared to deal with RFC 2231 encoded boundary
parameters.  It makes little sense to have boundaries that are
anything but ascii, so if we get back a 3-tuple from get_param() we
will decode it into ascii and let any failures percolate up.

get_content_charset(): New method which treats the charset parameter
just like the boundary parameter in get_boundary().  Note that
"get_charset()" was already taken to return the default Charset
object.

get_charsets(): Rewrite to use get_content_charset().
2002-09-26 17:19:34 +00:00
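
An application that calls get_param() therefore has to handle both return shapes. The sketch below uses the email package in current Python 3 with a made-up Content-Disposition header; collapse_rfc2231_value() is a later addition to email.utils and is not part of this commit.

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value

# Hypothetical message with an RFC 2231 encoded filename parameter.
raw = (
    "Content-Disposition: attachment; "
    "filename*=us-ascii'en-us'My%20Document.txt\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# For RFC 2231 encoded parameters the value is a
# (charset, language, value) 3-tuple rather than a plain string.
value = msg.get_param("filename", header="content-disposition")

# Be prepared for either shape, as the commit message advises.
if isinstance(value, tuple):
    filename = collapse_rfc2231_value(value)
else:
    filename = value
```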
Barry Warsaw
6f30a8ab62 __version__: Bump to 2.4
Move the imports of Parser and Message inside the
message_from_string() and message_from_file() functions.  This way
just "import email" won't suck in most of the submodules of the
package.

Note: this will break code that relied on "import email" giving you a
bunch of the submodules, but that was never documented and should not
have been relied on.
2002-09-25 22:07:50 +00:00
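
Moving an import inside a function defers loading the submodule until the function is first called. A minimal sketch of the pattern, using the modern email.parser module path and a hypothetical function name, could be:

```python
def parse_message_lazily(text):
    """Parse a message string, importing the parser only when needed."""
    # The import runs on the first call, so importing the enclosing
    # module does not pull in the parser machinery up front.
    from email.parser import Parser
    return Parser().parsestr(text)
```

Code that needs a submodule should import it explicitly rather than rely on a side effect of "import email".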
Barry Warsaw
40363b63f0 Open the test files in binary mode so the \r\n files won't cause
failures on Windows.  Closes SF bug # 609988.
2002-09-18 22:17:57 +00:00
Barry Warsaw
78170048f9 Bump to 2.3.1 to pick up the missing file. 2002-09-12 03:44:50 +00:00