mirror of
				https://github.com/python/cpython.git
				synced 2025-10-25 15:58:57 +00:00 
			
		
		
		
	 49fd7fa443
			
		
	
	
		49fd7fa443
		
	
	
	
	
		
			
			number of tests, all because of the codecs/_multibytecodecs issue described here (it's not a Py3K issue, just something Py3K discovers): http://mail.python.org/pipermail/python-dev/2006-April/064051.html Hye-Shik Chang promised to look for a fix, so no need to fix it here. The tests that are expected to break are: test_codecencodings_cn test_codecencodings_hk test_codecencodings_jp test_codecencodings_kr test_codecencodings_tw test_codecs test_multibytecodec This merge fixes an actual test failure (test_weakref) in this branch, though, so I believe merging is the right thing to do anyway.
		
			
				
	
	
		
			178 lines
		
	
	
	
		
			7.4 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
			
		
		
	
	
			178 lines
		
	
	
	
		
			7.4 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
| \declaremodule{standard}{email.header}
 | |
| \modulesynopsis{Representing non-ASCII headers}
 | |
| 
 | |
| \rfc{2822} is the base standard that describes the format of email
 | |
| messages.  It derives from the older \rfc{822} standard which came
 | |
| into widespread use at a time when most email was composed of \ASCII{}
 | |
| characters only.  \rfc{2822} is a specification written assuming email
 | |
| contains only 7-bit \ASCII{} characters.
 | |
| 
 | |
| Of course, as email has been deployed worldwide, it has become
 | |
| internationalized, such that language specific character sets can now
 | |
| be used in email messages.  The base standard still requires email
 | |
| messages to be transferred using only 7-bit \ASCII{} characters, so a
 | |
| slew of RFCs have been written describing how to encode email
 | |
| containing non-\ASCII{} characters into \rfc{2822}-compliant format.
 | |
| These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
 | |
| The \module{email} package supports these standards in its
 | |
| \module{email.header} and \module{email.charset} modules.
 | |
| 
 | |
| If you want to include non-\ASCII{} characters in your email headers,
 | |
| say in the \mailheader{Subject} or \mailheader{To} fields, you should
 | |
| use the \class{Header} class and assign the field in the
 | |
| \class{Message} object to an instance of \class{Header} instead of
 | |
| using a string for the header value.  Import the \class{Header} class from the
 | |
| \module{email.header} module.  For example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| >>> from email.message import Message
 | |
| >>> from email.header import Header
 | |
| >>> msg = Message()
 | |
| >>> h = Header('p\xf6stal', 'iso-8859-1')
 | |
| >>> msg['Subject'] = h
 | |
| >>> print msg.as_string()
 | |
| Subject: =?iso-8859-1?q?p=F6stal?=
 | |
| 
 | |
| 
 | |
| \end{verbatim}
 | |
| 
 | |
| Notice here how we wanted the \mailheader{Subject} field to contain a
 | |
| non-\ASCII{} character?  We did this by creating a \class{Header}
 | |
| instance and passing in the character set that the byte string was
 | |
| encoded in.  When the subsequent \class{Message} instance was
 | |
| flattened, the \mailheader{Subject} field was properly \rfc{2047}
 | |
| encoded.  MIME-aware mail readers would show this header using the
 | |
| embedded ISO-8859-1 character.
 | |
| 
 | |
| \versionadded{2.2.2}
 | |
| 
 | |
| Here is the \class{Header} class description:
 | |
| 
 | |
| \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
 | |
|     maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
 | |
|     errors}}}}}}}
 | |
| Create a MIME-compliant header that can contain strings in different
 | |
| character sets.
 | |
| 
 | |
| Optional \var{s} is the initial header value.  If \code{None} (the
 | |
| default), the initial header value is not set.  You can later append
 | |
| to the header with \method{append()} method calls.  \var{s} may be a
 | |
| byte string or a Unicode string, but see the \method{append()}
 | |
| documentation for semantics.
 | |
| 
 | |
| Optional \var{charset} serves two purposes: it has the same meaning as
 | |
| the \var{charset} argument to the \method{append()} method.  It also
 | |
| sets the default character set for all subsequent \method{append()}
 | |
| calls that omit the \var{charset} argument.  If \var{charset} is not
 | |
| provided in the constructor (the default), the \code{us-ascii}
 | |
| character set is used both as \var{s}'s initial charset and as the
 | |
| default for subsequent \method{append()} calls.
 | |
| 
 | |
| The maximum line length can be specified explicit via
 | |
| \var{maxlinelen}.  For splitting the first line to a shorter value (to
 | |
| account for the field header which isn't included in \var{s},
 | |
| e.g. \mailheader{Subject}) pass in the name of the field in
 | |
| \var{header_name}.  The default \var{maxlinelen} is 76, and the
 | |
| default value for \var{header_name} is \code{None}, meaning it is not
 | |
| taken into account for the first line of a long, split header.
 | |
| 
 | |
| Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
 | |
| whitespace, and is usually either a space or a hard tab character.
 | |
| This character will be prepended to continuation lines.
 | |
| \end{classdesc}
 | |
| 
 | |
| Optional \var{errors} is passed straight through to the
 | |
| \method{append()} method.
 | |
| 
 | |
| \begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
 | |
| Append the string \var{s} to the MIME header.
 | |
| 
 | |
| Optional \var{charset}, if given, should be a \class{Charset} instance
 | |
| (see \refmodule{email.charset}) or the name of a character set, which
 | |
| will be converted to a \class{Charset} instance.  A value of
 | |
| \code{None} (the default) means that the \var{charset} given in the
 | |
| constructor is used.
 | |
| 
 | |
| \var{s} may be a byte string or a Unicode string.  If it is a byte
 | |
| string (i.e. \code{isinstance(s, str)} is true), then
 | |
| \var{charset} is the encoding of that byte string, and a
 | |
| \exception{UnicodeError} will be raised if the string cannot be
 | |
| decoded with that character set.
 | |
| 
 | |
| If \var{s} is a Unicode string, then \var{charset} is a hint
 | |
| specifying the character set of the characters in the string.  In this
 | |
| case, when producing an \rfc{2822}-compliant header using \rfc{2047}
 | |
| rules, the Unicode string will be encoded using the following charsets
 | |
| in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}.  The
 | |
| first character set to not provoke a \exception{UnicodeError} is used.
 | |
| 
 | |
| Optional \var{errors} is passed through to any \function{unicode()} or
 | |
| \function{ustr.encode()} call, and defaults to ``strict''.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Header]{encode}{\optional{splitchars}}
 | |
| Encode a message header into an RFC-compliant format, possibly
 | |
| wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
 | |
| quoted-printable encodings.  Optional \var{splitchars} is a string
 | |
| containing characters to split long ASCII lines on, in rough support
 | |
| of \rfc{2822}'s \emph{highest level syntactic breaks}.  This doesn't
 | |
| affect \rfc{2047} encoded lines.
 | |
| \end{methoddesc}
 | |
| 
 | |
| The \class{Header} class also provides a number of methods to support
 | |
| standard operators and built-in functions.
 | |
| 
 | |
| \begin{methoddesc}[Header]{__str__}{}
 | |
| A synonym for \method{Header.encode()}.  Useful for
 | |
| \code{str(aHeader)}.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Header]{__unicode__}{}
 | |
| A helper for the built-in \function{unicode()} function.  Returns the
 | |
| header as a Unicode string.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Header]{__eq__}{other}
 | |
| This method allows you to compare two \class{Header} instances for equality.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Header]{__ne__}{other}
 | |
| This method allows you to compare two \class{Header} instances for inequality.
 | |
| \end{methoddesc}
 | |
| 
 | |
| The \module{email.header} module also provides the following
 | |
| convenient functions.
 | |
| 
 | |
| \begin{funcdesc}{decode_header}{header}
 | |
| Decode a message header value without converting the character set.
 | |
| The header value is in \var{header}.
 | |
| 
 | |
| This function returns a list of \code{(decoded_string, charset)} pairs
 | |
| containing each of the decoded parts of the header.  \var{charset} is
 | |
| \code{None} for non-encoded parts of the header, otherwise a lower
 | |
| case string containing the name of the character set specified in the
 | |
| encoded string.
 | |
| 
 | |
| Here's an example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| >>> from email.header import decode_header
 | |
| >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
 | |
| [('p\xf6stal', 'iso-8859-1')]
 | |
| \end{verbatim}
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
 | |
|     header_name\optional{, continuation_ws}}}}
 | |
| Create a \class{Header} instance from a sequence of pairs as returned
 | |
| by \function{decode_header()}.
 | |
| 
 | |
| \function{decode_header()} takes a header value string and returns a
 | |
| sequence of pairs of the format \code{(decoded_string, charset)} where
 | |
| \var{charset} is the name of the character set.
 | |
| 
 | |
| This function takes one of those sequence of pairs and returns a
 | |
| \class{Header} instance.  Optional \var{maxlinelen},
 | |
| \var{header_name}, and \var{continuation_ws} are as in the
 | |
| \class{Header} constructor.
 | |
| \end{funcdesc}
 |