Proofread and spell checked, all except the Examples section (which
I'll do next).
2025-08-04 00:48:58 +00:00 · 2002-10-01 04:33:16 +00:00 · 5db478fa29
commit 5db478fa29
parent cc3a6df506
9 changed files with 350 additions and 357 deletions
--- a/Doc/lib/emailheaders.tex
+++ b/Doc/lib/emailheaders.tex
@@ -3,7 +3,7 @@

 \rfc{2822} is the base standard that describes the format of email
 messages.  It derives from the older \rfc{822} standard which came
-into widespread at a time when most email was composed of \ASCII{}
+into widespread use at a time when most email was composed of \ASCII{}
 characters only.  \rfc{2822} is a specification written assuming email
 contains only 7-bit \ASCII{} characters.

@@ -19,10 +19,9 @@ The \module{email} package supports these standards in its

 If you want to include non-\ASCII{} characters in your email headers,
 say in the \mailheader{Subject} or \mailheader{To} fields, you should
-use the \class{Header} class (in module \module{email.Header} and
-assign the field in the \class{Message} object to an instance of
-\class{Header} instead of using a string for the header value.  For
-example:
+use the \class{Header} class and assign the field in the
+\class{Message} object to an instance of \class{Header} instead of
+using a string for the header value.  For example:

 \begin{verbatim}
 >>> from email.Message import Message
@@ -50,7 +49,8 @@ Here is the \class{Header} class description:

 \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
    maxlinelen\optional{, header_name\optional{, continuation_ws}}}}}}
-Create a MIME-compliant header that can contain many character sets.
+Create a MIME-compliant header that can contain strings in different
+character sets.

 Optional \var{s} is the initial header value.  If \code{None} (the
 default), the initial header value is not set.  You can later append
@@ -74,7 +74,7 @@ e.g. \mailheader{Subject}) pass in the name of the field in
 default value for \var{header_name} is \code{None}, meaning it is not
 taken into account for the first line of a long, split header.

-Optional \var{continuation_ws} must be RFC 2822 compliant folding
+Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
 whitespace, and is usually either a space or a hard tab character.
 This character will be prepended to continuation lines.
 \end{classdesc}
@@ -89,7 +89,7 @@ will be converted to a \class{Charset} instance.  A value of
 constructor is used.

 \var{s} may be a byte string or a Unicode string.  If it is a byte
-string (i.e. \code{isinstance(s, StringType)} is true), then
+string (i.e. \code{isinstance(s, str)} is true), then
 \var{charset} is the encoding of that byte string, and a
 \exception{UnicodeError} will be raised if the string cannot be
 decoded with that character set.
@@ -113,7 +113,7 @@ standard operators and built-in functions.

 \begin{methoddesc}[Header]{__str__}{}
 A synonym for \method{Header.encode()}.  Useful for
-\code{str(aHeader)} calls.
+\code{str(aHeader)}.
 \end{methoddesc}

 \begin{methoddesc}[Header]{__unicode__}{}
@@ -165,245 +165,3 @@ This function takes one of those sequence of pairs and returns a
 \var{header_name}, and \var{continuation_ws} are as in the
 \class{Header} constructor.
 \end{funcdesc}
-
-\declaremodule{standard}{email.Charset}
-\modulesynopsis{Character Sets}
-
-This module provides a class \class{Charset} for representing
-character sets and character set conversions in email messages, as
-well as a character set registry and several convenience methods for
-manipulating this registry.  Instances of \class{Charset} are used in
-several other modules within the \module{email} package.
-
-\versionadded{2.2.2}
-
-\begin{classdesc}{Charset}{\optional{input_charset}}
-Map character sets to their email properties.
-
-This class provides information about the requirements imposed on
-email for a specific character set.  It also provides convenience
-routines for converting between character sets, given the availability
-of the applicable codecs.  Given a character set, it will do its best
-to provide information on how to use that character set in an email
-message in an RFC-compliant way.
-
-Certain character sets must be encoded with quoted-printable or base64
-when used in email headers or bodies.  Certain character sets must be
-converted outright, and are not allowed in email.
-
-Optional \var{input_charset} is as described below.  After being alias
-normalized it is also used as a lookup into the registry of character
-sets to find out the header encoding, body encoding, and output
-conversion codec to be used for the character set.  For example, if
-\var{input_charset} is \code{iso-8859-1}, then headers and bodies will
-be encoded using quoted-printable and no output conversion codec is
-necessary.  If \var{input_charset} is \code{euc-jp}, then headers will
-be encoded with base64, bodies will not be encoded, but output text
-will be converted from the \code{euc-jp} character set to the
-\code{iso-2022-jp} character set.
-\end{classdesc}
-
-\class{Charset} instances have the following data attributes:
-
-\begin{datadesc}{input_charset}
-The initial character set specified.  Common aliases are converted to
-their \emph{official} email names (e.g. \code{latin_1} is converted to
-\code{iso-8859-1}).  Defaults to 7-bit \code{us-ascii}.
-\end{datadesc}
-
-\begin{datadesc}{header_encoding}
-If the character set must be encoded before it can be used in an
-email header, this attribute will be set to \code{Charset.QP} (for
-quoted-printable), \code{Charset.BASE64} (for base64 encoding), or
-\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding.
-Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{body_encoding}
-Same as \var{header_encoding}, but describes the encoding for the
-mail message's body, which indeed may be different than the header
-encoding.  \code{Charset.SHORTEST} is not allowed for
-\var{body_encoding}.
-\end{datadesc}
-
-\begin{datadesc}{output_charset}
-Some character sets must be converted before the can be used in
-email headers or bodies.  If the \var{input_charset} is one of
-them, this attribute will contain the name of the character set
-output will be converted to.  Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{input_codec}
-The name of the Python codec used to convert the \var{input_charset} to
-Unicode.  If no conversion codec is necessary, this attribute will be
-\code{None}.
-\end{datadesc}
-
-\begin{datadesc}{output_codec}
-The name of the Python codec used to convert Unicode to the
-\var{output_charset}.  If no conversion codec is necessary, this
-attribute will have the same value as the \var{input_codec}.
-\end{datadesc}
-
-\class{Charset} instances also have the following methods:
-
-\begin{methoddesc}[Charset]{get_body_encoding}{}
-Return the content transfer encoding used for body encoding.
-
-This is either the string \samp{quoted-printable} or \samp{base64}
-depending on the encoding used, or it is a function, in which case you
-should call the function with a single argument, the Message object
-being encoded.  The function should then set the
-\mailheader{Content-Transfer-Encoding} header itself to whatever is
-appropriate.
-
-Returns the string \samp{quoted-printable} if
-\var{body_encoding} is \code{QP}, returns the string
-\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the
-string \samp{7bit} otherwise.
-\end{methoddesc}
-
-\begin{methoddesc}{convert}{s}
-Convert the string \var{s} from the \var{input_codec} to the
-\var{output_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{to_splittable}{s}
-Convert a possibly multibyte string to a safely splittable format.
-\var{s} is the string to split.
-
-Uses the \var{input_codec} to try and convert the string to Unicode,
-so it can be safely split on character boundaries (even for multibyte
-characters).
-
-Returns the string as-is if it isn't known how to convert \var{s} to
-Unicode with the \var{input_charset}.
-
-Characters that could not be converted to Unicode will be replaced
-with the Unicode replacement character \character{U+FFFD}.
-\end{methoddesc}
-
-\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}}
-Convert a splittable string back into an encoded string.  \var{ustr}
-is a Unicode string to ``unsplit''.
-
-This method uses the proper codec to try and convert the string from
-Unicode back into an encoded format.  Return the string as-is if it is
-not Unicode, or if it could not be converted from Unicode.
-
-Characters that could not be converted from Unicode will be replaced
-with an appropriate character (usually \character{?}).
-
-If \var{to_output} is \code{True} (the default), uses
-\var{output_codec} to convert to an 
-encoded format.  If \var{to_output} is \code{False}, it uses
-\var{input_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{get_output_charset}{}
-Return the output character set.
-
-This is the \var{output_charset} attribute if that is not \code{None},
-otherwise it is \var{input_charset}.
-\end{methoddesc}
-
-\begin{methoddesc}{encoded_header_len}{}
-Return the length of the encoded header string, properly calculating
-for quoted-printable or base64 encoding.
-\end{methoddesc}
-
-\begin{methoddesc}{header_encode}{s\optional{, convert}}
-Header-encode the string \var{s}.
-
-If \var{convert} is \code{True}, the string will be converted from the
-input charset to the output charset automatically.  This is not useful
-for multibyte character sets, which have line length issues (multibyte
-characters must be split on a character, not a byte boundary); use the
-higher-level \class{Header} class to deal with these issues (see
-\refmodule{email.Header}).  \var{convert} defaults to \code{False}.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{header_encoding} attribute.
-\end{methoddesc}
-
-\begin{methoddesc}{body_encode}{s\optional{, convert}}
-Body-encode the string \var{s}.
-
-If \var{convert} is \code{True} (the default), the string will be
-converted from the input charset to output charset automatically.
-Unlike \method{header_encode()}, there are no issues with byte
-boundaries and multibyte charsets in email bodies, so this is usually
-pretty safe.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{body_encoding} attribute.
-\end{methoddesc}
-
-The \class{Charset} class also provides a number of methods to support
-standard operations and built-in functions.
-
-\begin{methoddesc}[Charset]{__str__}{}
-Returns \var{input_charset} as a string coerced to lower case.
-\end{methoddesc}
-
-\begin{methoddesc}[Charset]{__eq__}{other}
-This method allows you to compare two \class{Charset} instances for equality.
-\end{methoddesc}
-
-\begin{methoddesc}[Header]{__ne__}{other}
-This method allows you to compare two \class{Charset} instances for inequality.
-\end{methoddesc}
-
-The \module{email.Charset} module also provides the following
-functions for adding new entries to the global character set, alias,
-and codec registries:
-
-\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{,
-    body_enc\optional{, output_charset}}}}
-Add character properties to the global registry.
-
-\var{charset} is the input character set, and must be the canonical
-name of a character set.
-
-Optional \var{header_enc} and \var{body_enc} is either
-\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for
-base64 encoding, \code{Charset.SHORTEST} for the shortest of qp or
-base64 encoding, or \code{None} for no encoding.  \code{SHORTEST} is
-only valid for \var{header_enc}.  It describes how message headers and
-message bodies in the input charset are to be encoded.  Default is no
-encoding.
-
-Optional \var{output_charset} is the character set that the output
-should be in.  Conversions will proceed from input charset, to
-Unicode, to the output charset when the method
-\method{Charset.convert()} is called.  The default is to output in the
-same character set as the input.
-
-Both \var{input_charset} and \var{output_charset} must have Unicode
-codec entries in the module's character set-to-codec mapping; use
-\function{add_codec(charset, codecname)} to add codecs the module does
-not know about.  See the \refmodule{codecs} module's documentation for
-more information.
-
-The global character set registry is kept in the module global
-dictionary \code{CHARSETS}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_alias}{alias, canonical}
-Add a character set alias.  \var{alias} is the alias name,
-e.g. \code{latin-1}.  \var{canonical} is the character set's canonical
-name, e.g. \code{iso-8859-1}.
-
-The global charset alias registry is kept in the module global
-dictionary \code{ALIASES}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_codec}{charset, codecname}
-Add a codec that map characters in the given character set to and from
-Unicode.
-
-\var{charset} is the canonical name of a character set.
-\var{codecname} is the name of a Python codec, as appropriate for the
-second argument to the \function{unicode()} built-in, or to the
-\method{encode()} method of a Unicode string.
-\end{funcdesc}