mirror of
				https://github.com/python/cpython.git
				synced 2025-11-03 19:34:08 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			354 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			354 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
 | 
						|
:mod:`rfc822` --- Parse RFC 2822 mail headers
 | 
						|
=============================================
 | 
						|
 | 
						|
.. module:: rfc822
 | 
						|
   :synopsis: Parse 2822 style mail messages.
 | 
						|
   :deprecated:
 | 
						|
 | 
						|
 | 
						|
.. deprecated:: 2.3
 | 
						|
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
 | 
						|
   module.  This module is present only to maintain backward compatibility.
 | 
						|
 | 
						|
This module defines a class, :class:`Message`, which represents an "email
 | 
						|
message" as defined by the Internet standard :rfc:`2822`. [#]_  Such messages
 | 
						|
consist of a collection of message headers, and a message body.  This module
 | 
						|
also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
 | 
						|
addresses.  Please refer to the RFC for information on the specific syntax of
 | 
						|
:rfc:`2822` messages.
 | 
						|
 | 
						|
.. index:: module: mailbox
 | 
						|
 | 
						|
The :mod:`mailbox` module provides classes  to read mailboxes produced by
 | 
						|
various end-user mail programs.
 | 
						|
 | 
						|
 | 
						|
.. class:: Message(file[, seekable])
 | 
						|
 | 
						|
   A :class:`Message` instance is instantiated with an input object as parameter.
 | 
						|
   Message relies only on the input object having a :meth:`readline` method; in
 | 
						|
   particular, ordinary file objects qualify.  Instantiation reads headers from the
 | 
						|
   input object up to a delimiter line (normally a blank line) and stores them in
 | 
						|
   the instance.  The message body, following the headers, is not consumed.
 | 
						|
 | 
						|
   This class can work with any input object that supports a :meth:`readline`
 | 
						|
   method.  If the input object has seek and tell capability, the
 | 
						|
   :meth:`rewindbody` method will work; also, illegal lines will be pushed back
 | 
						|
   onto the input stream.  If the input object lacks seek but has an :meth:`unread`
 | 
						|
   method that can push back a line of input, :class:`Message` will use that to
 | 
						|
   push back illegal lines.  Thus this class can be used to parse messages coming
 | 
						|
   from a buffered stream.
 | 
						|
 | 
						|
   The optional *seekable* argument is provided as a workaround for certain stdio
 | 
						|
   libraries in which :cfunc:`tell` discards buffered data before discovering that
 | 
						|
   the :cfunc:`lseek` system call doesn't work.  For maximum portability, you
 | 
						|
   should set the seekable argument to zero to prevent that initial :meth:`tell`
 | 
						|
   when passing in an unseekable object such as a file object created from a socket
 | 
						|
   object.
 | 
						|
 | 
						|
   Input lines as read from the file may either be terminated by CR-LF or by a
 | 
						|
   single linefeed; a terminating CR-LF is replaced by a single linefeed before the
 | 
						|
   line is stored.
 | 
						|
 | 
						|
   All header matching is done independent of upper or lower case; e.g.
 | 
						|
   ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result.
 | 
						|
 | 
						|
 | 
						|
.. class:: AddressList(field)
 | 
						|
 | 
						|
   You may instantiate the :class:`AddressList` helper class using a single string
 | 
						|
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed.  (The
 | 
						|
   parameter ``None`` yields an empty list.)
 | 
						|
 | 
						|
 | 
						|
.. function:: quote(str)
 | 
						|
 | 
						|
   Return a new string with backslashes in *str* replaced by two backslashes and
 | 
						|
   double quotes replaced by backslash-double quote.
 | 
						|
 | 
						|
 | 
						|
.. function:: unquote(str)
 | 
						|
 | 
						|
   Return a new string which is an *unquoted* version of *str*. If *str* ends and
 | 
						|
   begins with double quotes, they are stripped off.  Likewise if *str* ends and
 | 
						|
   begins with angle brackets, they are stripped off.
 | 
						|
 | 
						|
 | 
						|
.. function:: parseaddr(address)
 | 
						|
 | 
						|
   Parse *address*, which should be the value of some address-containing field such
 | 
						|
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
 | 
						|
   "email address" parts. Returns a tuple of that information, unless the parse
 | 
						|
   fails, in which case a 2-tuple ``(None, None)`` is returned.
 | 
						|
 | 
						|
 | 
						|
.. function:: dump_address_pair(pair)
 | 
						|
 | 
						|
   The inverse of :meth:`parseaddr`, this takes a 2-tuple of the form ``(realname,
 | 
						|
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
 | 
						|
   :mailheader:`Cc` header.  If the first element of *pair* is false, then the
 | 
						|
   second element is returned unmodified.
 | 
						|
 | 
						|
 | 
						|
.. function:: parsedate(date)
 | 
						|
 | 
						|
   Attempts to parse a date according to the rules in :rfc:`2822`. however, some
 | 
						|
   mailers don't follow that format as specified, so :func:`parsedate` tries to
 | 
						|
   guess correctly in such cases.  *date* is a string containing an :rfc:`2822`
 | 
						|
   date, such as  ``'Mon, 20 Nov 1995 19:12:08 -0500'``.  If it succeeds in parsing
 | 
						|
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
 | 
						|
   :func:`time.mktime`; otherwise ``None`` will be returned.  Note that indexes 6,
 | 
						|
   7, and 8 of the result tuple are not usable.
 | 
						|
 | 
						|
 | 
						|
.. function:: parsedate_tz(date)
 | 
						|
 | 
						|
   Performs the same function as :func:`parsedate`, but returns either ``None`` or
 | 
						|
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
 | 
						|
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
 | 
						|
   (which is the official term for Greenwich Mean Time).  (Note that the sign of
 | 
						|
   the timezone offset is the opposite of the sign of the ``time.timezone``
 | 
						|
   variable for the same timezone; the latter variable follows the POSIX standard
 | 
						|
   while this module follows :rfc:`2822`.)  If the input string has no timezone,
 | 
						|
   the last element of the tuple returned is ``None``.  Note that indexes 6, 7, and
 | 
						|
   8 of the result tuple are not usable.
 | 
						|
 | 
						|
 | 
						|
.. function:: mktime_tz(tuple)
 | 
						|
 | 
						|
   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp.  If
 | 
						|
   the timezone item in the tuple is ``None``, assume local time.  Minor
 | 
						|
   deficiency: this first interprets the first 8 elements as a local time and then
 | 
						|
   compensates for the timezone difference; this may yield a slight error around
 | 
						|
   daylight savings time switch dates.  Not enough to worry about for common use.
 | 
						|
 | 
						|
 | 
						|
.. seealso::
 | 
						|
 | 
						|
   Module :mod:`email`
 | 
						|
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.
 | 
						|
 | 
						|
   Module :mod:`mailbox`
 | 
						|
      Classes to read various mailbox formats produced  by end-user mail programs.
 | 
						|
 | 
						|
   Module :mod:`mimetools`
 | 
						|
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.
 | 
						|
 | 
						|
 | 
						|
.. _message-objects:
 | 
						|
 | 
						|
Message Objects
 | 
						|
---------------
 | 
						|
 | 
						|
A :class:`Message` instance has the following methods:
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.rewindbody()
 | 
						|
 | 
						|
   Seek to the start of the message body.  This only works if the file object is
 | 
						|
   seekable.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.isheader(line)
 | 
						|
 | 
						|
   Returns a line's canonicalized fieldname (the dictionary key that will be used
 | 
						|
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
 | 
						|
   ``None`` (implying that parsing should stop here and the line be pushed back on
 | 
						|
   the input stream).  It is sometimes useful to override this method in a
 | 
						|
   subclass.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.islast(line)
 | 
						|
 | 
						|
   Return true if the given line is a delimiter on which Message should stop.  The
 | 
						|
   delimiter line is consumed, and the file object's read location positioned
 | 
						|
   immediately after it.  By default this method just checks that the line is
 | 
						|
   blank, but you can override it in a subclass.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.iscomment(line)
 | 
						|
 | 
						|
   Return ``True`` if the given line should be ignored entirely, just skipped. By
 | 
						|
   default this is a stub that always returns ``False``, but you can override it in
 | 
						|
   a subclass.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getallmatchingheaders(name)
 | 
						|
 | 
						|
   Return a list of lines consisting of all headers matching *name*, if any.  Each
 | 
						|
   physical line, whether it is a continuation line or not, is a separate list
 | 
						|
   item.  Return the empty list if no header matches *name*.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getfirstmatchingheader(name)
 | 
						|
 | 
						|
   Return a list of lines comprising the first header matching *name*, and its
 | 
						|
   continuation line(s), if any.  Return ``None`` if there is no header matching
 | 
						|
   *name*.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getrawheader(name)
 | 
						|
 | 
						|
   Return a single string consisting of the text after the colon in the first
 | 
						|
   header matching *name*.  This includes leading whitespace, the trailing
 | 
						|
   linefeed, and internal linefeeds and whitespace if there any continuation
 | 
						|
   line(s) were present.  Return ``None`` if there is no header matching *name*.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getheader(name[, default])
 | 
						|
 | 
						|
   Return a single string consisting of the last header matching *name*,
 | 
						|
   but strip leading and trailing whitespace.
 | 
						|
   Internal whitespace is not stripped.  The optional *default* argument can be
 | 
						|
   used to specify a different default to be returned when there is no header
 | 
						|
   matching *name*; it defaults to ``None``.
 | 
						|
   This is the preferred way to get parsed headers.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.get(name[, default])
 | 
						|
 | 
						|
   An alias for :meth:`getheader`, to make the interface more compatible  with
 | 
						|
   regular dictionaries.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getaddr(name)
 | 
						|
 | 
						|
   Return a pair ``(full name, email address)`` parsed from the string returned by
 | 
						|
   ``getheader(name)``.  If no header matching *name* exists, return ``(None,
 | 
						|
   None)``; otherwise both the full name and the address are (possibly empty)
 | 
						|
   strings.
 | 
						|
 | 
						|
   Example: If *m*'s first :mailheader:`From` header contains the string
 | 
						|
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
 | 
						|
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
 | 
						|
   <jack@cwi.nl>'`` instead, it would yield the exact same result.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getaddrlist(name)
 | 
						|
 | 
						|
   This is similar to ``getaddr(list)``, but parses a header containing a list of
 | 
						|
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
 | 
						|
   name, email address)`` pairs (even if there was only one address in the header).
 | 
						|
   If there is no header matching *name*, return an empty list.
 | 
						|
 | 
						|
   If multiple headers exist that match the named header (e.g. if there are several
 | 
						|
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
 | 
						|
   the named headers contain are also parsed.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getdate(name)
 | 
						|
 | 
						|
   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
 | 
						|
   with :func:`time.mktime`; note that fields 6, 7, and 8  are not usable.  If
 | 
						|
   there is no header matching *name*, or it is unparsable, return ``None``.
 | 
						|
 | 
						|
   Date parsing appears to be a black art, and not all mailers adhere to the
 | 
						|
   standard.  While it has been tested and found correct on a large collection of
 | 
						|
   email from many sources, it is still possible that this function may
 | 
						|
   occasionally yield an incorrect result.
 | 
						|
 | 
						|
 | 
						|
.. method:: Message.getdate_tz(name)
 | 
						|
 | 
						|
   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
 | 
						|
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
 | 
						|
   10th is a number giving the offset of the date's timezone from UTC.  Note that
 | 
						|
   fields 6, 7, and 8  are not usable.  Similarly to :meth:`getdate`, if there is
 | 
						|
   no header matching *name*, or it is unparsable, return ``None``.
 | 
						|
 | 
						|
:class:`Message` instances also support a limited mapping interface. In
 | 
						|
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
 | 
						|
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
 | 
						|
``m.__contains__(name)``, ``m.keys()``, ``m.values()`` ``m.items()``, and
 | 
						|
``m.setdefault(name[, default])`` act as expected, with the one difference
 | 
						|
that :meth:`setdefault` uses an empty string as the default value.
 | 
						|
:class:`Message` instances also support the mapping writable interface ``m[name]
 | 
						|
= value`` and ``del m[name]``.  :class:`Message` objects do not support the
 | 
						|
:meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the
 | 
						|
mapping interface.  (Support for :meth:`get` and :meth:`setdefault` was only
 | 
						|
added in Python 2.2.)
 | 
						|
 | 
						|
Finally, :class:`Message` instances have some public instance variables:
 | 
						|
 | 
						|
 | 
						|
.. attribute:: Message.headers
 | 
						|
 | 
						|
   A list containing the entire set of header lines, in the order in which they
 | 
						|
   were read (except that setitem calls may disturb this order). Each line contains
 | 
						|
   a trailing newline.  The blank line terminating the headers is not contained in
 | 
						|
   the list.
 | 
						|
 | 
						|
 | 
						|
.. attribute:: Message.fp
 | 
						|
 | 
						|
   The file or file-like object passed at instantiation time.  This can be used to
 | 
						|
   read the message content.
 | 
						|
 | 
						|
 | 
						|
.. attribute:: Message.unixfrom
 | 
						|
 | 
						|
   The Unix ``From`` line, if the message had one, or an empty string.  This is
 | 
						|
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
 | 
						|
   mailbox file.
 | 
						|
 | 
						|
 | 
						|
.. _addresslist-objects:
 | 
						|
 | 
						|
AddressList Objects
 | 
						|
-------------------
 | 
						|
 | 
						|
An :class:`AddressList` instance has the following methods:
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__len__()
 | 
						|
 | 
						|
   Return the number of addresses in the address list.
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__str__()
 | 
						|
 | 
						|
   Return a canonicalized string representation of the address list. Addresses are
 | 
						|
   rendered in "name" <host@domain> form, comma-separated.
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__add__(alist)
 | 
						|
 | 
						|
   Return a new :class:`AddressList` instance that contains all addresses in both
 | 
						|
   :class:`AddressList` operands, with duplicates removed (set union).
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__iadd__(alist)
 | 
						|
 | 
						|
   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
 | 
						|
   into the union of itself and the right-hand instance, *alist*.
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__sub__(alist)
 | 
						|
 | 
						|
   Return a new :class:`AddressList` instance that contains every address in the
 | 
						|
   left-hand :class:`AddressList` operand that is not present in the right-hand
 | 
						|
   address operand (set difference).
 | 
						|
 | 
						|
 | 
						|
.. method:: AddressList.__isub__(alist)
 | 
						|
 | 
						|
   In-place version of :meth:`__sub__`, removing addresses in this list which are
 | 
						|
   also in *alist*.
 | 
						|
 | 
						|
Finally, :class:`AddressList` instances have one public instance variable:
 | 
						|
 | 
						|
 | 
						|
.. attribute:: AddressList.addresslist
 | 
						|
 | 
						|
   A list of tuple string pairs, one per address.  In each member, the first is the
 | 
						|
   canonicalized name part, the second is the actual route-address (``'@'``\
 | 
						|
   -separated username-host.domain pair).
 | 
						|
 | 
						|
.. rubric:: Footnotes
 | 
						|
 | 
						|
.. [#] This module originally conformed to :rfc:`822`, hence the name.  Since then,
 | 
						|
   :rfc:`2822` has been released as an update to :rfc:`822`.  This module should be
 | 
						|
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
 | 
						|
   semantics have changed since :rfc:`822`.
 | 
						|
 |