mirror of
				https://github.com/python/cpython.git
				synced 2025-10-26 08:19:20 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			320 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			320 lines
		
	
	
	
		
			14 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| :mod:`!email.parser`: Parsing email messages
 | |
| --------------------------------------------
 | |
| 
 | |
| .. module:: email.parser
 | |
|    :synopsis: Parse flat text email messages to produce a message object structure.
 | |
| 
 | |
| **Source code:** :source:`Lib/email/parser.py`
 | |
| 
 | |
| --------------
 | |
| 
 | |
| Message object structures can be created in one of two ways: they can be
 | |
| created from whole cloth by creating an :class:`~email.message.EmailMessage`
 | |
| object, adding headers using the dictionary interface, and adding payload(s)
 | |
| using :meth:`~email.message.EmailMessage.set_content` and related methods, or
 | |
| they can be created by parsing a serialized representation of the email
 | |
| message.
 | |
| 
 | |
| The :mod:`email` package provides a standard parser that understands most email
 | |
| document structures, including MIME documents.  You can pass the parser a
 | |
| bytes, string or file object, and the parser will return to you the root
 | |
| :class:`~email.message.EmailMessage` instance of the object structure.  For
 | |
| simple, non-MIME messages the payload of this root object will likely be a
 | |
| string containing the text of the message.  For MIME messages, the root object
 | |
| will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
 | |
| method, and the subparts can be accessed via the payload manipulation methods,
 | |
| such as :meth:`~email.message.EmailMessage.get_body`,
 | |
| :meth:`~email.message.EmailMessage.iter_parts`, and
 | |
| :meth:`~email.message.EmailMessage.walk`.
 | |
| 
 | |
| There are actually two parser interfaces available for use, the :class:`Parser`
 | |
| API and the incremental :class:`FeedParser` API.  The :class:`Parser` API is
 | |
| most useful if you have the entire text of the message in memory, or if the
 | |
| entire message lives in a file on the file system.  :class:`FeedParser` is more
 | |
| appropriate when you are reading the message from a stream which might block
 | |
| waiting for more input (such as reading an email message from a socket).  The
 | |
| :class:`FeedParser` can consume and parse the message incrementally, and only
 | |
| returns the root object when you close the parser.
 | |
| 
 | |
| Note that the parser can be extended in limited ways, and of course you can
 | |
| implement your own parser completely from scratch.  All of the logic that
 | |
| connects the :mod:`email` package's bundled parser and the
 | |
| :class:`~email.message.EmailMessage` class is embodied in the :class:`~email.policy.Policy`
 | |
| class, so a custom parser can create message object trees any way it finds
 | |
| necessary by implementing custom versions of the appropriate :class:`!Policy`
 | |
| methods.
 | |
| 
 | |
| 
 | |
| FeedParser API
 | |
| ^^^^^^^^^^^^^^
 | |
| 
 | |
| The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
 | |
| provides an API that is conducive to incremental parsing of email messages,
 | |
| such as would be necessary when reading the text of an email message from a
 | |
| source that can block (such as a socket).  The :class:`BytesFeedParser` can of
 | |
| course be used to parse an email message fully contained in a :term:`bytes-like
 | |
| object`, string, or file, but the :class:`BytesParser` API may be more
 | |
| convenient for such use cases.  The semantics and results of the two parser
 | |
| APIs are identical.
 | |
| 
 | |
| The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
 | |
| bunch of bytes until there's no more to feed it, then close the parser to
 | |
| retrieve the root message object.  The :class:`BytesFeedParser` is extremely
 | |
| accurate when parsing standards-compliant messages, and it does a very good job
 | |
| of parsing non-compliant messages, providing information about how a message
 | |
| was deemed broken.  It will populate a message object's
 | |
| :attr:`~email.message.EmailMessage.defects` attribute with a list of any
 | |
| problems it found in a message.  See the :mod:`email.errors` module for the
 | |
| list of defects that it can find.
 | |
| 
 | |
| Here is the API for the :class:`BytesFeedParser`:
 | |
| 
 | |
| 
 | |
| .. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Create a :class:`BytesFeedParser` instance.  Optional *_factory* is a
 | |
|    no-argument callable; if not specified use the
 | |
|    :attr:`~email.policy.Policy.message_factory` from the *policy*.  Call
 | |
|    *_factory* whenever a new message object is needed.
 | |
| 
 | |
|    If *policy* is specified use the rules it specifies to update the
 | |
|    representation of the message.  If *policy* is not set, use the
 | |
|    :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
 | |
|    compatibility with the Python 3.2 version of the email package and provides
 | |
|    :class:`~email.message.Message` as the default factory.  All other policies
 | |
|    provide :class:`~email.message.EmailMessage` as the default *_factory*. For
 | |
|    more information on what else *policy* controls, see the
 | |
|    :mod:`~email.policy` documentation.
 | |
| 
 | |
|    Note: **The policy keyword should always be specified**; The default will
 | |
|    change to :data:`email.policy.default` in a future version of Python.
 | |
| 
 | |
|    .. versionadded:: 3.2
 | |
| 
 | |
|    .. versionchanged:: 3.3 Added the *policy* keyword.
 | |
|    .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.
 | |
| 
 | |
| 
 | |
|    .. method:: feed(data)
 | |
| 
 | |
|       Feed the parser some more data.  *data* should be a :term:`bytes-like
 | |
|       object` containing one or more lines.  The lines can be partial and the
 | |
|       parser will stitch such partial lines together properly.  The lines can
 | |
|       have any of the three common line endings: carriage return, newline, or
 | |
|       carriage return and newline (they can even be mixed).
 | |
| 
 | |
| 
 | |
|    .. method:: close()
 | |
| 
 | |
|       Complete the parsing of all previously fed data and return the root
 | |
|       message object.  It is undefined what happens if :meth:`~feed` is called
 | |
|       after this method has been called.
 | |
| 
 | |
| 
 | |
| .. class:: FeedParser(_factory=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Works like :class:`BytesFeedParser` except that the input to the
 | |
|    :meth:`~BytesFeedParser.feed` method must be a string.  This is of limited
 | |
|    utility, since the only way for such a message to be valid is for it to
 | |
|    contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
 | |
|    ``True``, no binary attachments.
 | |
| 
 | |
|    .. versionchanged:: 3.3 Added the *policy* keyword.
 | |
| 
 | |
| 
 | |
| Parser API
 | |
| ^^^^^^^^^^
 | |
| 
 | |
| The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
 | |
| provides an API that can be used to parse a message when the complete contents
 | |
| of the message are available in a :term:`bytes-like object` or file.  The
 | |
| :mod:`email.parser` module also provides :class:`Parser` for parsing strings,
 | |
| and header-only parsers, :class:`BytesHeaderParser` and
 | |
| :class:`HeaderParser`, which can be used if you're only interested in the
 | |
| headers of the message.  :class:`BytesHeaderParser` and :class:`HeaderParser`
 | |
| can be much faster in these situations, since they do not attempt to parse the
 | |
| message body, instead setting the payload to the raw body.
 | |
| 
 | |
| 
 | |
| .. class:: BytesParser(_class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Create a :class:`BytesParser` instance.  The *_class* and *policy*
 | |
|    arguments have the same meaning and semantics as the *_factory*
 | |
|    and *policy* arguments of :class:`BytesFeedParser`.
 | |
| 
 | |
|    Note: **The policy keyword should always be specified**; The default will
 | |
|    change to :data:`email.policy.default` in a future version of Python.
 | |
| 
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument that was deprecated in 2.4.  Added the
 | |
|       *policy* keyword.
 | |
|    .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
 | |
| 
 | |
| 
 | |
|    .. method:: parse(fp, headersonly=False)
 | |
| 
 | |
|       Read all the data from the binary file-like object *fp*, parse the
 | |
|       resulting bytes, and return the message object.  *fp* must support
 | |
|       both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
 | |
|       methods.
 | |
| 
 | |
|       The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
 | |
|       (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
 | |
|       style headers and header continuation lines, optionally preceded by an
 | |
|       envelope header.  The header block is terminated either by the end of the
 | |
|       data or by a blank line.  Following the header block is the body of the
 | |
|       message (which may contain MIME-encoded subparts, including subparts
 | |
|       with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).
 | |
| 
 | |
|       Optional *headersonly* is a flag specifying whether to stop parsing after
 | |
|       reading the headers or not.  The default is ``False``, meaning it parses
 | |
|       the entire contents of the file.
 | |
| 
 | |
| 
 | |
|    .. method:: parsebytes(bytes, headersonly=False)
 | |
| 
 | |
|       Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
 | |
|       object` instead of a file-like object.  Calling this method on a
 | |
|       :term:`bytes-like object` is equivalent to wrapping *bytes* in a
 | |
|       :class:`~io.BytesIO` instance first and calling :meth:`parse`.
 | |
| 
 | |
|       Optional *headersonly* is as with the :meth:`parse` method.
 | |
| 
 | |
|    .. versionadded:: 3.2
 | |
| 
 | |
| 
 | |
| .. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Exactly like :class:`BytesParser`, except that *headersonly*
 | |
|    defaults to ``True``.
 | |
| 
 | |
|    .. versionadded:: 3.3
 | |
| 
 | |
| 
 | |
| .. class:: Parser(_class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    This class is parallel to :class:`BytesParser`, but handles string input.
 | |
| 
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument.  Added the *policy* keyword.
 | |
|    .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
 | |
| 
 | |
| 
 | |
|    .. method:: parse(fp, headersonly=False)
 | |
| 
 | |
|       Read all the data from the text-mode file-like object *fp*, parse the
 | |
|       resulting text, and return the root message object.  *fp* must support
 | |
|       both the :meth:`~io.TextIOBase.readline` and the
 | |
|       :meth:`~io.TextIOBase.read` methods on file-like objects.
 | |
| 
 | |
|       Other than the text mode requirement, this method operates like
 | |
|       :meth:`BytesParser.parse`.
 | |
| 
 | |
| 
 | |
|    .. method:: parsestr(text, headersonly=False)
 | |
| 
 | |
|       Similar to the :meth:`parse` method, except it takes a string object
 | |
|       instead of a file-like object.  Calling this method on a string is
 | |
|       equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
 | |
|       and calling :meth:`parse`.
 | |
| 
 | |
|       Optional *headersonly* is as with the :meth:`parse` method.
 | |
| 
 | |
| 
 | |
| .. class:: HeaderParser(_class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Exactly like :class:`Parser`, except that *headersonly*
 | |
|    defaults to ``True``.
 | |
| 
 | |
| 
 | |
| Since creating a message object structure from a string or a file object is such
 | |
| a common task, four functions are provided as a convenience.  They are available
 | |
| in the top-level :mod:`email` package namespace.
 | |
| 
 | |
| .. currentmodule:: email
 | |
| 
 | |
| 
 | |
| .. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Return a message object structure from a :term:`bytes-like object`.  This is
 | |
|    equivalent to ``BytesParser().parsebytes(s)``.  Optional *_class* and
 | |
|    *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
 | |
|    constructor.
 | |
| 
 | |
|    .. versionadded:: 3.2
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument.  Added the *policy* keyword.
 | |
| 
 | |
| 
 | |
| .. function:: message_from_binary_file(fp, _class=None, *, \
 | |
|                                        policy=policy.compat32)
 | |
| 
 | |
|    Return a message object structure tree from an open binary :term:`file
 | |
|    object`.  This is equivalent to ``BytesParser().parse(fp)``.  *_class* and
 | |
|    *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
 | |
|    constructor.
 | |
| 
 | |
|    .. versionadded:: 3.2
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument.  Added the *policy* keyword.
 | |
| 
 | |
| 
 | |
| .. function:: message_from_string(s, _class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Return a message object structure from a string.  This is equivalent to
 | |
|    ``Parser().parsestr(s)``.  *_class* and *policy* are interpreted as
 | |
|    with the :class:`~email.parser.Parser` class constructor.
 | |
| 
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument.  Added the *policy* keyword.
 | |
| 
 | |
| 
 | |
| .. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)
 | |
| 
 | |
|    Return a message object structure tree from an open :term:`file object`.
 | |
|    This is equivalent to ``Parser().parse(fp)``.  *_class* and *policy* are
 | |
|    interpreted as with the :class:`~email.parser.Parser` class constructor.
 | |
| 
 | |
|    .. versionchanged:: 3.3
 | |
|       Removed the *strict* argument.  Added the *policy* keyword.
 | |
|    .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.
 | |
| 
 | |
| 
 | |
| Here's an example of how you might use :func:`message_from_bytes` at an
 | |
| interactive Python prompt::
 | |
| 
 | |
|    >>> import email
 | |
|    >>> msg = email.message_from_bytes(myBytes)  # doctest: +SKIP
 | |
| 
 | |
| 
 | |
| Additional notes
 | |
| ^^^^^^^^^^^^^^^^
 | |
| 
 | |
| Here are some notes on the parsing semantics:
 | |
| 
 | |
| * Most non-\ :mimetype:`multipart` type messages are parsed as a single message
 | |
|   object with a string payload.  These objects will return ``False`` for
 | |
|   :meth:`~email.message.EmailMessage.is_multipart`, and
 | |
|   :meth:`~email.message.EmailMessage.iter_parts` will yield an empty list.
 | |
| 
 | |
| * All :mimetype:`multipart` type messages will be parsed as a container message
 | |
|   object with a list of sub-message objects for their payload.  The outer
 | |
|   container message will return ``True`` for
 | |
|   :meth:`~email.message.EmailMessage.is_multipart`, and
 | |
|   :meth:`~email.message.EmailMessage.iter_parts` will yield a list of subparts.
 | |
| 
 | |
| * Most messages with a content type of :mimetype:`message/\*` (such as
 | |
|   :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
 | |
|   be parsed as container object containing a list payload of length 1.  Their
 | |
|   :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
 | |
|   The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
 | |
|   will be a sub-message object.
 | |
| 
 | |
| * Some non-standards-compliant messages may not be internally consistent about
 | |
|   their :mimetype:`multipart`\ -edness.  Such messages may have a
 | |
|   :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
 | |
|   :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
 | |
|   If such messages were parsed with the :class:`~email.parser.FeedParser`,
 | |
|   they will have an instance of the
 | |
|   :class:`~email.errors.MultipartInvariantViolationDefect` class in their
 | |
|   *defects* attribute list.  See :mod:`email.errors` for details.
 | 
