:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.

.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/sax/handler.py`

--------------

The SAX API defines five kinds of handlers: content handlers, DTD handlers,
error handlers, entity resolvers and lexical handlers. Applications normally
only need to implement those interfaces whose events they are interested in;
they can implement the interfaces in a single object or in multiple objects.
Handler implementations should inherit from the base classes provided in the
module :mod:`xml.sax.handler`, so that all methods get default implementations.


.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call
   the method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application.  The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.


.. class:: LexicalHandler

   Interface used by the parser to represent low frequency events which may not
   be of interest to many applications.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.


.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.


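As an illustration of how the feature constants are used, the following
sketch queries and changes features on a reader obtained from
:func:`xml.sax.make_parser`.  It assumes the default ``expat``-based reader;
other readers may recognize or support a different set of features, and the
variable names are only illustrative.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_namespaces, feature_validation

parser = xml.sax.make_parser()

# Features can only be changed while the reader is not parsing.
namespaces_before = bool(parser.getFeature(feature_namespaces))
parser.setFeature(feature_namespaces, True)
namespaces_after = bool(parser.getFeature(feature_namespaces))

# A reader may refuse a feature that it cannot provide.
try:
    parser.setFeature(feature_validation, True)
    validation_enabled = True
except (SAXNotRecognizedException, SAXNotSupportedException):
    validation_enabled = False
```

Catching both exception types keeps the sketch usable with readers that do
not know a feature at all as well as with readers that know it but cannot
switch it on.
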
.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.handler.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: Bytes
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.


.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application.  The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator: if a parser does so, it must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.


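A minimal sketch of how an application might keep the locator and consult it
from a later callback is shown below.  The class name ``LineRecorder`` and the
sample document are hypothetical; the sketch assumes the reader behind
:func:`xml.sax.parseString` supplies a locator, which the default ``expat``
reader does.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineRecorder(ContentHandler):
    """Record the line number reported by the locator for each element."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.lines = {}

    def setDocumentLocator(self, locator):
        # Keep the locator; it is only meaningful inside later callbacks.
        self.locator = locator

    def startElement(self, name, attrs):
        # A parser is not required to supply a locator, so guard against None.
        if self.locator is not None:
            self.lines[name] = self.locator.getLineNumber()


recorder = LineRecorder()
xml.sax.parseString(b"<a>\n  <b/>\n</a>", recorder)
```

After the parse, ``recorder.lines`` maps each element name to the line on
which the reader reported it.
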
.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in DTDHandler (except for :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.


.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled.  Note
   that the ``expat`` reader supplied with Python leaves this feature disabled
   unless it is set explicitly.

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   otherwise guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.


.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element.  The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.


.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.


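The two non-namespace callbacks can be sketched as follows.  The handler
class ``ElementLogger`` and the sample document are hypothetical names chosen
for illustration; the attributes are copied because, as noted above, the
parser may re-use the *attrs* object.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementLogger(ContentHandler):
    """Collect element names and attributes in non-namespace mode."""

    def __init__(self):
        super().__init__()
        self.started = []   # (name, attributes) pairs in document order
        self.ended = []     # element names in closing order

    def startElement(self, name, attrs):
        # Copy the attributes: the parser may re-use the *attrs* object.
        self.started.append((name, dict(attrs.copy().items())))

    def endElement(self, name):
        self.ended.append(name)


logger = ElementLogger()
xml.sax.parseString(b'<order id="7"><item sku="A1"/></order>', logger)
```

Only the two overridden methods do any work; every other event falls through
to the default implementations inherited from :class:`ContentHandler`.
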
.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`)
   containing the attributes of the element.  If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``.  The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes.  To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* and *qname* parameters contain the same values as in the
   corresponding :meth:`startElementNS` call.


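The namespace-mode callbacks, together with the prefix-mapping events, might
be exercised as in the sketch below.  It assumes the default ``expat`` reader
and enables ``feature_namespaces`` explicitly; ``NamespaceLogger`` and the
``urn:example`` namespace are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceLogger(ContentHandler):
    """Log namespace-mode events in the order the reader delivers them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("startPrefixMapping", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("endPrefixMapping", prefix))

    def startElementNS(self, name, qname, attrs):
        # *name* is a (uri, localname) tuple; *qname* may be None.
        self.events.append(("startElementNS", name))

    def endElementNS(self, name, qname):
        self.events.append(("endElementNS", name))


ns_logger = NamespaceLogger()
ns_parser = xml.sax.make_parser()
ns_parser.setFeature(feature_namespaces, True)   # switch to namespace mode
ns_parser.setContentHandler(ns_logger)
ns_parser.parse(io.BytesIO(b'<x:root xmlns:x="urn:example"><plain/></x:root>'))
```

In this run the mapping for ``x`` is announced before the ``root`` element
starts, and the ``plain`` element, which has no namespace, is reported with
``None`` as the *uri* component of its name.
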
.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The Parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the Locator provides useful
   information.

   *content* may be a string or bytes instance; the ``expat`` reader module
   always produces strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method.  Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it.  To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.


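Because character data may arrive in several chunks, a handler should not
assume one :meth:`characters` call per text run.  The following sketch, with
the hypothetical class name ``TextCollector``, buffers the chunks and joins
them when the element ends.  It deliberately ignores nested elements to stay
short.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Join character chunks per element instead of assuming one call."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        self._chunks = []

    def characters(self, content):
        # May be called several times for one contiguous run of text.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


collector = TextCollector()
xml.sax.parseString(b"<p>fish &amp; chips</p>", collector)
```

Joining at :meth:`endElement` gives the same text regardless of how the
reader chooses to split the data.
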
.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating Parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.


.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The Parser will invoke this method once for each processing instruction found:
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.


.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The Parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and ``feature_external_pes`` features.


.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.


.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an InputSource to read from. The default
   implementation returns *systemId*.


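As a sketch of the second option, the resolver below returns an
:class:`~xml.sax.xmlreader.InputSource` backed by an in-memory byte stream
instead of a system identifier.  ``InMemoryResolver``, ``NameCollector`` and
the file name ``part.xml`` are hypothetical.  The sketch assumes the default
``expat`` reader and that ``feature_external_ges`` must be enabled before
external general entities are resolved at all.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class InMemoryResolver(EntityResolver):
    """Serve external entities from a dict instead of the file system."""

    def __init__(self, documents):
        self.documents = documents
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource()
        source.setByteStream(io.BytesIO(self.documents[systemId]))
        return source


class NameCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


document = (
    b'<!DOCTYPE doc [<!ENTITY part SYSTEM "part.xml">]>'
    b"<doc>&part;</doc>"
)
resolver = InMemoryResolver({"part.xml": b"<part/>"})
names = NameCollector()

entity_parser = xml.sax.make_parser()
# Assumption: external general entities are off unless enabled explicitly.
entity_parser.setFeature(feature_external_ges, True)
entity_parser.setEntityResolver(resolver)
entity_parser.setContentHandler(names)
entity_parser.parse(io.BytesIO(document))
```

Serving entities from a fixed mapping also keeps the parser from fetching
arbitrary files or URLs named in an untrusted document.
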
.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`.  If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors.  All methods take a :exc:`SAXParseException` as the
only parameter.  Errors and warnings may be converted to an exception by raising
the passed-in exception object.


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error.  If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application.  Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.


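One possible error-handling policy is sketched below: every report is
recorded, but only fatal errors are converted to exceptions by raising the
passed-in object.  ``CollectingErrorHandler`` is a hypothetical name, and the
malformed input is chosen so that the reader reports a fatal error.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Remember every report; turn only fatal errors into exceptions."""

    def __init__(self):
        self.reports = []

    def warning(self, exception):
        self.reports.append(("warning", exception.getMessage()))

    def error(self, exception):
        self.reports.append(("error", exception.getMessage()))

    def fatalError(self, exception):
        self.reports.append(("fatalError", exception.getMessage()))
        raise exception


errors = CollectingErrorHandler()
try:
    # The closing tag does not match, so the document is not well formed.
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), errors)
    parse_failed = False
except xml.sax.SAXParseException:
    parse_failed = True
```

Re-raising in :meth:`fatalError` mirrors the expectation stated above that
parsing terminates once an unrecoverable error has been reported.
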
.. _lexical-handler-objects:

LexicalHandler Objects
----------------------

Optional SAX2 handler for lexical events.

This handler is used to obtain lexical information about an XML
document. Lexical information includes information describing the
document encoding used and XML comments embedded in the document, as
well as section boundaries for the DTD and for any CDATA sections.
The lexical handlers are used in the same manner as content handlers.

Set the LexicalHandler of an XMLReader by using the setProperty method
with the property identifier
``'http://xml.org/sax/properties/lexical-handler'``, which is available as
:data:`property_lexical_handler`.


.. method:: LexicalHandler.comment(content)

   Reports a comment anywhere in the document (including the DTD and
   outside the document element).

.. method:: LexicalHandler.startDTD(name, public_id, system_id)

   Reports the start of the DTD declarations if the document has an
   associated DTD.

.. method:: LexicalHandler.endDTD()

   Reports the end of the DTD declarations.

.. method:: LexicalHandler.startCDATA()

   Reports the start of a CDATA marked section.

   The contents of the CDATA marked section will be reported through
   the characters handler.

.. method:: LexicalHandler.endCDATA()

   Reports the end of a CDATA marked section.