:mod:`xml.dom.minidom` --- Lightweight DOM implementation
=========================================================

.. module:: xml.dom.minidom
   :synopsis: Lightweight Document Object Model (DOM) implementation.
.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


:mod:`xml.dom.minidom` is a lightweight implementation of the Document Object
Model interface.  It is intended to be simpler than the full DOM and also
significantly smaller.

DOM applications typically start by parsing some XML into a DOM.  With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   with open('c:\\temp\\mydata.xml') as datasource:
       dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
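
Since *filename_or_file* may be any file-like object, XML that is already in
memory can be parsed without touching the file system.  A minimal sketch,
assuming the XML arrives as bytes; the helper ``root_tag_of`` is hypothetical
and used only for illustration::

   import io
   from xml.dom.minidom import parse


   def root_tag_of(data):
       """Return the root element's tag name (hypothetical helper)."""
       # io.BytesIO supplies the read() method that parse() needs
       # from a file-like object.
       with io.BytesIO(data) as stream:
           doc = parse(stream)
       tag = doc.documentElement.tagName
       doc.unlink()  # release the DOM's internal references
       return tag


   print(root_tag_of(b"<inventory><item/></inventory>"))  # prints "inventory"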

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates
   an :class:`io.StringIO` object for the string and passes that on to :func:`parse`.
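
The relationship between the two functions can be checked directly.  A small
sketch using only the standard library: it parses the same text once with
:func:`parseString` and once with :func:`parse` on an :class:`io.StringIO`, then
compares the serialized results (the sample XML is arbitrary)::

   import io
   from xml.dom.minidom import parse, parseString

   text = '<config version="2"><name>demo</name></config>'

   from_string = parseString(text)         # parse the string directly
   from_stream = parse(io.StringIO(text))  # parse an in-memory text stream

   # Both routes should produce equivalent Document objects.
   same = from_string.toxml() == from_stream.toxml()
   version = from_string.documentElement.getAttribute("version")

   for doc in (from_string, from_stream):
       doc.unlink()

   print(same, version)  # prints "True 2"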

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree.  The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces.  The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.
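
To use a specific SAX2 parser, pass it as *parser*; the call still returns only
after the whole document has been parsed.  A sketch that obtains the parser from
:func:`xml.sax.make_parser`; any further configuration of that parser is assumed
to happen before the call::

   import xml.sax
   from xml.dom.minidom import parseString

   text = "<doc><p>Hello</p></doc>"

   default_doc = parseString(text)  # minidom chooses the parser itself

   sax_parser = xml.sax.make_parser()  # any SAX2 parser object
   # Configure the parser here if needed (e.g. an entity resolver);
   # minidom replaces its content handler and enables namespace support.
   sax_doc = parseString(text, parser=sax_parser)

   sax_tag = sax_doc.documentElement.tagName
   default_tag = default_doc.documentElement.tagName
   default_doc.unlink()
   sax_doc.unlink()

   print(sax_tag, default_tag == sax_tag)  # prints "doc True"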

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object.  You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Using the implementation from the
:mod:`xml.dom.minidom` module will always return a :class:`Document` instance
from the minidom implementation, while the version from :mod:`xml.dom` may
provide an alternate implementation (this is likely if you have the `PyXML
package <http://pyxml.sourceforge.net/>`_ installed).  Once you have a
:class:`Document`, you can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)
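
Building on the snippet above, a slightly larger sketch that adds a child
element with an attribute before serializing the document.  The element name
``item`` and the attribute ``id`` are illustrative assumptions::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()
   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement

   # New nodes are created through the Document, then attached to the tree.
   item = newdoc.createElement("item")
   item.setAttribute("id", "1")
   item.appendChild(newdoc.createTextNode("Some textual content."))
   top_element.appendChild(item)

   xml_text = newdoc.toxml()
   newdoc.unlink()
   print(xml_text)

With the minidom implementation this prints
``<?xml version="1.0" ?><some_tag><item id="1">Some textual content.</item></some_tag>``.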

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.  These properties are defined in
the DOM specification.  The main property of the document object is the
:attr:`documentElement` property.  It gives you the main element in the XML
document: the one that holds all others.  Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"
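
Beyond :attr:`documentElement`, the standard DOM navigation methods are
available as well.  A sketch that collects the text of every ``<item>``
element; the function name ``item_texts`` and the ``<item>`` tag are
assumptions made for the example::

   from xml.dom.minidom import Node, parseString


   def item_texts(xml_text):
       """Return the text of each <item> element, in document order."""
       doc = parseString(xml_text)
       texts = []
       for item in doc.documentElement.getElementsByTagName("item"):
           # Join the data of the item's direct Text children.
           texts.append("".join(child.data for child in item.childNodes
                                if child.nodeType == Node.TEXT_NODE))
       doc.unlink()
       return texts


   print(item_texts("<list><item>a</item><item>b</item></list>"))  # prints "['a', 'b']"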

When you are finished with a DOM, you should clean it up.  This is necessary
because some versions of Python do not support garbage collection of objects
that refer to each other in a cycle.  Until this restriction is removed from all
versions of Python, it is safest to write your code as if cycles would not be
cleaned up.

The way to clean up a DOM is to call its :meth:`unlink` method::

   dom1.unlink()
   dom2.unlink()
   dom3.unlink()

:meth:`unlink` is a :mod:`xml.dom.minidom`\ -specific extension to the DOM API.
After calling :meth:`unlink` on a node, the node and its descendants are
essentially useless.
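
If processing can raise an exception, a ``try``/``finally`` block keeps the
cleanup from being skipped.  A minimal sketch; ``count_elements`` is a
hypothetical helper::

   from xml.dom.minidom import parseString


   def count_elements(xml_text):
       """Count all element nodes, unlinking the DOM even on errors."""
       dom = parseString(xml_text)
       try:
           # "*" matches elements with any tag name.
           return len(dom.getElementsByTagName("*"))
       finally:
           dom.unlink()  # runs whether or not the lines above raise


   print(count_elements("<a><b/><c><d/></c></a>"))  # prints "4"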


.. seealso::

   `Document Object Model (DOM) Level 1 Specification <http://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation.  This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC.  Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice.  This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.
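
Calling :meth:`unlink` on a child node is useful right after removing it from
the tree.  A sketch, assuming the subtree to discard is the ``<drop>`` element
of a sample document::

   from xml.dom.minidom import parseString

   doc = parseString("<root><drop><x/></drop><keep/></root>")
   root = doc.documentElement

   # Detach the unwanted subtree, then break its internal references.
   dropped = root.removeChild(root.getElementsByTagName("drop")[0])
   dropped.unlink()

   remaining = root.toxml()
   doc.unlink()
   print(remaining)  # prints "<root><keep/></root>"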


.. method:: Node.writexml(writer, indent="", addindent="", newl="", encoding=None)

   Write XML to the writer object.  The writer should have a :meth:`write` method
   which matches that of the file object interface.  The *indent* parameter is the
   indentation of the current node.  The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one.  The *newl* parameter
   specifies the string to use to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can be
   used to specify the encoding field of the XML header.
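
Any object with a suitable :meth:`write` method can serve as *writer*;
:class:`io.StringIO` is a convenient choice for collecting the output in memory.
A sketch with two-space indentation (the sample document is arbitrary)::

   import io
   from xml.dom.minidom import parseString

   doc = parseString("<a><b>x</b></a>")
   out = io.StringIO()

   # indent: prefix for the current node; addindent: extra prefix per level;
   # newl: line terminator.  On a Document, encoding only fills in the
   # encoding field of the XML header; the text written is still str.
   doc.writexml(out, indent="", addindent="  ", newl="\n", encoding="utf-8")
   doc.unlink()

   print(out.getvalue())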


.. method:: Node.toxml(encoding=None)

   Return the XML that the DOM represents as a string.

   With no argument, the XML header does not specify an encoding, and the result is
   a Unicode string (:class:`str`).  Encoding this string in an encoding other than
   UTF-8 is likely incorrect, since UTF-8 is the default encoding of XML.

   With an explicit *encoding* [#]_ argument, the result is a byte string in the
   specified encoding. It is recommended that this argument always be specified. To
   avoid :exc:`UnicodeError` exceptions in case of unrepresentable text data, the
   encoding argument should be specified as "utf-8".
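
The difference between the two forms shows up in the return type.  A sketch;
the sample text, which contains one non-ASCII character, is arbitrary::

   from xml.dom.minidom import parseString

   doc = parseString("<greeting>h\u00e9llo</greeting>")

   as_text = doc.toxml()          # str; the XML header names no encoding
   as_bytes = doc.toxml("utf-8")  # bytes in UTF-8; the header says so
   doc.unlink()

   print(type(as_text).__name__, type(as_bytes).__name__)  # prints "str bytes"
   print(as_bytes.startswith(b'<?xml version="1.0" encoding="utf-8"?>'))  # prints "True"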


.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   There's also an *encoding* argument; see :meth:`toxml`.
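
A sketch that overrides the default tab indentation.  Whitespace-only text
nodes already present in a parsed document are written out as well, so
re-pretty-printing input that is already indented can produce extra blank
lines; the sample below therefore contains no such whitespace::

   from xml.dom.minidom import parseString

   doc = parseString("<a><b>x</b><c/></a>")
   pretty = doc.toprettyxml(indent="  ")  # two spaces instead of a tab
   doc.unlink()
   print(pretty)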


.. _dom-example:

DOM Example
-----------

The following is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward.  The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only ``in``
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API.  They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
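
Several of these mapping rules can be observed directly.  A small sketch; the
sample document is arbitrary, and ``firstChild`` stands in for the generic
attribute ``foo``::

   from xml.dom.minidom import Node, parseString

   doc = parseString('<root a="1"><child/></root>')
   root = doc.documentElement

   # const declarations are constants on Node.
   is_element = root.nodeType == Node.ELEMENT_NODE

   # IDL attributes are instance attributes; _get_<name> accessors also exist.
   same_child = root._get_firstChild() is root.firstChild

   # NodeList objects are built on Python's list type.
   is_list = isinstance(root.childNodes, list)

   # A void operation such as setAttribute() returns None.
   result = root.setAttribute("b", "2")

   print(is_element, same_child, is_list, result)  # prints "True True True None"
   doc.unlink()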

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`EntityReference`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .