Mirror of https://github.com/python/cpython.git, synced 2025-09-27 10:50:04 +00:00
Added more information on the differences between the htmllib and HTMLParser
modules.
Parent: 5fe2c139d5
Commit: 25211f5724
3 changed files with 16 additions and 3 deletions
@@ -70,6 +70,12 @@ handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
 
 
 \begin{seealso}
+\seemodule{HTMLParser}{Alternate HTML parser that offers a slightly
+lower-level view of the input, but is
+designed to work with XHTML, and does not
+implement some of the SGML syntax not used in
+``HTML as deployed'' and which isn't legal
+for XHTML.}
 \seemodule{htmlentitydefs}{Definition of replacement text for HTML
 2.0 entities.}
 \seemodule{sgmllib}{Base class for \class{HTMLParser}.}
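The XHTML-oriented behavior mentioned in the new `\seemodule{HTMLParser}` entry can be observed in a minimal sketch. It assumes Python 3, where the HTMLParser module survives as `html.parser` (htmllib and sgmllib no longer exist, so only this side of the comparison can still be run). `TagLogger` is a hypothetical subclass that records every callback; an XHTML empty-element tag such as `<br/>` is reported through `handle_startendtag`, whose default implementation calls `handle_starttag` and then `handle_endtag`.

```python
# Sketch assuming Python 3, where the HTMLParser module is html.parser.
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass: record every handler call, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = TagLogger()
# <br/> uses XHTML empty-element syntax; the default handle_startendtag
# turns it into a start-tag call followed by an end-tag call.
parser.feed("<p>line one<br/>line two</p>")
parser.close()
for event in parser.events:
    print(event)
```

Because the self-closing form is split into ordinary start and end callbacks, a subclass only needs to override `handle_startendtag` if it must tell `<br/>` apart from `<br></br>`.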
@@ -6,7 +6,9 @@
 
 This module defines a class \class{HTMLParser} which serves as the
 basis for parsing text files formatted in HTML\index{HTML} (HyperText
-Mark-up Language) and XHTML.\index{XHTML}
+Mark-up Language) and XHTML.\index{XHTML} Unlike the parser in
+\refmodule{htmllib}, this parser is not based on the SGML parser in
+\refmodule{sgmllib}.
 
 
 \begin{classdesc}{HTMLParser}{}
@@ -15,6 +17,10 @@ The \class{HTMLParser} class is instantiated without arguments.
 An HTMLParser instance is fed HTML data and calls handler functions
 when tags begin and end. The \class{HTMLParser} class is meant to be
 overridden by the user to provide a desired behavior.
+
+Unlike the parser in \refmodule{htmllib}, this parser does not check
+that end tags match start tags or call the end-tag handler for
+elements which are closed implicitly by closing an outer element.
 \end{classdesc}
 
 
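The end-tag behavior added to the `HTMLParser` class description can be checked with a short sketch, again assuming Python 3's `html.parser` as the successor of the HTMLParser module. `EndTagRecorder` is a hypothetical subclass: for the input below, `handle_endtag` is called for the explicit `</p>` and even for a stray `</i>` that has no matching start tag, but never for the `<b>` element that `</p>` closes only implicitly.

```python
# Sketch assuming Python 3, where the HTMLParser module is html.parser.
from html.parser import HTMLParser


class EndTagRecorder(HTMLParser):
    """Hypothetical subclass: collect names passed to the tag handlers."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


recorder = EndTagRecorder()
# <b> is never closed explicitly, and </i> has no matching start tag.
recorder.feed("<p><b>bold text</p></i>")
recorder.close()
print(recorder.starts)  # ['p', 'b']
print(recorder.ends)    # ['p', 'i']
```

A subclass that needs balanced events has to keep its own stack of open elements and synthesize the missing end-tag calls itself.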
@@ -10,8 +10,9 @@ This module defines a class \class{SGMLParser} which serves as the
 basis for parsing text files formatted in SGML (Standard Generalized
 Mark-up Language). In fact, it does not provide a full SGML parser
 --- it only parses SGML insofar as it is used by HTML, and the module
-only exists as a base for the \refmodule{htmllib}\refstmodindex{htmllib}
-module.
+only exists as a base for the \refmodule{htmllib} module. Another
+HTML parser which supports XHTML and offers a somewhat different
+interface is available in the \refmodule{HTMLParser} module.
 
 
 \begin{classdesc}{SGMLParser}{}