Documentation updates for urllib package. Modified the documentation for the

urllib,urllib2 -> urllib.request,urllib.error urlparse -> urllib.parse RobotParser -> urllib.robotparser Updated tutorial references and other module references (http.client.rst, ftplib.rst,contextlib.rst) Updated the examples in the urllib2-howto Addresses Issue3142.
2025-08-04 17:08:35 +00:00 · 2008-06-23 04:41:59 +00:00 · 2008-06-23 04:41:59 +00:00 · aca8fd7a9d
commit aca8fd7a9d
parent d11a44312f
12 changed files with 565 additions and 593 deletions
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@ -0,0 +1,301 @@
+:mod:`urllib.parse` --- Parse URLs into components
+==================================================
+
+.. module:: urllib.parse
+   :synopsis: Parse URLs into or assemble them from components.
+
+
+.. index::
+   single: WWW
+   single: World Wide Web
+   single: URL
+   pair: URL; parsing
+   pair: relative; URL
+
+This module defines a standard interface to break Uniform Resource Locator (URL)
+strings up in components (addressing scheme, network location, path etc.), to
+combine the components back into a URL string, and to convert a "relative URL"
+to an absolute URL given a "base URL."
+
+The module has been designed to match the Internet RFC on Relative Uniform
+Resource Locators (and discovered a bug in an earlier draft!). It supports the
+following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
+``https``, ``imap``, ``mailto``, ``mms``, ``news``,  ``nntp``, ``prospero``,
+``rsync``, ``rtsp``, ``rtspu``,  ``sftp``, ``shttp``, ``sip``, ``sips``,
+``snews``, ``svn``,  ``svn+ssh``, ``telnet``, ``wais``.
+
+The :mod:`urllib.parse` module defines the following functions:
+
+
+.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
+
+   Parse a URL into six components, returning a 6-tuple.  This corresponds to the
+   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
+   Each tuple item is a string, possibly empty. The components are not broken up in
+   smaller parts (for example, the network location is a single string), and %
+   escapes are not expanded. The delimiters as shown above are not part of the
+   result, except for a leading slash in the *path* component, which is retained if
+   present.  For example:
+
+      >>> from urllib.parse import urlparse
+      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
+      >>> o   # doctest: +NORMALIZE_WHITESPACE
+      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
+                  params='', query='', fragment='')
+      >>> o.scheme
+      'http'
+      >>> o.port
+      80
+      >>> o.geturl()
+      'http://www.cwi.nl:80/%7Eguido/Python.html'
+
+   If the *default_scheme* argument is specified, it gives the default addressing
+   scheme, to be used only if the URL does not specify one.  The default value for
+   this argument is the empty string.
+
+   If the *allow_fragments* argument is false, fragment identifiers are not
+   allowed, even if the URL's addressing scheme normally does support them.  The
+   default value for this argument is :const:`True`.
+
+   The return value is actually an instance of a subclass of :class:`tuple`.  This
+   class has the following additional read-only convenience attributes:
+
+   +------------------+-------+--------------------------+----------------------+
+   | Attribute        | Index | Value                    | Value if not present |
+   +==================+=======+==========================+======================+
+   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`netloc`   | 1     | Network location part    | empty string         |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`params`   | 3     | Parameters for last path | empty string         |
+   |                  |       | element                  |                      |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`query`    | 4     | Query component          | empty string         |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`username` |       | User name                | :const:`None`        |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`password` |       | Password                 | :const:`None`        |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
+   +------------------+-------+--------------------------+----------------------+
+   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
+   |                  |       | if present               |                      |
+   +------------------+-------+--------------------------+----------------------+
+
+   See section :ref:`urlparse-result-object` for more information on the result
+   object.
+
+
+.. function:: urlunparse(parts)
+
+   Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
+   can be any six-item iterable. This may result in a slightly different, but
+   equivalent URL, if the URL that was parsed originally had unnecessary delimiters
+   (for example, a ? with an empty query; the RFC states that these are
+   equivalent).
+
+
+.. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
+
+   This is similar to :func:`urlparse`, but does not split the params from the URL.
+   This should generally be used instead of :func:`urlparse` if the more recent URL
+   syntax allowing parameters to be applied to each segment of the *path* portion
+   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
+   separate the path segments and parameters.  This function returns a 5-tuple:
+   (addressing scheme, network location, path, query, fragment identifier).
+
+   The return value is actually an instance of a subclass of :class:`tuple`.  This
+   class has the following additional read-only convenience attributes:
+
+   +------------------+-------+-------------------------+----------------------+
+   | Attribute        | Index | Value                   | Value if not present |
+   +==================+=======+=========================+======================+
+   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`netloc`   | 1     | Network location part   | empty string         |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`query`    | 3     | Query component         | empty string         |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`username` |       | User name               | :const:`None`        |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`password` |       | Password                | :const:`None`        |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
+   +------------------+-------+-------------------------+----------------------+
+   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
+   |                  |       | if present              |                      |
+   +------------------+-------+-------------------------+----------------------+
+
+   See section :ref:`urlparse-result-object` for more information on the result
+   object.
+
+
+.. function:: urlunsplit(parts)
+
+   Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
+   URL as a string. The *parts* argument can be any five-item iterable. This may
+   result in a slightly different, but equivalent URL, if the URL that was parsed
+   originally had unnecessary delimiters (for example, a ? with an empty query; the
+   RFC states that these are equivalent).
+
+
+.. function:: urljoin(base, url[, allow_fragments])
+
+   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
+   another URL (*url*).  Informally, this uses components of the base URL, in
+   particular the addressing scheme, the network location and (part of) the path,
+   to provide missing components in the relative URL.  For example:
+
+      >>> from urllib.parse import urljoin
+      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
+      'http://www.cwi.nl/%7Eguido/FAQ.html'
+
+   The *allow_fragments* argument has the same meaning and default as for
+   :func:`urlparse`.
+
+   .. note::
+
+      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
+      the *url*'s host name and/or scheme will be present in the result.  For example:
+
+   .. doctest::
+
+      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
+      ...         '//www.python.org/%7Eguido')
+      'http://www.python.org/%7Eguido'
+
+   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
+   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
+
+
+.. function:: urldefrag(url)
+
+   If *url* contains a fragment identifier, returns a modified version of *url*
+   with no fragment identifier, and the fragment identifier as a separate string.
+   If there is no fragment identifier in *url*, returns *url* unmodified and an
+   empty string.
+
+.. function:: quote(string[, safe])
+
+   Replace special characters in *string* using the ``%xx`` escape. Letters,
+   digits, and the characters ``'_.-'`` are never quoted. The optional *safe*
+   parameter specifies additional characters that should not be quoted --- its
+   default value is ``'/'``.
+
+   Example: ``quote('/~connolly/')`` yields ``'/%7econnolly/'``.
+
+
+.. function:: quote_plus(string[, safe])
+
+   Like :func:`quote`, but also replaces spaces by plus signs, as required for
+   quoting HTML form values.  Plus signs in the original string are escaped unless
+   they are included in *safe*.  It also does not have *safe* default to ``'/'``.
+
+
+.. function:: unquote(string)
+
+   Replace ``%xx`` escapes by their single-character equivalent.
+
+   Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.
+
+
+.. function:: unquote_plus(string)
+
+   Like :func:`unquote`, but also replaces plus signs by spaces, as required for
+   unquoting HTML form values.
+
+
+.. function:: urlencode(query[, doseq])
+
+   Convert a mapping object or a sequence of two-element tuples  to a "url-encoded"
+   string, suitable to pass to :func:`urlopen` above as the optional *data*
+   argument.  This is useful to pass a dictionary of form fields to a ``POST``
+   request.  The resulting string is a series of ``key=value`` pairs separated by
+   ``'&'`` characters, where both *key* and *value* are quoted using
+   :func:`quote_plus` above.  If the optional parameter *doseq* is present and
+   evaluates to true, individual ``key=value`` pairs are generated for each element
+   of the sequence. When a sequence of two-element tuples is used as the *query*
+   argument, the first element of each tuple is a key and the second is a value.
+   The order of parameters in the encoded string will match the order of parameter
+   tuples in the sequence. The :mod:`cgi` module provides the functions
+   :func:`parse_qs` and :func:`parse_qsl` which are used to parse query strings
+   into Python data structures.
+
+
+.. seealso::
+
+   :rfc:`1738` - Uniform Resource Locators (URL)
+      This specifies the formal syntax and semantics of absolute URLs.
+
+   :rfc:`1808` - Relative Uniform Resource Locators
+      This Request For Comments includes the rules for joining an absolute and a
+      relative URL, including a fair number of "Abnormal Examples" which govern the
+      treatment of border cases.
+
+   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
+      Document describing the generic syntactic requirements for both Uniform Resource
+      Names (URNs) and Uniform Resource Locators (URLs).
+
+
+.. _urlparse-result-object:
+
+Results of :func:`urlparse` and :func:`urlsplit`
+------------------------------------------------
+
+The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
+subclasses of the :class:`tuple` type.  These subclasses add the attributes
+described in those functions, as well as provide an additional method:
+
+
+.. method:: ParseResult.geturl()
+
+   Return the re-combined version of the original URL as a string. This may differ
+   from the original URL in that the scheme will always be normalized to lower case
+   and empty components may be dropped. Specifically, empty parameters, queries,
+   and fragment identifiers will be removed.
+
+   The result of this method is a fixpoint if passed back through the original
+   parsing function:
+
+      >>> import urllib.parse
+      >>> url = 'HTTP://www.Python.org/doc/#'
+
+      >>> r1 = urllib.parse.urlsplit(url)
+      >>> r1.geturl()
+      'http://www.Python.org/doc/'
+
+      >>> r2 = urllib.parse.urlsplit(r1.geturl())
+      >>> r2.geturl()
+      'http://www.Python.org/doc/'
+
+
+The following classes provide the implementations of the parse results::
+
+
+.. class:: BaseResult
+
+   Base class for the concrete result classes.  This provides most of the attribute
+   definitions.  It does not provide a :meth:`geturl` method.  It is derived from
+   :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
+   methods.
+
+
+.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
+
+   Concrete class for :func:`urlparse` results.  The :meth:`__new__` method is
+   overridden to support checking that the right number of arguments are passed.
+
+
+.. class:: SplitResult(scheme, netloc, path, query, fragment)
+
+   Concrete class for :func:`urlsplit` results.  The :meth:`__new__` method is
+   overridden to support checking that the right number of arguments are passed.
+