:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators, :rfc:`1808` (and discovered a bug in an earlier draft!).  It
supports the following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``,
``http``, ``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``,
``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``,
``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines the following functions:


.. function:: urlparse(urlstring, default_scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple.  This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty.  The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded.  The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present.  For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   If the *default_scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one.  The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them:
   the ``#`` and the text after it are left in the path, parameters or query
   component, and *fragment* is the empty string.  The default value for this
   argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
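
A short sketch of the optional arguments and the extra attributes, assuming the behavior described above.  The ``example.com`` URLs are placeholders, and the default scheme is passed as the second positional argument.  Note that a host is only recognized as the network location when it is introduced by ``//``:

```python
from urllib.parse import urlparse

# The second positional argument is the default scheme; it is used only
# because this URL does not specify a scheme of its own.
with_default = urlparse('//www.example.com/index.html', 'https')
print(with_default.scheme, with_default.netloc)  # https www.example.com

# Without the leading '//', the host name ends up in the path, not in netloc.
no_slashes = urlparse('www.example.com/index.html', 'https')
print(repr(no_slashes.netloc), no_slashes.path)  # '' www.example.com/index.html

# allow_fragments=False leaves '#top' inside the path component.
no_frag = urlparse('http://example.com/page#top', allow_fragments=False)
print(no_frag.path, repr(no_frag.fragment))  # /page#top ''

# username, password, hostname and port are derived from netloc.
full = urlparse('http://user:secret@Example.COM:8080/index.html')
print(full.username, full.password, full.hostname, full.port)
# user secret example.com 8080
```

The last four attributes have no tuple index; they are read out of *netloc*, and :attr:`hostname` comes back in lower case.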


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
   dictionary.  The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in URL encoded queries should be treated as blank strings.  A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.parse.urlencode` function (with the *doseq* parameter
   set to true) to convert such dictionaries into query strings.
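
A sketch of the *keep_blank_values* flag on a made-up query string.  Because every value comes back as a list, turning the dictionary back into a query string needs ``doseq=True``:

```python
from urllib.parse import parse_qs, urlencode

qs = 'color=red&color=blue&size=&page=2'

# Repeated names are collected into one list; the blank 'size' is dropped.
print(parse_qs(qs))
# {'color': ['red', 'blue'], 'page': ['2']}

# keep_blank_values=True keeps 'size' with an empty-string value.
print(parse_qs(qs, keep_blank_values=True))
# {'color': ['red', 'blue'], 'size': [''], 'page': ['2']}

# doseq=True emits one key=value pair per list element.
print(urlencode(parse_qs(qs), doseq=True))
# color=red&color=blue&page=2
```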


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in URL encoded queries should be treated as blank strings.  A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs
   into query strings.
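
A sketch with a made-up query string: the pairs keep their original order and repeated names, ``+`` is decoded to a space, and the list can be handed straight back to :func:`urlencode`:

```python
from urllib.parse import parse_qsl, urlencode

qs = 'q=url+parsing&lang=en&lang=de&empty='

# Ordered (name, value) pairs; the blank 'empty' field is dropped by default.
pairs = parse_qsl(qs)
print(pairs)
# [('q', 'url parsing'), ('lang', 'en'), ('lang', 'de')]

print(urlencode(pairs))
# q=url+parsing&lang=en&lang=de

# With strict_parsing=True, the field 'b' (no '=') raises ValueError.
try:
    parse_qsl('a=1&b', strict_parsing=True)
except ValueError as exc:
    print('rejected:', exc)
```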


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`.  The *parts*
   argument can be any six-item iterable.  This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
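
A minimal sketch (placeholder URLs) of the normalization just described, and of passing a hand-built six-item tuple:

```python
from urllib.parse import urlparse, urlunparse

# The empty query ('?') and empty fragment ('#') are not reproduced.
parts = urlparse('http://www.example.com/index.html?#')
print(urlunparse(parts))
# http://www.example.com/index.html

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
print(urlunparse(('https', 'example.org', '/a', 'b=1', 'x=2', 'end')))
# https://example.org/a;b=1?x=2#end
```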


.. function:: urlsplit(urlstring, default_scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the
   URL.  This should generally be used instead of :func:`urlparse` if the more
   recent URL syntax allowing parameters to be applied to each segment of the
   *path* portion of the URL (see :rfc:`2396`) is wanted.  A separate function is
   needed to separate the path segments and parameters.  This function returns a
   5-tuple: (addressing scheme, network location, path, query, fragment
   identifier).

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
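
The difference from :func:`urlparse` shows up when more than one path segment carries parameters.  ``split_segments`` below is a hypothetical helper, not part of :mod:`urllib.parse`; it is only one way to write the separate function mentioned above:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=3#frag'

# urlparse() splits off only the parameters of the LAST path segment.
print(urlparse(url).path, urlparse(url).params)  # /a;x=1/b y=2

# urlsplit() leaves every ';' in the path.
print(urlsplit(url).path)  # /a;x=1/b;y=2

def split_segments(path):
    """Hypothetical helper: split a path into (segment, params) pairs."""
    # partition(';') gives (name, ';', params); [::2] keeps name and params.
    return [seg.partition(';')[::2] for seg in path.split('/') if seg]

print(split_segments(urlsplit(url).path))
# [('a', 'x=1'), ('b', 'y=2')]
```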


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string.  The *parts* argument can be any five-item
   iterable.  This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
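
A sketch of the usual split, modify, reassemble pattern (placeholder URLs):

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit('https://example.com/search?q=python#results')

# Unpack the five components, replace the query, drop the fragment.
scheme, netloc, path, query, fragment = parts
print(urlunsplit((scheme, netloc, path, 'q=urllib', '')))
# https://example.com/search?q=urllib

# A '?' with an empty query is not reproduced.
print(urlunsplit(urlsplit('http://example.com/?')))
# http://example.com/
```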


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*).  Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL.  For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result.  For
      example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit`
      and :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
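
The preprocessing suggested in the note can look like the sketch below.  ``join_relative_only`` is a hypothetical name; this approach simply discards whatever scheme and network location *url* carries:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def join_relative_only(base, url):
    """Hypothetical helper: join *url* onto *base*, ignoring any scheme
    or network location present in *url*."""
    _, _, path, query, fragment = urlsplit(url)
    return urljoin(base, urlunsplit(('', '', path, query, fragment)))

base = 'http://www.cwi.nl/%7Eguido/Python.html'

print(urljoin(base, '//www.python.org/%7Eguido'))
# http://www.python.org/%7Eguido
print(join_relative_only(base, '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido
print(join_relative_only(base, 'FAQ.html'))
# http://www.cwi.nl/%7Eguido/FAQ.html
```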


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string.  If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.
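
A minimal sketch (placeholder URLs); the result unpacks into the URL and the fragment:

```python
from urllib.parse import urldefrag

url, fragment = urldefrag('http://example.com/docs/index.html#installation')
print(url)       # http://example.com/docs/index.html
print(fragment)  # installation

# No fragment: the URL comes back unchanged, with an empty string.
url2, fragment2 = urldefrag('http://example.com/docs/')
print(url2, repr(fragment2))  # http://example.com/docs/ ''
```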


.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape.  Letters,
   digits, and the characters ``'_.-'`` are never quoted.  By default, this
   function is intended for quoting the path section of a URL.  The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
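
A few more calls, sketching how *safe*, *encoding* and :class:`bytes` input interact (the inputs are made up):

```python
from urllib.parse import quote

# Default: '/' is safe, so path separators are kept.
print(quote('/El Niño/'))                  # /El%20Ni%C3%B1o/

# safe='' also escapes '/', e.g. for a single path segment.
print(quote('a/b c', safe=''))             # a%2Fb%20c

# Another encoding changes which bytes get escaped.
print(quote('Niño', encoding='latin-1'))   # Ni%F1o

# bytes input is quoted as-is; encoding/errors must not be passed then.
print(quote(b'caf\xc3\xa9'))               # caf%C3%A9
```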


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*.  Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
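
Side by side with :func:`quote` (a sketch with a made-up value), the differences are the handling of spaces and of ``/``:

```python
from urllib.parse import quote, quote_plus

value = 'C++ & Python/3'

print(quote(value))       # C%2B%2B%20%26%20Python/3
print(quote_plus(value))  # C%2B%2B+%26+Python%2F3
```

In both results the original plus signs become ``%2B``, so they cannot be confused with encoded spaces.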


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.
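
A sketch checking the equivalence stated under :func:`quote`, with the default UTF-8 encoding and a made-up string:

```python
from urllib.parse import quote, quote_from_bytes

text = 'Grüße/Ω'

# Encoding first and quoting the bytes gives the same result as quote().
print(quote(text) == quote_from_bytes(text.encode('utf-8')))  # True
print(quote_from_bytes(text.encode('utf-8')))  # Gr%C3%BC%C3%9Fe/%CE%A9
```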


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
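
A sketch of *encoding* and *errors*, using a Latin-1 escape (``%F1``) that is not valid UTF-8:

```python
from urllib.parse import unquote

# Default errors='replace': the invalid byte becomes U+FFFD.
print(unquote('Ni%F1o') == 'Ni\ufffdo')       # True

# Decoding with the encoding that produced the escape recovers the text.
print(unquote('Ni%F1o', encoding='latin-1'))  # Niño

# errors='strict' raises instead of substituting a placeholder.
try:
    unquote('Ni%F1o', errors='strict')
except UnicodeDecodeError as exc:
    print('cannot decode:', exc.reason)
```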


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
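
Compared with :func:`unquote` (a sketch with a made-up form value), only the handling of ``+`` differs.  An escaped plus sign (``%2B``) survives as a literal ``+`` in both cases:

```python
from urllib.parse import unquote, unquote_plus

form_value = 'El+Ni%C3%B1o+%2B+friends'

print(unquote(form_value))       # El+Niño+++friends
print(unquote_plus(form_value))  # El Niño + friends
```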


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields
   ``b'a&\xef'``.
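
Together with :func:`quote_from_bytes`, this gives a lossless round trip for arbitrary binary data, as in this sketch with made-up bytes:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

data = bytes(range(250, 256)) + b' /?'

# safe='' so that '/' is escaped as well.
encoded = quote_from_bytes(data, safe='')
print(encoded)                            # %FA%FB%FC%FD%FE%FF%20%2F%3F
print(unquote_to_bytes(encoded) == data)  # True

# Unescaped non-ASCII characters in a str are UTF-8 encoded first.
print(unquote_to_bytes('ñ%26'))           # b'\xc3\xb1&'
```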


.. function:: urlencode(query, doseq=False)

   Convert a mapping object or a sequence of two-element tuples to a
   "url-encoded" string, suitable to pass to :func:`urllib.request.urlopen` as
   the optional *data* argument.  This is useful to pass a dictionary of form
   fields to a ``POST`` request.  The resulting string is a series of
   ``key=value`` pairs separated by ``'&'`` characters, where both *key* and
   *value* are quoted using :func:`quote_plus` above.  If the optional parameter
   *doseq* is present and evaluates to true, individual ``key=value`` pairs are
   generated for each element of the value sequence for the key.  When a sequence
   of two-element tuples is used as the *query* argument, the first element of
   each tuple is a key and the second is a value.  The order of parameters in the
   encoded string will match the order of parameter tuples in the sequence.  This
   module provides the functions :func:`parse_qs` and :func:`parse_qsl`, which
   are used to parse query strings into Python data structures.
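
A sketch of the effect of *doseq* (the parameter values are made up).  Without it, the list in this example is turned into its string form and quoted as one value:

```python
from urllib.parse import urlencode

params = {'q': 'El Niño', 'tag': ['a', 'b']}

# doseq=False: the list is quoted as the single value "['a', 'b']".
print(urlencode(params))
# q=El+Ni%C3%B1o&tag=%5B%27a%27%2C+%27b%27%5D

# doseq=True: one key=value pair per list element.
print(urlencode(params, doseq=True))
# q=El+Ni%C3%B1o&tag=a&tag=b

# A sequence of pairs keeps its order and allows repeated keys.
print(urlencode([('b', '2'), ('a', '1'), ('b', '3')]))
# b=2&a=1&b=3
```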


.. seealso::

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).


.. _urlparse-result-object:

Results of :func:`urlparse` and :func:`urlsplit`
------------------------------------------------

The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
subclasses of the :class:`tuple` type.  These subclasses add the attributes
described in those functions, and also provide an additional method:

.. method:: ParseResult.geturl()

   Return the re-combined version of the original URL as a string.  This may
   differ from the original URL in that the scheme will always be normalized to
   lower case and empty components may be dropped.  Specifically, empty
   parameters, queries, and fragment identifiers will be removed.

   The result of this method is a fixpoint if passed back through the original
   parsing function:

      >>> import urllib.parse
      >>> url = 'HTTP://www.Python.org/doc/#'

      >>> r1 = urllib.parse.urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'

      >>> r2 = urllib.parse.urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the parse results:

.. class:: BaseResult

   Base class for the concrete result classes.  This provides most of the
   attribute definitions.  It does not provide a :meth:`geturl` method.  It is
   derived from :class:`tuple`, but does not override the :meth:`__init__` or
   :meth:`__new__` methods.


.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results.  The :meth:`__new__` method is
   overridden to check that the right number of arguments is passed.


.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results.  The :meth:`__new__` method is
   overridden to check that the right number of arguments is passed.
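
The concrete classes can also be instantiated directly.  This sketch (placeholder URL) builds a :class:`SplitResult` by hand, compares it with a parsed one, and triggers the argument-count check:

```python
from urllib.parse import ParseResult, SplitResult, urlsplit

# Build a result from its five components and reassemble it.
built = SplitResult('https', 'example.com', '/docs', 'lang=en', 'intro')
print(built.geturl())  # https://example.com/docs?lang=en#intro

# Tuple equality with the result of parsing the same URL.
print(built == urlsplit('https://example.com/docs?lang=en#intro'))  # True

# ParseResult needs six components; three raise TypeError.
try:
    ParseResult('https', 'example.com', '/docs')
except TypeError as exc:
    print('TypeError:', exc)
```

The convenience attributes such as :attr:`hostname` work the same way on a hand-built result as on a parsed one.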