mirror of
https://github.com/python/cpython.git
synced 2025-07-24 03:35:53 +00:00
[3.11] gh-102153: Start stripping C0 control and space chars in urlsplit
(GH-102508) (#104575)
* gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508)
`urllib.parse.urlsplit` has partially followed the WHATWG spec since GH-25595.
This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).
---------
(cherry picked from commit 2f630e1ce1)
Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
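
The stripping described in this change can be approximated in a few lines. This is a minimal sketch, not the code from the commit: `strip_like_urlsplit` and `C0_CONTROL_OR_SPACE` are hypothetical names introduced here for illustration, and the behavior is taken from the documentation text in the diff (leading C0 control and space characters are stripped; `\n`, `\r` and `\t` are removed at any position).

```python
from urllib.parse import urlsplit

# C0 control characters (U+0000 through U+001F) plus the ASCII space.
# Hypothetical constant name, chosen for this sketch only.
C0_CONTROL_OR_SPACE = "".join(chr(i) for i in range(0x20)) + " "


def strip_like_urlsplit(url: str) -> str:
    """Hypothetical helper approximating the documented stripping."""
    # Only leading characters are stripped; trailing ones are left alone.
    url = url.lstrip(C0_CONTROL_OR_SPACE)
    # Tab, carriage return and newline are removed at any position.
    for ch in "\t\r\n":
        url = url.replace(ch, "")
    return url


raw = " \x00https://exa\nmple.com/pa\tth "
cleaned = strip_like_urlsplit(raw)
parts = urlsplit(cleaned)
print(repr(cleaned))
print(parts.scheme, parts.netloc, repr(parts.path))
```

Pre-cleaning the input this way gives the same result on interpreters that predate the fix, which may help when code must run on several Python versions.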
parent 0560fd3f98
commit 610cc0ab1b
4 changed files with 119 additions and 3 deletions
@@ -159,6 +159,10 @@ or on combining URL components into a URL string.
       ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                   params='', query='', fragment='')

+   .. warning::
+
+      :func:`urlparse` does not perform validation. See :ref:`URL parsing
+      security <url-parsing-security>` for details.

    .. versionchanged:: 3.2
       Added IPv6 URL parsing capabilities.
@@ -324,8 +328,14 @@ or on combining URL components into a URL string.
    ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
    decomposed before parsing, no error will be raised.

-   Following the `WHATWG spec`_ that updates RFC 3986, ASCII newline
-   ``\n``, ``\r`` and tab ``\t`` characters are stripped from the URL.
+   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
+   control and space characters are stripped from the URL. ``\n``,
+   ``\r`` and tab ``\t`` characters are removed from the URL at any position.
+
+   .. warning::
+
+      :func:`urlsplit` does not perform validation. See :ref:`URL parsing
+      security <url-parsing-security>` for details.

    .. versionchanged:: 3.6
       Out-of-range port numbers now raise :exc:`ValueError`, instead of
@@ -338,6 +348,9 @@ or on combining URL components into a URL string.
    .. versionchanged:: 3.10
       ASCII newline and tab characters are stripped from the URL.

+   .. versionchanged:: 3.11.4
+      Leading WHATWG C0 control and space characters are stripped from the URL.
+
 .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser

 .. function:: urlunsplit(parts)
@@ -414,6 +427,35 @@ or on combining URL components into a URL string.
    or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
    without changes.

+.. _url-parsing-security:
+
+URL parsing security
+--------------------
+
+The :func:`urlsplit` and :func:`urlparse` APIs do not perform **validation** of
+inputs. They may not raise errors on inputs that other applications consider
+invalid. They may also succeed on some inputs that might not be considered
+URLs elsewhere. Their purpose is for practical functionality rather than
+purity.
+
+Instead of raising an exception on unusual input, they may instead return some
+component parts as empty strings. Or components may contain more than perhaps
+they should.
+
+We recommend that users of these APIs where the values may be used anywhere
+with security implications code defensively. Do some verification within your
+code before trusting a returned component part. Does that ``scheme`` make
+sense? Is that a sensible ``path``? Is there anything strange about that
+``hostname``? etc.
+
+What constitutes a URL is not universally well defined. Different applications
+have different needs and desired constraints. For instance the living `WHATWG
+spec`_ describes what user facing web clients such as a web browser require.
+While :rfc:`3986` is more general. These functions incorporate some aspects of
+both, but cannot be claimed compliant with either. The APIs and existing user
+code with expectations on specific behaviors predate both standards leading us
+to be very cautious about making API behavior changes.
+
 .. _parsing-ascii-encoded-bytes:

 Parsing ASCII Encoded Bytes
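
The defensive checks recommended in the new "URL parsing security" section could look something like the sketch below. It is one possible policy under stated assumptions, not guidance from the commit: the helper name `is_acceptable_url`, the allowed schemes, and the host allow-list are all hypothetical, and a real application would substitute its own rules.

```python
from urllib.parse import urlsplit

# Assumptions for this sketch: application-specific policy values.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical allow-list


def is_acceptable_url(url: str) -> bool:
    """Hypothetical check applied after parsing, before trusting any part."""
    try:
        parts = urlsplit(url)
        # Reading .port raises ValueError for non-numeric or out-of-range ports.
        _ = parts.port
    except ValueError:
        return False
    # Does the scheme make sense for this application?
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # Is the hostname one we expect? (.hostname is lowercased, or None.)
    if parts.hostname is None or parts.hostname not in ALLOWED_HOSTS:
        return False
    # Reject embedded credentials such as https://user@host/.
    if parts.username is not None or parts.password is not None:
        return False
    return True


for candidate in (
    "https://example.com/index.html",
    "javascript:alert(1)",
    "https://user@example.com/",
    "https://example.com:99999/",
):
    print(candidate, is_acceptable_url(candidate))
```

An allow-list is used here rather than a deny-list because the parser does not validate its input, so unexpected component values are easier to reject than to enumerate.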