Commit graph

105 commits

Author SHA1 Message Date
Miss Islington (bot)
526617ed68
[3.11] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) (#129528)
Co-authored-by: Seth Michael Larson <seth@python.org>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
2025-02-19 14:13:52 +01:00
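A minimal sketch of the bracket rule described in this commit, assuming that square brackets may only wrap one host literal at the start of the host part. `brackets_well_placed` is a hypothetical helper written for illustration and is not CPython's implementation.

```python
def brackets_well_placed(netloc: str) -> bool:
    """Return True if '[' and ']' only wrap a single leading host literal."""
    # Drop optional userinfo ("user:pass@") before inspecting the host part.
    hostport = netloc.rpartition("@")[2]
    if "[" not in hostport and "]" not in hostport:
        return True  # ordinary domain name or IPv4 address
    if not hostport.startswith("["):
        return False  # e.g. "prefix.[::1]"
    host, closing, rest = hostport[1:].partition("]")
    if not closing or "[" in host or "[" in rest or "]" in rest:
        return False  # unbalanced or repeated brackets
    # Only an optional ":port" may follow the closing bracket.
    return rest == "" or rest.startswith(":")
```

The sketch only checks bracket placement. Whether the bracketed text is a valid address is a separate check.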
Serhiy Storchaka
d0e8c100e4
[3.11] gh-67693: Fix urlunparse() and urlunsplit() for URIs with path starting with multiple slashes and no authority (GH-113563) (#119025)
(cherry picked from commit e237b25a4f)
2024-09-04 17:42:58 +02:00
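The problem fixed here can be sketched as follows: a path that starts with `//` and has no authority would be re-parsed as an authority unless an explicit empty authority is written in front of it. `unsplit_without_authority` is a hypothetical helper for illustration only, not the patched `urlunsplit()`.

```python
from urllib.parse import urlsplit


def unsplit_without_authority(path: str) -> str:
    """Serialize a path-only reference so it parses back to the same path."""
    # A leading "//" would otherwise be read as the start of an authority,
    # so emit an explicit empty authority ("//") in front of it.
    if path.startswith("//"):
        return "//" + path
    return path


# Without the extra "//", the first path segment is taken as the host.
naive = urlsplit("//evil.example/x")
round_trip = urlsplit(unsplit_without_authority("//evil.example/x"))
```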
Miss Islington (bot)
eddfdb3e50
[3.11] gh-116764: Fix regressions in urllib.parse.parse_qsl() (GH-116801) (GH-116895)
* Restore support of None and other false values.
* Raise TypeError for non-zero integers and non-empty sequences.

The regressions were introduced in gh-74668
(bdba8ef42b).
(cherry picked from commit 1069a462f6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-03-16 11:04:31 +00:00
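A short sketch of the two behaviours restored by this commit. The exception raised for a non-zero integer is assumed to be `TypeError` on patched interpreters; older interpreters may raise something else, so the call is guarded.

```python
from urllib.parse import parse_qsl

# False values such as None or an empty string parse to an empty list.
pairs_from_none = parse_qsl(None)
pairs_from_empty = parse_qsl("")
pairs_from_query = parse_qsl("a=1&b=2")

# A non-zero integer is rejected; the exact exception type is an assumption
# (TypeError after this fix, possibly AttributeError before it).
try:
    parse_qsl(1)
    rejected = False
except (TypeError, AttributeError):
    rejected = True
```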
Miss Islington (bot)
fa670a59ba
[3.11] gh-74668: Fix support of bytes in urllib.parse.parse_qsl() (GH-115771) (GH-116367)
urllib.parse functions parse_qs() and parse_qsl() now support bytes
arguments containing raw and percent-encoded non-ASCII data.
(cherry picked from commit bdba8ef42b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-03-05 17:52:03 +00:00
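A hedged example of the bytes support described in this commit. The guarded call assumes that interpreters without the fix raise a `UnicodeError` subclass for non-ASCII data.

```python
from urllib.parse import parse_qsl

# ASCII-only bytes come back as pairs of bytes.
ascii_pairs = parse_qsl(b"a=1&b=2")

# Percent-encoded non-ASCII bytes; interpreters without the fix are assumed
# to raise a UnicodeError subclass here, so the call is guarded.
try:
    euro_pairs = parse_qsl(b"price=%E2%82%AC")
except UnicodeError:
    euro_pairs = None
```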
Miss Islington (bot)
610cc0ab1b
[3.11] gh-102153: Start stripping C0 control and space chars in urlsplit (GH-102508) (#104575)
* gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508)

`urllib.parse.urlsplit` has partially followed the WHATWG spec since GH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

---------

(cherry picked from commit 2f630e1ce1)

Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
2023-05-17 21:41:25 +00:00
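The stripping rule quoted in this commit can be sketched as below. `C0_CONTROL_OR_SPACE` and `strip_leading_c0` are illustrative names, not the identifiers used in CPython; the character set is U+0000 through U+0020.

```python
from urllib.parse import urlsplit

# C0 control characters (U+0000-U+001F) plus space (U+0020).
C0_CONTROL_OR_SPACE = "".join(chr(code) for code in range(0x21))


def strip_leading_c0(url: str) -> str:
    """Remove leading C0 control and space characters from a URL string."""
    return url.lstrip(C0_CONTROL_OR_SPACE)


cleaned = strip_leading_c0(" \t\x00https://example.com/path")
parts = urlsplit(cleaned)
```

Without the stripping step, the leading characters can stop the scheme from being recognized, which is how scheme blocklists could be bypassed.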
Miss Islington (bot)
b2171a2fd4
[3.11] gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (GH-103849) (#104349)
gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (GH-103849)

* Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format

---------

(cherry picked from commit 29f348e232)

Co-authored-by: JohnJamesUtley <81572567+JohnJamesUtley@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-05-10 06:35:24 +00:00
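A rough sketch of such a check, assuming the RFC 3986 grammar for IPvFuture and using the standard `ipaddress` module for IPv6. `is_valid_bracketed_host` is a hypothetical name, and the details may differ from the check that was actually merged.

```python
import ipaddress
import re

# IPvFuture per RFC 3986: "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPVFUTURE_RE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def is_valid_bracketed_host(host: str) -> bool:
    """Validate the text found between '[' and ']' in a netloc."""
    if host.startswith("v"):
        return IPVFUTURE_RE.fullmatch(host) is not None
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False
    # An IPv4 address parses fine but does not belong in brackets.
    return isinstance(address, ipaddress.IPv6Address)
```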
Miss Islington (bot)
72d356e358
gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (GH-99421)
Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.

RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`

The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
(cherry picked from commit 439b9cfaf4)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
2022-11-13 11:00:25 -08:00
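The RFC 3986 grammar quoted in this commit translates directly into a regular expression. The sketch below is an illustration of that grammar; `is_valid_scheme` is a hypothetical helper and not the code path used by `urlparse`.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if the whole string matches the RFC 3986 scheme rule."""
    return SCHEME_RE.fullmatch(scheme) is not None
```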
Miss Islington (bot)
1520f4e45b
gh-96035: Make urllib.parse.urlparse reject non-numeric ports (GH-98273)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 6f15ca8c7a)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
2022-10-20 14:28:36 -07:00
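A bare `int()` call accepts strings such as `"+80"`, `"8_0"`, and non-ASCII digits, which is why a stricter check is needed. The sketch below shows one way to express it; `parse_port` is a hypothetical helper, and the range check is included only for completeness.

```python
def parse_port(text: str) -> int:
    """Convert a port string to int, accepting ASCII digits only."""
    # int() alone would also accept "+80", "8_0" and full-width digits.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {text!r}")
    port = int(text)
    if not 0 <= port <= 65535:
        raise ValueError("Port out of range 0-65535")
    return port
```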
Jacob Walls
c0f2fcf9bb
Speed up test_urlsplit_normalization (GH-26688) 2021-07-22 10:45:53 +03:00
Gregory P. Smith
d597fdc5fd
bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798)
Switch to lru_cache in urllib.parse.

urllib.parse now uses functools.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like it's the 90s.

The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.

The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
2021-05-11 17:01:44 -07:00
Senthil Kumaran
985ac01637
bpo-43882: Remove newlines and tabs early, including from the query and fragments. (GH-25921) 2021-05-05 15:50:05 -07:00
Senthil Kumaran
76cd81d603
bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595)
* issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2021-04-29 10:16:50 -07:00
Ken Jin
b38601d496
bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818)
* coerce bytes separator to string

* Add news

* Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
2021-04-11 06:26:09 -07:00
Adam Goldschmidt
fcbe0cb04d
bpo-42967: only use '&' as a query string separator (#24297)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only use "&" as the query string separator by default, instead of both ";" and "&" as allowed in earlier versions. An optional argument separator with default value "&" is added to specify the separator.


Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
2021-02-14 14:41:57 -08:00
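A short example of the `separator` argument described in this commit, assuming an interpreter that includes the change.

```python
from urllib.parse import parse_qsl

# By default only "&" separates fields, so ";" stays inside the value.
default_pairs = parse_qsl("a=1;b=2&c=3")

# Callers that really need ";" can ask for it explicitly.
semicolon_pairs = parse_qsl("a=1;b=2", separator=";")
```

Only one separator is honoured per call, so a cache and an application can no longer disagree about where a field ends.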
Tim Graham
5a88d50ff0 bpo-27657: Fix urlparse() with numeric paths (#661)
* bpo-27657: Fix urlparse() with numeric paths

Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.

* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
2019-10-18 06:07:20 -07:00
Steve Dower
8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre
674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
Steve Dower
d537ab0ff9
bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Steve Dower
16e6f7dee7
bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
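The idea behind this check can be sketched with `unicodedata`: some non-ASCII characters turn into URL separators under NFKC normalization. `normalizes_to_separator` is a hypothetical helper, and the separator set `/?#@:` is an assumption made for illustration.

```python
import unicodedata

SEPARATORS = "/?#@:"  # assumed set of characters that delimit URL parts


def normalizes_to_separator(netloc: str) -> bool:
    """Return True if NFKC normalization introduces a new separator."""
    if netloc.isascii():
        return False  # normalization cannot change pure ASCII text
    normalized = unicodedata.normalize("NFKC", netloc)
    # Compare counts so separators already present (e.g. ":port") are ignored.
    return any(normalized.count(ch) > netloc.count(ch) for ch in SEPARATORS)
```

For instance, U+2100 normalizes to the three characters `a/c`, which would move the end of the host.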
Srinivas Thatiparthy (శ్రీనివాస్ తాటిపర్తి)
90d0cfb222 bpo-35202: Remove unused imports in tests. (GH-10561) 2018-11-16 17:32:58 +02:00
matthewbelisle-wf
209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
Cheryl Sabella
867b825830 bpo-27485: Change urlparse tests to use private methods. (GH-7070) 2018-06-03 17:31:32 +03:00
Cheryl Sabella
0250de4819 bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) 2018-04-25 16:51:54 -07:00
Matt Eaton
2cb4661707 bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) 2018-03-20 09:41:37 +03:00
Коренберг Марк
fbd605151f bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) 2017-12-21 14:16:17 +02:00
postmasters
90e01e50ef urllib: Simplify splithost by calling into urlparse. (#1849)
The current regex-based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
2017-06-20 15:02:44 +02:00
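The URL from this commit message can be checked with `urlsplit`. Unlike the browser view quoted above, `urlsplit` reports the fragment without its leading `#` and leaves the path empty rather than `/`.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://abc#@def")
host = parts.hostname      # the "@" after "#" is not treated as userinfo
fragment = parts.fragment  # everything after the first "#"
```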
Senthil Kumaran
257b980b31 correct parse_qs and parse_qsl test case descriptions. (#968)
* correct parse_qs and parse_qsl test case descriptions.
2017-04-04 21:19:43 -07:00
Berker Peksag
f8479eeb34 Issue #25895: Merge from 3.5 2016-09-16 14:45:15 +03:00
Berker Peksag
f676748a05 Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin
Patch by Gergely Imreh and Markus Holtermann.
2016-09-16 14:43:58 +03:00
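A small example of what this enables, assuming an interpreter that includes the change; the host name is illustrative.

```python
from urllib.parse import urljoin

# Relative references now resolve against ws:// and wss:// bases.
ws_url = urljoin("ws://example.com/chat/", "room")
wss_url = urljoin("wss://example.com/chat/", "../status")
```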
Senthil Kumaran
4d4ac5bd02 merge 3.5
issue26775 - Improve test coverage for urllib.parse
Patch contributed by Luiz Poleto.
2016-04-16 07:34:24 -07:00
Senthil Kumaran
e38415e776 issue26775 - Improve test coverage for urllib.parse
Patch contributed by Luiz Poleto.
2016-04-16 07:33:15 -07:00
Robert Collins
dfa95c9a8f Issue #20059: urllib.parse raises ValueError on all invalid ports.
Patch by Martin Panter.
2015-08-10 09:53:30 +12:00
Berker Peksag
a7c781724f Issue #23684: Clarify the return value of the scheme attribute of ParseResult and SplitResult objects.
Patch by Martin Panter.
2015-06-25 23:39:26 +03:00
Berker Peksag
89584c97e4 Issue #23684: Clarify the return value of the scheme attribute of ParseResult and SplitResult objects.
Patch by Martin Panter.
2015-06-25 23:38:48 +03:00
R David Murray
c17686f071 Issue #13866: add *quote_via* argument to urlencode.
Patch by samwyse, completed by Arnon Yaari, and reviewed by
Martin Panter.
2015-05-17 20:44:50 -04:00
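A brief example of the `quote_via` argument, using `quote` in place of the default `quote_plus`.

```python
from urllib.parse import quote, urlencode

params = {"q": "a b"}

# Default: quote_plus encodes the space as "+".
plus_encoded = urlencode(params)

# quote_via=quote encodes the space as "%20" instead.
percent_encoded = urlencode(params, quote_via=quote)
```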
Berker Peksag
20416f7994 Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a.
Patch by Demian Brecht.
2015-04-16 02:31:14 +03:00
Serhiy Storchaka
1515450440 Issue #23411: Added DefragResult, ParseResult, SplitResult, DefragResultBytes,
ParseResultBytes, and SplitResultBytes to urllib.parse.__all__.
Patch by Martin Panter.
2015-04-07 19:09:01 +03:00
Serhiy Storchaka
5e0fd95e3b Added more tests for urllib.parse utility functions.
These functions are not documented but used in third-party code.
2015-03-02 16:33:08 +02:00
Serhiy Storchaka
9270be7662 Added more tests for urllib.parse utility functions.
These functions are not documented but used in third-party code.
2015-03-02 16:32:29 +02:00
Senthil Kumaran
a66e3885fb Issue #22278: Fix urljoin problem with relative urls, a regression observed
after changes to issue22118 were submitted.

Patch contributed by Demian Brecht and reviewed by Antoine Pitrou.
2014-09-22 15:49:16 +08:00
Antoine Pitrou
55ac5b3f7b Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396.
Patch by Demian Brecht.
2014-08-21 19:16:17 -04:00
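The effect of the RFC 3986 semantics is easiest to see with the reference-resolution examples from section 5.4 of that RFC. Under the older rules, excess `..` segments were kept in the result.

```python
from urllib.parse import urljoin

base = "http://a/b/c/d;p?q"  # base URI used in RFC 3986 section 5.4

sibling = urljoin(base, "g")
parent = urljoin(base, "../g")
# More ".." segments than the base has directories: the extras are dropped.
above_root = urljoin(base, "../../../g")
```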
Serhiy Storchaka
5d83d1a814 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:31:41 +02:00
Serhiy Storchaka
ff97b08d00 Issue #20270: urllib.urlparse now supports empty ports. 2014-01-18 18:30:33 +02:00
Serhiy Storchaka
8f8ec92de8 Issue #19936: Added executable bits or shebang lines to Python scripts which
require them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang lines in the unittestgui and checkpip scripts.
2014-01-16 17:33:23 +02:00
Serhiy Storchaka
b992a0e102 Issue #19936: Added executable bits or shebang lines to Python scripts which
require them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang line to use python3 executable in the unittestgui script.
2014-01-16 17:15:49 +02:00
R David Murray
f516388de8 #17472: add tests for a couple of untested methods in urllib.urlparse.
Original patch by Daniel Wozniak.
2013-03-21 20:56:51 -04:00
Senthil Kumaran
ed30199e78 Fix issue16713 - tel url parsing with params 2012-12-24 14:00:20 -08:00
Senthil Kumaran
2fc5a50809 Issue #14036: return None when the port in urlparse exceeds 65535 2012-05-24 21:56:17 +08:00
Ezio Melotti
6709b7d5d1 #14072: Fix parsing of tel URIs in urlparse by making the check for ports stricter. 2012-05-19 17:15:19 +03:00
Senthil Kumaran
1be320ebdd Issue9374 - Generic parsing of query and fragment portion of urls for any scheme 2012-05-19 08:12:00 +08:00