cpython

mirror of https://github.com/python/cpython.git synced 2025-07-26 20:54:39 +00:00

Author	SHA1	Message	Date
Gregory P. Smith	2e279e85fe	gh-88500: Reduce memory use of `urllib.unquote` (#96763) `urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result, depending on the input provided. As Python objects are relatively large, this could consume a lot of RAM. This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations. Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast, so this is not a big deal. The slowdown scales linearly with input size, as expected. Memory usage was observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unit-testing memory consumption is difficult and does not seem worthwhile. Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.	2022-12-10 16:17:39 -08:00
Ben Kallus	439b9cfaf4	gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (#99421) Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character. RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A` The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`	2022-11-13 10:25:55 -08:00
Ben Kallus	6f15ca8c7a	gh-96035: Make urllib.parse.urlparse reject non-numeric ports (#98273) Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>	2022-10-20 14:00:56 -07:00
Gregory P. Smith	e61ca22431	gh-95865: Further reduce quote_from_bytes memory consumption (#96860) on large input values. Based on Dennis Sweeney's chunking idea.	2022-09-19 16:06:25 -07:00
Dennis Sweeney	8ba22b90ca	gh-95865: Speed up urllib.parse.quote_from_bytes() (GH-95872)	2022-08-30 21:39:51 -04:00
Victor Stinner	259dd71c32	gh-84623: Remove unused imports in stdlib (#93773)	2022-06-13 16:28:41 +02:00
Oleg Iarygin	a03a09e068	Replace with_traceback() with exception chaining and reraising (GH-32074)	2022-03-30 15:28:20 +03:00
Christian Sattler	e6fe10d340	bpo-45874: Handle empty query string correctly in urllib.parse.parse_qsl (#29716)	2021-12-12 10:41:12 +02:00
Gregory P. Smith	d597fdc5fd	bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798) urllib.parse now uses functools.lru_cache for its internal URL splitting and quoting caches instead of rolling its own like it's the 90s. The undocumented internal Quoted class API is now deprecated as it had no reason to be public and no existing OSS users were found. The clear_cache() API remains undocumented but gets an explicit test as it is used in a few projects' (twisted, gevent) tests as well as our own regrtest.	2021-05-11 17:01:44 -07:00
Senthil Kumaran	985ac01637	bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921)	2021-05-05 15:50:05 -07:00
Dong-hee Na	6143fcdf8b	bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756) Automerge-Triggered-By: GH:gpshead	2021-04-30 12:01:55 -07:00
Senthil Kumaran	76cd81d603	bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>	2021-04-29 10:16:50 -07:00
Ken Jin	b38601d496	bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818) * coerce bytes separator to string * Add news * Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst	2021-04-11 06:26:09 -07:00
Ken Jin	a2f0654b0a	bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536)	2021-02-15 09:00:20 -08:00
Adam Goldschmidt	fcbe0cb04d	bpo-42967: only use '&' as a query string separator (#24297) bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only use "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument separator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>	2021-02-14 14:41:57 -08:00
Batuhan Taşkaya	0361556537	bpo-39481: PEP 585 for a variety of modules (GH-19423) - concurrent.futures - ctypes - http.cookies - multiprocessing - queue - tempfile - unittest.case - urllib.parse	2020-04-10 07:46:36 -07:00
idomic	c33bdbb20c	bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458)	2020-02-16 21:17:58 +02:00
Serhiy Storchaka	6a265f0d0c	bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619) Ignore leading dots and no longer ignore a trailing newline.	2020-01-05 14:14:31 +02:00
Tim Graham	5a88d50ff0	bpo-27657: Fix urlparse() with numeric paths (#661) * bpo-27657: Fix urlparse() with numeric paths Revert parsing decision from bpo-754016 in favor of the documented consensus in bpo-16932 of how to treat strings without a // to designate the netloc. * bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.	2019-10-18 06:07:20 -07:00
Stein Karlsen	aad2ee0156	bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768)	2019-10-14 13:36:29 +03:00
Steve Dower	8d0ef0b5ed	bpo-36742: Corrects fix to handle decomposition in usernames (#13812)	2019-06-04 17:55:29 +02:00
Rémi Lapeyre	674ee12600	bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481)	2019-05-27 09:43:45 -04:00
Steve Dower	d537ab0ff9	bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017)	2019-04-30 12:03:02 +00:00
Jörn Hees	750d74fac5	bpo-12910: update and correct quote docstring (#2568) Fixes some mistakes and misleading statements in the quote function docstring: - reserved chars are never actually used by quote code, unreserved chars are - reserved chars were wrong and incomplete - mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting	2019-04-09 17:31:18 -07:00
Steve Dower	16e6f7dee7	bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201)	2019-03-07 08:02:26 -08:00
matthewbelisle-wf	209144831b	bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660) Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.	2018-10-19 03:52:59 -07:00
Cheryl Sabella	0250de4819	bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205)	2018-04-25 16:51:54 -07:00
Matt Eaton	2cb4661707	bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078)	2018-03-20 09:41:37 +03:00
Коренберг Марк	fbd605151f	bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867)	2017-12-21 14:16:17 +02:00
Oren Milman	8df44ee8e0	remove a redundant lower in urllib.parse.urlsplit (#3008)	2017-09-02 21:51:39 -07:00
postmasters	90e01e50ef	urllib: Simplify splithost by calling into urlparse. (#1849) The current regex based splitting produces a wrong result. For example:: http://abc#@def Web browsers parse that URL as ``http://abc/#@def``, that is, the host is ``abc``, the path is ``/``, and the fragment is ``#@def``.	2017-06-20 15:02:44 +02:00
Senthil Kumaran	906f5330b9	bpo-29976: urllib.parse clarify '' in scheme values. (GH-984)	2017-05-17 21:48:59 -07:00
Senthil Kumaran	257b980b31	correct parse_qs and parse_qsl test case descriptions. (#968)	2017-04-04 21:19:43 -07:00
Ratnadeep Debnath	21024f0662	bpo-16285: Update urllib quoting to RFC 3986 (#173) urllib.parse.quote is now based on RFC 3986, and hence includes `'~'` in the set of characters that is not escaped by default. Patch by Christian Theune and Ratnadeep Debnath.	2017-02-25 19:00:28 +10:00
Serhiy Storchaka	8cbd3df3ce	Issue #28992: Use bytes.fromhex().	2016-12-21 12:59:28 +02:00
Berker Peksag	f8479eeb34	Issue #25895: Merge from 3.5	2016-09-16 14:45:15 +03:00
Berker Peksag	f676748a05	Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin Patch by Gergely Imreh and Markus Holtermann.	2016-09-16 14:43:58 +03:00
Senthil Kumaran	0b57f0adde	merge from 3.5 Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases.	2016-01-25 18:54:37 -08:00
Senthil Kumaran	d4e51f45a9	Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases.	2016-01-25 18:53:34 -08:00
Senthil Kumaran	86f7109dad	Issue #25822: Add docstrings to the fields of urllib.parse results. Patch contributed by Swati Jaiswal.	2016-01-14 00:11:39 -08:00
Robert Collins	dfa95c9a8f	Issue #20059: urllib.parse raises ValueError on all invalid ports. Patch by Martin Panter.	2015-08-10 09:53:30 +12:00
R David Murray	c17686f071	Issue #13866: add quote_via argument to urlencode. Patch by samwyse, completed by Arnon Yaari, and reviewed by Martin Panter.	2015-05-17 20:44:50 -04:00
Berker Peksag	20416f7994	Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a. Patch by Demian Brecht.	2015-04-16 02:31:14 +03:00
Serhiy Storchaka	1515450440	Issue #23411: Added DefragResult, ParseResult, SplitResult, DefragResultBytes, ParseResultBytes, and SplitResultBytes to urllib.parse.__all__. Patch by Martin Panter.	2015-04-07 19:09:01 +03:00
Serhiy Storchaka	44eceb6e2a	Issue #23563: Optimized utility functions in urllib.parse.	2015-03-03 20:21:35 +02:00
R David Murray	3ab6ba4744	Merge: #23040: Clarify treatment of encoding and errors when component is bytes.	2014-12-24 21:24:07 -05:00
R David Murray	8c4e112afc	#23040: Clarify treatment of encoding and errors when component is bytes. Patch by Wojtek Ruszczewski.	2014-12-24 21:23:18 -05:00
Senthil Kumaran	a66e3885fb	Issue #22278: Fix urljoin problem with relative urls, a regression observed after changes to issue22118 were submitted. Patch contributed by Demian Brecht and reviewed by Antoine Pitrou.	2014-09-22 15:49:16 +08:00
Antoine Pitrou	55ac5b3f7b	Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396. Patch by Demian Brecht.	2014-08-21 19:16:17 -04:00
Serhiy Storchaka	465e60e654	Issue #22033: Reprs of most Python implemented classes now contain actual class name instead of hardcoded one.	2014-07-25 23:36:00 +03:00
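
Several of the behavior changes recorded in this history can be observed from the public `urllib.parse` API. The snippet below is a small illustration, not part of any commit; it assumes a reasonably recent interpreter (Python 3.10 or later), and older releases may behave differently.

```python
from urllib.parse import parse_qsl, quote, urlencode, urljoin, urlsplit

# bpo-42967: only "&" separates query fields by default; ";" is opt-in.
print(parse_qsl("a=1;b=2&c=3"))             # ";" stays inside the first value
print(parse_qsl("a=1;b=2", separator=";"))  # explicit separator argument

# bpo-16285: "~" is unreserved under RFC 3986, so quote() leaves it alone.
print(quote("~user/my file"))

# Issue #13866: quote_via swaps the quoting function used by urlencode().
print(urlencode({"q": "a b"}))                   # default uses quote_plus
print(urlencode({"q": "a b"}, quote_via=quote))  # percent-encodes the space

# Issue #22118: relative references resolve with RFC 3986 semantics.
print(urljoin("http://a/b/c/d;p?q", "../../../g"))

# Issue #25895: WebSocket schemes take part in relative resolution.
print(urljoin("ws://example.com/chat/room", "lobby"))

# Issue #20059 / gh-96035: an invalid port surfaces as ValueError.
try:
    urlsplit("http://example.com:http/").port
except ValueError as exc:
    print("rejected port:", exc)
```

The hosts, paths, and query strings are made-up sample values; only the function names and keyword arguments come from the commit messages above.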


125 commits
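
The gh-88500 entry describes replacing precomputed `split()` style operations with an expanding `bytearray` fed by a generator. The following is a minimal sketch of that general idea, not the code from the commit: the names `_decoded_chunks` and `unquote_to_bytes_sketch` are hypothetical, and the sketch only aims to match the observable results of `urllib.parse.unquote_to_bytes`.

```python
from urllib.parse import unquote_to_bytes

# Byte values of the ASCII hex digits (iterating bytes yields ints).
_HEXDIGITS = frozenset(b"0123456789abcdefABCDEF")


def _decoded_chunks(data: bytes):
    """Yield literal runs and decoded bytes one piece at a time."""
    pos = 0
    while True:
        pct = data.find(b"%", pos)
        if pct == -1:
            yield data[pos:]          # trailing literal run
            return
        yield data[pos:pct]           # literal run before the "%"
        hex_pair = data[pct + 1:pct + 3]
        if len(hex_pair) == 2 and all(c in _HEXDIGITS for c in hex_pair):
            yield bytes.fromhex(hex_pair.decode("ascii"))
            pos = pct + 3
        else:
            yield b"%"                # malformed escape is kept verbatim
            pos = pct + 1


def unquote_to_bytes_sketch(text) -> bytes:
    """Hypothetical helper: accumulate chunks in one growing bytearray."""
    data = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    out = bytearray()
    for chunk in _decoded_chunks(data):
        out += chunk
    return bytes(out)


sample = "\u0141%%%20a%fe" * 3
print(unquote_to_bytes_sketch(sample) == unquote_to_bytes(sample))
```

Because the generator hands over one piece at a time, only a single short-lived chunk exists alongside the output buffer, rather than a full list of split fragments. Any memory or speed difference would need to be measured separately; this sketch makes no claim about either.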