Commit graph

210 commits

Author SHA1 Message Date
Miss Islington (bot)
bf189d7ab5
[3.14] gh-127146: Emscripten: Skip test_url2pathname_resolve_host (GH-135634) (#135651)
Emscripten currently `gethostbyname_r()` returns an incorrect
IP address for `localhost`. Will be resolved by upstream PR:
https://github.com/emscripten-core/emscripten/pull/24593
(cherry picked from commit 2a49c54ab2)

Co-authored-by: Hood Chatham <roberthoodchatham@gmail.com>
2025-06-18 03:23:31 +00:00
Miss Islington (bot)
c11fc4bc96
[3.14] gh-133677: Fix tests when running in non-UTF-8 locale (GH-133865) (GH-133938)
(cherry picked from commit 14305a83d3)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-05-12 16:41:40 +00:00
Barney Gale
8e08ac9f32
GH-123599: url2pathname(): don't call gethostbyname() by default (#132610)
Follow-up to 66cdb2bd8a.

Add *resolve_host* keyword-only argument to `url2pathname()`, defaulting to
false. When set to true, we call `socket.gethostbyname()` to resolve the
URL hostname.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2025-05-05 17:03:42 +00:00
Barney Gale
f3192dac66
GH-90812: Add test for urlopen() of file URI for UNC path (#132489) 2025-04-15 19:16:34 +01:00
Serhiy Storchaka
f98b9b4cbb
gh-71339: Use new assertion methods in the urllib tests (GH-129056) 2025-04-14 09:24:41 +03:00
Barney Gale
ccad61e35d
GH-125866: Support complete "file:" URLs in urllib (#132378)
Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when
set to true, a complete URL is returned. Likewise add optional
*require_scheme* argument to `url2pathname()`; when set to true, a complete
URL is accepted.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-14 01:49:02 +01:00
Barney Gale
66cdb2bd8a
GH-123599: url2pathname(): handle authority section in file URL (#126844)
In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-10 19:58:04 +00:00
Barney Gale
d783d7b51d
GH-126367: url2pathname(): handle NTFS alternate data streams (#131428)
Adjust `url2pathname()` to decode embedded colon characters in Windows
URIs, rather than bailing out with an `OSError`.
2025-03-18 23:37:12 +00:00
Serhiy Storchaka
5ace71713b
gh-128734: Fix ResourceWarning in urllib tests (GH-128735) 2025-01-12 12:53:17 +02:00
Barney Gale
79b7cab50a
GH-127090: Fix urllib.response.addinfourl.url value for opened file: URIs (#127091)
The canonical `file:` URL (as generated by `pathname2url()`) is now used as the `url` attribute of the returned `addinfourl` object. The `addinfourl.url` attribute reflects the resolved URL for both `file:` or `http[s]:` URLs now.
2024-12-07 17:58:42 +00:00
Barney Gale
5bb059fe60
GH-127236: pathname2url(): generate RFC 1738 URL for absolute POSIX path (#127194)
When handed an absolute Windows path such as `C:\foo` or `//server/share`,
the `urllib.request.pathname2url()` function returns a URL with an
authority section, such as `///C:/foo` or `//server/share` (or before
GH-126205, `////server/share`). Only the `file:` prefix is omitted.

But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows
path of the same form (rooted but lacking a drive), the function returns a
URL without an authority section, such as `/etc/hosts`.

This patch corrects the discrepancy by adding a `//` prefix before
drive-less, rooted paths when generating URLs.
2024-11-25 19:59:20 +00:00
Serhiy Storchaka
97b2ceaaaf
gh-127217: Fix pathname2url() for paths starting with multiple slashes on Posix (GH-127218) 2024-11-24 19:30:29 +02:00
Barney Gale
cc813e10ff
GH-125866: Preserve Windows drive letter case in file URIs (#127138)
Stop converting Windows drive letters to uppercase in
`urllib.request.pathname2url()` and `url2pathname()`. This behaviour is
unnecessary and inconsistent with pathlib's file URI implementation.
2024-11-23 10:41:39 +00:00
Barney Gale
8c98ed846a
GH-127078: url2pathname(): handle extra slash before UNC drive in URL path (#127132)
Decode a file URI like `file://///server/share` as a UNC path like
`\\server\share`. This form of file URI is created by software the simply
prepends `file:///` to any absolute Windows path.
2024-11-22 04:12:50 +00:00
Barney Gale
ebf564a1d3
GH-126766: url2pathname(): handle 'localhost' authority (#127129)
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
2024-11-22 03:17:06 +00:00
Barney Gale
fd133d4f21
GH-126601: pathname2url(): handle NTFS alternate data streams (#126760)
Adjust `pathname2url()` to encode embedded colon characters in Windows
paths, rather than bailing out with an `OSError`.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2024-11-22 00:29:05 +00:00
Barney Gale
c9b399fbdb
GH-85168: Use filesystem encoding when converting to/from file URIs (#126852)
Adjust `urllib.request.url2pathname()` and `pathname2url()` to use the
filesystem encoding when quoting and unquoting file URIs, rather than
forcing use of UTF-8.

No changes are needed in the `nturl2path` module because Windows always
uses UTF-8, per PEP 529.
2024-11-19 21:19:30 +00:00
Barney Gale
4d771977b1
GH-84850: Remove urllib.request.URLopener and FancyURLopener (#125739) 2024-11-19 16:01:49 +02:00
Barney Gale
cae9d9d20f
GH-126766: url2pathname(): handle empty authority section. (#126767)
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
2024-11-14 20:22:14 +00:00
Barney Gale
bf224bd7ce
GH-120423: pathname2url(): handle forward slashes in Windows paths (#126593)
Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
2024-11-12 19:52:30 +00:00
Barney Gale
54c63a32d0
GH-126212: Fix removal of slashes in file URIs on Windows (#126214)
Adjust `urllib.request.pathname2url()` and `url2pathname()` so that they
don't remove slashes from Windows DOS drive paths and URLs. There was no
basis for this behaviour, and it conflicts with how UNC and POSIX paths are
handled.
2024-11-08 16:47:51 +00:00
Barney Gale
951cb2c369
GH-126205: Fix conversion of UNC paths to file URIs (#126208)
File URIs for Windows UNC paths should begin with two slashes, not four.
2024-10-30 22:56:58 +00:00
Barney Gale
6742f14dfd
GH-125866: Improve tests for pathname2url() and url2pathname() (#125993)
Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test
only the Windows-specific implementations from `nturl2path`) into the main
`Pathname_Tests` test case for these functions.

Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
2024-10-29 20:44:57 +00:00
Victor Stinner
2587b9f64e
gh-105382: Remove urllib.request cafile parameter (#105384)
Remove cafile, capath and cadefault parameters of the
urllib.request.urlopen() function, deprecated in Python 3.6.
2023-06-06 21:17:45 +00:00
Gregory P. Smith
2e279e85fe
gh-88500: Reduce memory use of urllib.unquote (#96763)
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram.

This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.

Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal.  The slowdown scales consistently linear with input size as expected.

Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile.

Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
2022-12-10 16:17:39 -08:00
Christian Heimes
760ec8940a
gh-90473: WASI: skip gethostname tests (GH-93092)
- WASI's ``gethostname()`` is a stub that always fails with OSError
  ``ENOTSUP``
- skip mailcap ``test`` if subprocess is not available
- WASI process_time clock does not work.
2022-05-23 10:39:57 +02:00
Serhiy Storchaka
086c6b1b0f
bpo-45046: Support context managers in unittest (GH-28045)
Add methods enterContext() and enterClassContext() in TestCase.
Add method enterAsyncContext() in IsolatedAsyncioTestCase.
Add function enterModuleContext().
2022-05-08 17:49:09 +03:00
Steve Dower
3513d55a61
bpo-43607: Fix urllib handling of Windows paths with \\?\ prefix (GH-25539) 2021-04-23 18:02:47 +01:00
Hai Shi
3ddc634cd5
bpo-40275: Use new test.support helper submodules in tests (GH-21219) 2020-06-30 15:46:06 +02:00
Serhiy Storchaka
700cfa8c90
bpo-41069: Make TESTFN and the CWD for tests containing non-ascii characters. (GH-21035) 2020-06-25 17:56:31 +03:00
Ashwin Ramaswami
9165addc22
bpo-38576: Disallow control characters in hostnames in http.client (GH-18995)
Add host validation for control characters for more CVE-2019-18348 protection.
2020-03-14 11:56:06 -07:00
Serhiy Storchaka
6a265f0d0c
bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)
Ignore leading dots and no longer ignore a trailing newline.
2020-01-05 14:14:31 +02:00
Victor Stinner
ae7aa42774
Remove code commented for more than 10 years (GH-16965)
test_urllib commented since 2007:

commit d9880d07fc
Author: Facundo Batista <facundobatista@gmail.com>
Date:   Fri May 25 04:20:22 2007 +0000

    Commenting out the tests until find out who can test them in
    one of the problematic enviroments.

pynche code commented since 1998 and 2001:

commit ef30092207
Author: Barry Warsaw <barry@python.org>
Date:   Tue Dec 15 01:04:38 1998 +0000

    Added most of the mechanism to change the strips from color variations
    to color constants (i.e. red constant, green constant, blue
    constant).  But I haven't hooked this up yet because the UI gets more
    crowded and the arrows don't reflect the correct values.

    Added "Go to Black" and "Go to White" buttons.

commit 741eae0b31
Author: Barry Warsaw <barry@python.org>
Date:   Wed Apr 18 03:51:55 2001 +0000

    StripWidget.__init__(), update_yourself(): Removed some unused local
    variables reported by PyChecker.

    __togglegentype(): PyChecker accurately reported that the variable
    __gentypevar was unused -- actually this whole method is currently
    unused so comment it out.
2019-10-28 22:35:31 +01:00
Stein Karlsen
aad2ee0156 bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) 2019-10-14 13:36:29 +03:00
Ashwin Ramaswami
ff2e182865 bpo-12707: deprecate info(), geturl(), getcode() methods in favor of headers, url, and status properties for HTTPResponse and addinfourl (GH-11447)
Co-Authored-By: epicfaace <aramaswamis@gmail.com>
2019-09-13 12:40:07 +01:00
Victor Stinner
7cb9204ee1
bpo-37421: urllib.request tests call urlcleanup() (GH-14529)
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().

regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
2019-07-02 14:50:19 +02:00
Victor Stinner
eb976e47e2
bpo-36918: Fix "Exception ignored in" in test_urllib (GH-13996)
Mock the HTTPConnection.close() method in a few unit tests to avoid
logging "Exception ignored in: ..." messages.
2019-06-12 04:07:38 +02:00
Victor Stinner
0c2b6a3943
bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)
CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.

Co-Authored-By: SH <push0ebp@gmail.com>
2019-05-22 22:15:01 +02:00
Berker Peksag
2725cb01d7
bpo-36948: Fix test_urlopener_retrieve_file on Windows (GH-13476) 2019-05-22 02:00:35 +03:00
Xtreak
c661b30f89 bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389) 2019-05-19 16:40:05 +03:00
Gregory P. Smith
b7378d7728
bpo-30458: Use InvalidURL instead of ValueError. (GH-13044)
Use http.client.InvalidURL instead of ValueError as the new error case's exception.
2019-05-01 16:39:21 -04:00
Xtreak
2fc936ed24 bpo-30458: Disable https related urllib tests on a build without ssl (GH-13032)
These tests require an SSL enabled build. Skip these tests when python is built without SSL to fix test failures.


https://bugs.python.org/issue30458
2019-05-01 04:59:48 -07:00
Gregory P. Smith
c4e671eec2
bpo-30458: Disallow control chars in http URLs. (GH-12755)
Disallow control chars in http URLs in urllib.urlopen.  This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
2019-04-30 19:12:21 -07:00
Stéphane Wirtel
a40681dd5d bpo-36019: Use pythontest.net instead of example.com in network tests (GH-11941) 2019-02-22 14:45:36 +01:00
Senthil Kumaran
efbd4ea65d Minor spell fix and formatting fixes in urllib tests. (#959) 2017-04-01 23:47:35 -07:00
Ratnadeep Debnath
21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
Xiang Zhang
c44d58a77a Issue #29142: Merge 3.5. 2017-01-09 11:50:02 +08:00
Xiang Zhang
959ff7f1c6 Issue #29142: Fix suffixes in no_proxy handling in urllib.
In urllib.request, suffixes in no_proxy environment variable with
leading dots could match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
2017-01-09 11:47:55 +08:00
Christian Heimes
d04863771b Issue #28022: Deprecate ssl-related arguments in favor of SSLContext.
The deprecation include manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.

ssl.wrap_socket() is not marked as deprecated yet.
2016-09-10 23:23:33 +02:00
Martin Panter
0be894b2f6 Issue #27895: Spelling fixes (Contributed by Ville Skyttä). 2016-09-07 12:03:06 +00:00