GH-127078: `url2pathname()`: handle extra slash before UNC drive in URL path (GH-127132)
Decode a file URI like `file://///server/share` as a UNC path like
`\\server\share`. This form of file URI is created by software the simply
prepends `file:///` to any absolute Windows path.
(cherry picked from commit 8c98ed846a)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-126766: `url2pathname()`: handle 'localhost' authority (GH-127129)
Discard any 'localhost' authority from the beginning of a `file:` URI. As a
result, file URIs like `//localhost/etc/hosts` are correctly decoded as
`/etc/hosts`.
(cherry picked from commit ebf564a1d3)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-85168: Use filesystem encoding when converting to/from `file` URIs (GH-126852)
Adjust `urllib.request.url2pathname()` and `pathname2url()` to use the
filesystem encoding when quoting and unquoting file URIs, rather than
forcing use of UTF-8.
No changes are needed in the `nturl2path` module because Windows always
uses UTF-8, per PEP 529.
(cherry picked from commit c9b399fbdb)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-126766: `url2pathname()`: handle empty authority section. (GH-126767)
Discard two leading slashes from the beginning of a `file:` URI if they
introduce an empty authority section. As a result, file URIs like
`///etc/hosts` are correctly parsed as `/etc/hosts`.
(cherry picked from commit cae9d9d20f)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-120423: `pathname2url()`: handle forward slashes in Windows paths (GH-126593)
Adjust `urllib.request.pathname2url()` so that forward slashes in Windows
paths are handled identically to backward slashes.
(cherry picked from commit bf224bd7ce)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-126212: Fix removal of slashes in file URIs on Windows (GH-126214)
Adjust `urllib.request.pathname2url()` and `url2pathname()` so that they
don't remove slashes from Windows DOS drive paths and URLs. There was no
basis for this behaviour, and it conflicts with how UNC and POSIX paths are
handled.
(cherry picked from commit 54c63a32d0)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-126205: Fix conversion of UNC paths to file URIs (GH-126208)
File URIs for Windows UNC paths should begin with two slashes, not four.
(cherry picked from commit 951cb2c369)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
GH-125866: Improve tests for `pathname2url()` and `url2pathname()` (GH-125993)
Merge `URL2PathNameTests` and `PathName2URLTests` test cases (which test
only the Windows-specific implementations from `nturl2path`) into the main
`Pathname_Tests` test case for these functions.
Copy/port some test cases for `pathlib.Path.as_uri()` and `from_uri()`.
(cherry picked from commit 6742f14dfd)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram.
This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.
Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales consistently linear with input size as expected.
Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile.
Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
- WASI's ``gethostname()`` is a stub that always fails with OSError
``ENOTSUP``
- skip mailcap ``test`` if subprocess is not available
- WASI process_time clock does not work.
Add methods enterContext() and enterClassContext() in TestCase.
Add method enterAsyncContext() in IsolatedAsyncioTestCase.
Add function enterModuleContext().
test_urllib commented since 2007:
commit d9880d07fc
Author: Facundo Batista <facundobatista@gmail.com>
Date: Fri May 25 04:20:22 2007 +0000
Commenting out the tests until find out who can test them in
one of the problematic enviroments.
pynche code commented since 1998 and 2001:
commit ef30092207
Author: Barry Warsaw <barry@python.org>
Date: Tue Dec 15 01:04:38 1998 +0000
Added most of the mechanism to change the strips from color variations
to color constants (i.e. red constant, green constant, blue
constant). But I haven't hooked this up yet because the UI gets more
crowded and the arrows don't reflect the correct values.
Added "Go to Black" and "Go to White" buttons.
commit 741eae0b31
Author: Barry Warsaw <barry@python.org>
Date: Wed Apr 18 03:51:55 2001 +0000
StripWidget.__init__(), update_yourself(): Removed some unused local
variables reported by PyChecker.
__togglegentype(): PyChecker accurately reported that the variable
__gentypevar was unused -- actually this whole method is currently
unused so comment it out.
urllib.request tests now call urlcleanup() to remove temporary files
created by urlretrieve() tests and to clear the _opener global
variable set by urlopen() and functions calling indirectly urlopen().
regrtest now checks if urllib.request._url_tempfiles and
urllib.request._opener are changed by tests.
CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.
Co-Authored-By: SH <push0ebp@gmail.com>
Disallow control chars in http URLs in urllib.urlopen. This addresses a potential security problem for applications that do not sanity check their URLs where http request headers could be injected.
* bpo-16285: Update urllib quoting to RFC 3986
urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.
Patch by Christian Theune and Ratnadeep Debnath.
In urllib.request, suffixes in no_proxy environment variable with
leading dots could match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
The deprecation include manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.
ssl.wrap_socket() is not marked as deprecated yet.
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.
Issue #27568 Reported and patch contributed by Rémi Rampin.
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.
Issue #27568 Reported and patch contributed by Rémi Rampin.
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.
Issue #27568 Reported and patch contributed by Rémi Rampin.
This changes the main documentation, doc strings, source code comments, and a
couple error messages in the test suite. In some cases the word was removed
or edited some other way to fix the grammar.