urllib: Simplify splithost by calling into urlparse. (#1849)

The current regex-based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
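
As a rough cross-check of the browser behaviour described above, the sketch below runs the example URL through ``urllib.parse.urlsplit``. It is only an illustration and not part of this change. Note that ``urlsplit`` reports an empty path rather than ``/``, and returns the fragment without its leading ``#``.

```python
from urllib.parse import urlsplit

# The example URL from the commit message.
parts = urlsplit('http://abc#@def')

# The authority ends at the first '/', '?' or '#', so the host is 'abc'.
print(parts.netloc)    # abc
print(parts.path)      # (empty string)
print(parts.fragment)  # @def
```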
postmasters 2017-06-20 06:02:44 -07:00 committed by Victor Stinner
parent 5cc7ac24da
commit 90e01e50ef
4 changed files with 47 additions and 14 deletions

@@ -947,7 +947,7 @@ def splithost(url):
     """splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
     global _hostprog
     if _hostprog is None:
-        _hostprog = re.compile('//([^/?]*)(.*)', re.DOTALL)
+        _hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
 
     match = _hostprog.match(url)
     if match:
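
To see the effect of the one-character change in isolation, here is a minimal sketch that applies both patterns from the hunk above to the scheme-less form of the example URL. The names ``OLD_HOSTPROG``, ``NEW_HOSTPROG`` and ``split_with`` are hypothetical helpers for illustration. They are not part of ``urllib``, and the sketch does not reproduce the rest of ``splithost``.

```python
import re

# The two patterns from the diff: before and after adding '#' to the
# set of characters that terminate the host part.
OLD_HOSTPROG = re.compile('//([^/?]*)(.*)', re.DOTALL)
NEW_HOSTPROG = re.compile('//([^/#?]*)(.*)', re.DOTALL)


def split_with(pattern, url):
    """Hypothetical helper: return (host, rest) using the given pattern."""
    match = pattern.match(url)
    if match:
        return match.groups()
    return None, url


url = '//abc#@def'
old_result = split_with(OLD_HOSTPROG, url)
new_result = split_with(NEW_HOSTPROG, url)

print(old_result)  # ('abc#@def', '')
print(new_result)  # ('abc', '#@def')
```

With the old pattern the whole string ``abc#@def`` is returned as the host, so any later step that splits the host on ``@`` would presumably treat ``def`` as the host name. With the new pattern the host stops at ``#``.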