[3.13] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) (GH-136272)
Some checks are pending
Tests / (push) Blocked by required conditions
Tests / Windows MSI (push) Blocked by required conditions
Tests / Ubuntu SSL tests with OpenSSL (push) Blocked by required conditions
Tests / Change detection (push) Waiting to run
Tests / Docs (push) Blocked by required conditions
Tests / Check if the ABI has changed (push) Blocked by required conditions
Tests / Check if Autoconf files are up to date (push) Blocked by required conditions
Tests / Check if generated files are up to date (push) Blocked by required conditions
Tests / WASI (push) Blocked by required conditions
Tests / Hypothesis tests on Ubuntu (push) Blocked by required conditions
Tests / Address sanitizer (push) Blocked by required conditions
Tests / CIFuzz (push) Blocked by required conditions
Tests / All required checks pass (push) Blocked by required conditions
Lint / lint (push) Waiting to run

* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<-->" and "<--->".

---------
(cherry picked from commit 8ac7613dc8)

Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
This commit is contained in:
Miss Islington (bot) 2025-07-04 09:22:33 +02:00 committed by GitHub
parent b041a456f1
commit 7038e4a243
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 50 additions and 3 deletions

View file

@ -29,7 +29,8 @@ attr_charref = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*)[;=
starttagopen = re.compile('<[a-zA-Z]')
endtagopen = re.compile('</[a-zA-Z]')
piclose = re.compile('>')
commentclose = re.compile(r'--\s*>')
commentclose = re.compile(r'--!?>')
commentabruptclose = re.compile(r'-?>')
# Note:
# 1) if you change tagfind/attrfind remember to update locatetagend too;
# 2) if you change tagfind/attrfind and/or locatetagend the parser will
@ -336,6 +337,21 @@ class HTMLParser(_markupbase.ParserBase):
else:
return self.parse_bogus_comment(i)
# Internal -- parse comment, return length or -1 if not terminated
# see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
def parse_comment(self, i, report=True):
rawdata = self.rawdata
assert rawdata.startswith('<!--', i), 'unexpected call to parse_comment()'
match = commentclose.search(rawdata, i+4)
if not match:
match = commentabruptclose.match(rawdata, i+4)
if not match:
return -1
if report:
j = match.start()
self.handle_comment(rawdata[i+4: j])
return match.end()
# Internal -- parse bogus comment, return length or -1 if not terminated
# see https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state
def parse_bogus_comment(self, i, report=1):