bpo-31672: Fix string.Template accidentally matching non-ASCII identifiers (GH-3872)

The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters.

The straightforward solution is to use the `IGNORECASE | ASCII` flags.
But users may subclass `Template` and override only `idpattern`, so we want to
avoid changing `Template.flags`.

So this commit uses the local flag `-i` for `idpattern` and changes `[a-z]` to `[a-zA-Z]`.
(cherry picked from commit b22273ec5d)
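
The behaviour described in the message can be checked directly with `re`. This is a minimal sketch and not part of the commit. The names `SAMPLES`, `old`, and `new` are illustrative, and the two sample characters are the ones used in the commit's test.

```python
import re

# U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) and U+0131 (DOTLESS I),
# the two characters exercised by the test added in this commit.
SAMPLES = ["\u0130", "\u0131"]

# Old default idpattern, compiled with the default Template flags.
old = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE)

# New default idpattern: the local -i flag switches case-insensitive
# matching off inside the group, so [a-zA-Z] is spelled out explicitly.
new = re.compile(r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)", re.IGNORECASE)

old_hits = [ch for ch in SAMPLES if old.fullmatch(ch)]
new_hits = [ch for ch in SAMPLES if new.fullmatch(ch)]

# ASCII identifiers in either case are still accepted by the new pattern.
still_ok = all(new.fullmatch(name) for name in ("who", "Who", "_WHO1"))
```

Because the flag is local to the group, a subclass that overrides only `idpattern` still gets its pattern compiled with the unchanged `Template.flags`.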
INADA Naoki 2017-10-14 14:21:59 +09:00 committed by GitHub
parent 6234e90683
commit 7060380d57
4 changed files with 25 additions and 3 deletions

@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
 
 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders (the braces will be added automatically as
   appropriate). The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``.
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing.
+
 * *flags* -- The regular expression flags that will be applied when compiling
   the regular expression used for recognizing substitutions. The default value

@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""
 
     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     flags = _re.IGNORECASE
 
     def __init__(self, template):

@@ -271,6 +271,12 @@ class TestTemplate(unittest.TestCase):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131")  # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130")  # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))
 
     def test_idpattern_override(self):
         class PathPattern(Template):

@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.