SF bug #1582282; decode_header() incorrectly splits not-conformant RFC

2047-like headers where there is no whitespace between encoded words.  This
fix changes the matching regexp to include a trailing lookahead assertion that
the closing ?= must be followed by whitespace, newline, or end-of-string.
This also changes the regexp to add the MULTILINE flag.
This commit is contained in:
Barry Warsaw 2007-03-14 04:59:50 +00:00
parent 47c52a8b60
commit dcd24ae501
3 changed files with 26 additions and 1 deletions

View file

@ -39,7 +39,8 @@ ecre = re.compile(r'''
\? # literal ?
(?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string
\?= # literal ?=
''', re.VERBOSE | re.IGNORECASE)
(?=[ \t]|$) # whitespace or the end of the string
''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822. Character range is from tilde to exclamation mark.