cpython/Lib/tokenize.py
"""Tokenization help for Python programs.
tokenize(readline) is a generator that breaks a stream of
bytes into Python tokens. It decodes the bytes according to
PEP-0263 for determining source file encoding.
It accepts a readline-like method which is called
repeatedly to get the next line of input (or b"" for EOF). It generates
5-tuples with these members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators. Aditionally, all token lists start with an ENCODING token
which tells you which encoding was used to decode the bytes stream."""
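# A minimal usage sketch (illustrative; "example.py" is a hypothetical file
# name, not part of this module):
#
#     import tokenize
#     with open("example.py", "rb") as f:
#         for tok in tokenize.tokenize(f.readline):
#             print(tok.type, tok.string, tok.start, tok.end)
#
# The first token yielded is always the ENCODING token described above.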
__author__ = 'Ka-Ping Yee <ping@lfw.org>'
__credits__ = ('GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, '
               'Skip Montanaro, Raymond Hettinger, Trent Nelson, '
               'Michael Foord')

import re, string, sys
from token import *
from codecs import lookup, BOM_UTF8

cookie_re = re.compile(r"coding[:=]\s*([-\w.]+)")

import token
__all__ = [x for x in dir(token) if not x.startswith("_")]
__all__.extend(["COMMENT", "tokenize", "detect_encoding", "NL", "untokenize",
                "ENCODING", "TokenInfo"])
del token

COMMENT = N_TOKENS
tok_name[COMMENT] = 'COMMENT'
NL = N_TOKENS + 1
tok_name[NL] = 'NL'
ENCODING = N_TOKENS + 2
tok_name[ENCODING] = 'ENCODING'
N_TOKENS += 3
class TokenInfo(tuple):
    'TokenInfo(type, string, start, end, line)'

    __slots__ = ()

    _fields = ('type', 'string', 'start', 'end', 'line')

    def __new__(cls, type, string, start, end, line):
        return tuple.__new__(cls, (type, string, start, end, line))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new TokenInfo object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 5:
            raise TypeError('Expected 5 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        return 'TokenInfo(type=%r, string=%r, start=%r, end=%r, line=%r)' % self

    def _asdict(self):
        'Return a new dict which maps field names to their values'
        return dict(zip(self._fields, self))

    def _replace(self, **kwds):
        'Return a new TokenInfo object replacing specified fields with new values'
        result = self._make(map(kwds.pop,
                                ('type', 'string', 'start', 'end', 'line'), self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result

    def __getnewargs__(self):
        return tuple(self)

    type = property(lambda t: t[0])
    string = property(lambda t: t[1])
    start = property(lambda t: t[2])
    end = property(lambda t: t[3])
    line = property(lambda t: t[4])
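# TokenInfo behaves like a plain 5-tuple with named fields, so both access
# styles work.  A small sketch using this module's own names:
#
#     tok = TokenInfo(NAME, 'spam', (1, 0), (1, 4), 'spam = 1\n')
#     assert tok[1] == tok.string == 'spam'
#     toktype, text, start, end, line = tok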
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'

# Note: we use unicode matching for names ("\w") but ascii matching for
# number literals.
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'[a-zA-Z_]\w*'

Hexnumber = r'0[xX][0-9a-fA-F]+'
Binnumber = r'0[bB][01]+'
Octnumber = r'0[oO][0-7]+'
Decnumber = r'(?:0+|[1-9][0-9]*)'
Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?[0-9]+'
Pointfloat = group(r'[0-9]+\.[0-9]*', r'\.[0-9]+') + maybe(Exponent)
Expfloat = r'[0-9]+' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'[0-9]+[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)

# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
Triple = group("[bB]?[rR]?'''", '[bB]?[rR]?"""')
# Single-line ' or " string.
String = group(r"[bB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
               r'[bB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*"')

# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"!=",
                 r"//=?", r"->",
                 r"[+\-*/%&|^=<>]=?",
                 r"~")

Bracket = '[][(){}]'
Special = group(r'\r?\n', r'\.\.\.', r'[:;.,@]')
Funny = group(Operator, Bracket, Special)

PlainToken = group(Number, Funny, String, Name)
Token = Ignore + PlainToken

# First (or only) line of ' or " string.
ContStr = group(r"[bB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
                group("'", r'\\\r?\n'),
                r'[bB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
                group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name)
tokenprog, pseudoprog, single3prog, double3prog = map(
    re.compile, (Token, PseudoToken, Single3, Double3))
endprogs = {"'": re.compile(Single), '"': re.compile(Double),
            "'''": single3prog, '"""': double3prog,
            "r'''": single3prog, 'r"""': double3prog,
            "b'''": single3prog, 'b"""': double3prog,
            "br'''": single3prog, 'br"""': double3prog,
            "R'''": single3prog, 'R"""': double3prog,
            "B'''": single3prog, 'B"""': double3prog,
            "bR'''": single3prog, 'bR"""': double3prog,
            "Br'''": single3prog, 'Br"""': double3prog,
            "BR'''": single3prog, 'BR"""': double3prog,
            'r': None, 'R': None, 'b': None, 'B': None}

triple_quoted = {}
for t in ("'''", '"""',
          "r'''", 'r"""', "R'''", 'R"""',
          "b'''", 'b"""', "B'''", 'B"""',
          "br'''", 'br"""', "Br'''", 'Br"""',
          "bR'''", 'bR"""', "BR'''", 'BR"""'):
    triple_quoted[t] = t
single_quoted = {}
for t in ("'", '"',
          "r'", 'r"', "R'", 'R"',
          "b'", 'b"', "B'", 'B"',
          "br'", 'br"', "Br'", 'Br"',
          "bR'", 'bR"', "BR'", 'BR"'):
    single_quoted[t] = t

tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
class Untokenizer:

    def __init__(self):
        self.tokens = []
        self.prev_row = 1
        self.prev_col = 0
        self.encoding = None

    def add_whitespace(self, start):
        row, col = start
        assert row <= self.prev_row
        col_offset = col - self.prev_col
        if col_offset:
            self.tokens.append(" " * col_offset)

    def untokenize(self, iterable):
        for t in iterable:
            if len(t) == 2:
                self.compat(t, iterable)
                break
            tok_type, token, start, end, line = t
            if tok_type == ENCODING:
                self.encoding = token
                continue
            self.add_whitespace(start)
            self.tokens.append(token)
            self.prev_row, self.prev_col = end
            if tok_type in (NEWLINE, NL):
                self.prev_row += 1
                self.prev_col = 0
        return "".join(self.tokens)

    def compat(self, token, iterable):
        startline = False
        indents = []
        toks_append = self.tokens.append
        toknum, tokval = token
        if toknum in (NAME, NUMBER):
            tokval += ' '
        if toknum in (NEWLINE, NL):
            startline = True
        prevstring = False
        for tok in iterable:
            toknum, tokval = tok[:2]
            if toknum == ENCODING:
                self.encoding = tokval
                continue

            if toknum in (NAME, NUMBER):
                tokval += ' '

            # Insert a space between two consecutive strings
            if toknum == STRING:
                if prevstring:
                    tokval = ' ' + tokval
                prevstring = True
            else:
                prevstring = False

            if toknum == INDENT:
                indents.append(tokval)
                continue
            elif toknum == DEDENT:
                indents.pop()
                continue
            elif toknum in (NEWLINE, NL):
                startline = True
            elif startline and indents:
                toks_append(indents[-1])
                startline = False
            toks_append(tokval)
def untokenize(iterable):
    """Transform tokens back into Python source code.

    It returns a bytes object, encoded using the ENCODING
    token, which is the first token sequence output by tokenize.

    Each element returned by the iterable must be a token sequence
    with at least two elements, a token number and token value.  If
    only two tokens are passed, the resulting output is poor.

    Round-trip invariant for full input:
        Untokenized source will match input source exactly

    Round-trip invariant for limited input:
        # Output bytes will tokenize back to the input
        t1 = [tok[:2] for tok in tokenize(f.readline)]
        newcode = untokenize(t1)
        readline = BytesIO(newcode).readline
        t2 = [tok[:2] for tok in tokenize(readline)]
        assert t1 == t2
    """
    ut = Untokenizer()
    out = ut.untokenize(iterable)
    if ut.encoding is not None:
        out = out.encode(ut.encoding)
    return out
def _get_normal_name(orig_enc):
    """Imitates get_normal_name in tokenizer.c."""
    # Only care about the first 12 characters.
    enc = orig_enc[:12].lower().replace("_", "-")
    if enc == "utf-8" or enc.startswith("utf-8-"):
        return "utf-8"
    if enc in ("latin-1", "iso-8859-1", "iso-latin-1") or \
       enc.startswith(("latin-1-", "iso-8859-1-", "iso-latin-1-")):
        return "iso-8859-1"
    return orig_enc
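# For example (illustrative values):
#     _get_normal_name("UTF_8")    -> "utf-8"
#     _get_normal_name("Latin-1")  -> "iso-8859-1"
#     _get_normal_name("euc-jp")   -> "euc-jp"   (returned unchanged)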
def detect_encoding(readline):
    """
    The detect_encoding() function is used to detect the encoding that should
    be used to decode a Python source file.  It requires one argument, readline,
    in the same way as the tokenize() generator.

    It will call readline a maximum of twice, and return the encoding used
    (as a string) and a list of any lines (left as bytes) it has read in.

    It detects the encoding from the presence of a UTF-8 BOM or an encoding
    cookie as specified in PEP 263.  If both a BOM and a cookie are present,
    but disagree, a SyntaxError will be raised.  If the encoding cookie is an
    invalid charset, a SyntaxError will be raised.

    If no encoding is specified, then the default of 'utf-8' will be returned.
    """
    bom_found = False
    encoding = None
    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''

    def find_cookie(line):
        try:
            line_string = line.decode('ascii')
        except UnicodeDecodeError:
            return None

        matches = cookie_re.findall(line_string)
        if not matches:
            return None
        encoding = _get_normal_name(matches[0])
        try:
            codec = lookup(encoding)
        except LookupError:
            # This behaviour mimics the Python interpreter
            raise SyntaxError("unknown encoding: " + encoding)

        if bom_found and codec.name != 'utf-8':
            # This behaviour mimics the Python interpreter
            raise SyntaxError('encoding problem: utf-8')
        return encoding

    first = read_or_stop()
    if first.startswith(BOM_UTF8):
        bom_found = True
        first = first[3:]
    if not first:
        return 'utf-8', []

    encoding = find_cookie(first)
    if encoding:
        return encoding, [first]

    second = read_or_stop()
    if not second:
        return 'utf-8', [first]

    encoding = find_cookie(second)
    if encoding:
        return encoding, [first, second]

    return 'utf-8', [first, second]
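# A minimal usage sketch ("setup.py" is a hypothetical file name):
#
#     with open("setup.py", "rb") as f:
#         encoding, consumed_lines = detect_encoding(f.readline)
#
# 'encoding' is a codec name such as "utf-8"; 'consumed_lines' holds the raw
# bytes lines read while looking for a BOM or coding cookie.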
def tokenize(readline):
    """
    The tokenize() generator requires one argument, readline, which
    must be a callable object which provides the same interface as the
    readline() method of built-in file objects.  Each call to the function
    should return one line of input as bytes.  Alternatively, readline
    can be a callable function terminating with StopIteration:
        readline = open(myfile, 'rb').__next__  # Example of alternate readline

    The generator produces 5-tuples with these members: the token type; the
    token string; a 2-tuple (srow, scol) of ints specifying the row and
    column where the token begins in the source; a 2-tuple (erow, ecol) of
    ints specifying the row and column where the token ends in the source;
    and the line on which the token was found.  The line passed is the
    logical line; continuation lines are included.

    The first token sequence will always be an ENCODING token
    which tells you which encoding was used to decode the bytes stream.
    """
    encoding, consumed = detect_encoding(readline)
    def readline_generator(consumed):
        for line in consumed:
            yield line
        while True:
            try:
                yield readline()
            except StopIteration:
                return
    chained = readline_generator(consumed)
    return _tokenize(chained.__next__, encoding)
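# An in-memory buffer works the same way, as long as readline yields bytes.
# A small sketch using io.BytesIO:
#
#     import io
#     source = b"x = 1  # a comment\n"
#     for tok in tokenize(io.BytesIO(source).readline):
#         print(tok)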
def _tokenize(readline, encoding):
    lnum = parenlev = continued = 0
    namechars, numchars = string.ascii_letters + '_', '0123456789'
    contstr, needcont = '', 0
    contline = None
    indents = [0]

    if encoding is not None:
        yield TokenInfo(ENCODING, encoding, (0, 0), (0, 0), '')
    while True:                                # loop over lines in stream
        try:
            line = readline()
        except StopIteration:
            line = b''

        if encoding is not None:
            line = line.decode(encoding)
        lnum += 1
        pos, max = 0, len(line)

        if contstr:                            # continued string
            if not line:
                raise TokenError("EOF in multi-line string", strstart)
            endmatch = endprog.match(line)
            if endmatch:
                pos = end = endmatch.end(0)
                yield TokenInfo(STRING, contstr + line[:end],
                                strstart, (lnum, end), contline + line)
                contstr, needcont = '', 0
                contline = None
            elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
                yield TokenInfo(ERRORTOKEN, contstr + line,
                                strstart, (lnum, len(line)), contline)
                contstr = ''
                contline = None
                continue
            else:
                contstr = contstr + line
                contline = contline + line
                continue

        elif parenlev == 0 and not continued:  # new statement
            if not line: break
            column = 0
            while pos < max:                   # measure leading whitespace
                if line[pos] == ' ':
                    column += 1
                elif line[pos] == '\t':
                    column = (column//tabsize + 1)*tabsize
                elif line[pos] == '\f':
                    column = 0
                else:
                    break
                pos += 1
            if pos == max:
                break

            if line[pos] in '#\r\n':           # skip comments or blank lines
                if line[pos] == '#':
                    comment_token = line[pos:].rstrip('\r\n')
                    nl_pos = pos + len(comment_token)
                    yield TokenInfo(COMMENT, comment_token,
                                    (lnum, pos), (lnum, pos + len(comment_token)), line)
                    yield TokenInfo(NL, line[nl_pos:],
                                    (lnum, nl_pos), (lnum, len(line)), line)
                else:
                    yield TokenInfo((NL, COMMENT)[line[pos] == '#'], line[pos:],
                                    (lnum, pos), (lnum, len(line)), line)
                continue

            if column > indents[-1]:           # count indents or dedents
                indents.append(column)
                yield TokenInfo(INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
            while column < indents[-1]:
                if column not in indents:
                    raise IndentationError(
                        "unindent does not match any outer indentation level",
                        ("<tokenize>", lnum, pos, line))
                indents = indents[:-1]
                yield TokenInfo(DEDENT, '', (lnum, pos), (lnum, pos), line)

        else:                                  # continued statement
            if not line:
                raise TokenError("EOF in multi-line statement", (lnum, 0))
            continued = 0

        while pos < max:
            pseudomatch = pseudoprog.match(line, pos)
            if pseudomatch:                    # scan for tokens
                start, end = pseudomatch.span(1)
                spos, epos, pos = (lnum, start), (lnum, end), end
                token, initial = line[start:end], line[start]

                if (initial in numchars or     # ordinary number
                    (initial == '.' and token != '.' and token != '...')):
                    yield TokenInfo(NUMBER, token, spos, epos, line)
                elif initial in '\r\n':
                    yield TokenInfo(NL if parenlev > 0 else NEWLINE,
                                    token, spos, epos, line)
                elif initial == '#':
                    assert not token.endswith("\n")
                    yield TokenInfo(COMMENT, token, spos, epos, line)
                elif token in triple_quoted:
                    endprog = endprogs[token]
                    endmatch = endprog.match(line, pos)
                    if endmatch:               # all on one line
                        pos = endmatch.end(0)
                        token = line[start:pos]
                        yield TokenInfo(STRING, token, spos, (lnum, pos), line)
                    else:
                        strstart = (lnum, start)           # multiple lines
                        contstr = line[start:]
                        contline = line
                        break
                elif initial in single_quoted or \
                        token[:2] in single_quoted or \
                        token[:3] in single_quoted:
                    if token[-1] == '\n':                  # continued string
                        strstart = (lnum, start)
                        endprog = (endprogs[initial] or endprogs[token[1]] or
                                   endprogs[token[2]])
                        contstr, needcont = line[start:], 1
                        contline = line
                        break
                    else:                                  # ordinary string
                        yield TokenInfo(STRING, token, spos, epos, line)
                elif initial in namechars:                 # ordinary name
                    yield TokenInfo(NAME, token, spos, epos, line)
                elif initial == '\\':                      # continued stmt
                    continued = 1
                else:
                    if initial in '([{':
                        parenlev += 1
                    elif initial in ')]}':
                        parenlev -= 1
                    yield TokenInfo(OP, token, spos, epos, line)
            else:
                yield TokenInfo(ERRORTOKEN, line[pos],
                                (lnum, pos), (lnum, pos+1), line)
                pos += 1

    for indent in indents[1:]:                 # pop remaining indent levels
        yield TokenInfo(DEDENT, '', (lnum, 0), (lnum, 0), '')
    yield TokenInfo(ENDMARKER, '', (lnum, 0), (lnum, 0), '')
# An undocumented, backwards compatible API for all the places in the standard
# library that expect to be able to use tokenize with strings.
def generate_tokens(readline):
    return _tokenize(readline, None)
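# A small sketch of the string-based interface, using io.StringIO for
# readline:
#
#     import io
#     for tok in generate_tokens(io.StringIO("x = 1\n").readline):
#         print(tok)
#
# Unlike tokenize(), no ENCODING token is produced because the input is
# already text rather than bytes.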