mirror of
https://github.com/python/cpython.git
synced 2025-08-03 08:34:29 +00:00
Issue #6042:
lnotab-based tracing is very complicated and isn't documented very well. There were at least 3 comment blocks purporting to document co_lnotab, and none did a very good job. This patch unifies them into Objects/lnotab_notes.txt which tries to completely capture the current state of affairs.

I also discovered that we've attached 2 layers of patches to the basic tracing scheme. The first layer avoids jumping to instructions that don't start a line, to avoid problems in if statements and while loops. The second layer discovered that jumps backward do need to trace at instructions that don't start a line, so it added extra lnotab entries for 'while' and 'for' loops, and added a special case for backward jumps within the same line.

I replaced these patches by just treating forward and backward jumps differently.
parent
3724d6c392
commit
655d835415
6 changed files with 157 additions and 209 deletions
@@ -798,9 +798,11 @@ always available.
       specifies the local trace function.

    ``'line'``
-      The interpreter is about to execute a new line of code (sometimes multiple
-      line events on one line exist).  The local trace function is called; *arg*
-      is ``None``; the return value specifies the new local trace function.
+      The interpreter is about to execute a new line of code or re-execute the
+      condition of a loop.  The local trace function is called; *arg* is
+      ``None``; the return value specifies the new local trace function.  See
+      :file:`Objects/lnotab_notes.txt` for a detailed explanation of how this
+      works.

    ``'return'``
       A function (or other code block) is about to return.  The local trace
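The reworded ``'line'`` description can be observed from Python itself. The following is a minimal sketch, not part of the patch: `record_line_events` and `countdown` are hypothetical names introduced here for illustration. It installs a trace function with `sys.settrace` and records each ``'line'`` event as an offset from the function's first line. The exact event sequence may differ between interpreter versions, so no full output is shown.

```python
import sys


def record_line_events(func):
    """Call func() and return the relative line number of each 'line' event."""
    events = []
    code = func.__code__

    def local_trace(frame, event, arg):
        if event == "line":
            # Store the offset from the 'def' line so the result does not
            # depend on where the function sits in the file.
            events.append(frame.f_lineno - code.co_firstlineno)
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace the frame we are interested in.
        if event == "call" and frame.f_code is code:
            return local_trace
        return None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        func()
    finally:
        sys.settrace(previous)
    return events


def countdown():          # relative line 0
    n = 2                 # relative line 1
    while n:              # relative line 2: the loop condition
        n -= 1            # relative line 3


events = record_line_events(countdown)
```

The loop condition's line (offset 2) should appear more than once in `events`, once per evaluation of the condition, which is the behaviour the new wording describes.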
@@ -23,7 +23,8 @@ typedef struct {
     PyObject *co_filename;      /* string (where it was loaded from) */
     PyObject *co_name;          /* string (name, for reference) */
     int co_firstlineno;         /* first source line number */
-    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) */
+    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
+                                   Objects/lnotab_notes.txt for details. */
     void *co_zombieframe;       /* for optimization only (see frameobject.c) */
 } PyCodeObject;

@@ -90,15 +91,11 @@ typedef struct _addr_pair {
         int ap_upper;
 } PyAddrPair;

-/* Check whether lasti (an instruction offset) falls outside bounds
-   and whether it is a line number that should be traced.  Returns
-   a line number if it should be traced or -1 if the line should not.
-
-   If lasti is not within bounds, updates bounds.
+/* Update *bounds to describe the first and one-past-the-last instructions in the
+   same line as lasti.  Return the number of that line.
 */
-
-PyAPI_FUNC(int) PyCode_CheckLineNumber(PyCodeObject* co,
-                                       int lasti, PyAddrPair *bounds);
+PyAPI_FUNC(int) _PyCode_CheckLineNumber(PyCodeObject* co,
+                                        int lasti, PyAddrPair *bounds);

 PyAPI_FUNC(PyObject*) PyCode_Optimize(PyObject *code, PyObject* consts,
                                       PyObject *names, PyObject *lineno_obj);
@@ -507,48 +507,8 @@ PyTypeObject PyCode_Type = {
         code_new,                       /* tp_new */
 };

-/* All about c_lnotab.
-
-c_lnotab is an array of unsigned bytes disguised as a Python string.  In -O
-mode, SET_LINENO opcodes aren't generated, and bytecode offsets are mapped
-to source code line #s (when needed for tracebacks) via c_lnotab instead.
-The array is conceptually a list of
-    (bytecode offset increment, line number increment)
-pairs.  The details are important and delicate, best illustrated by example:
-
-    byte code offset    source code line number
-        0                   1
-        6                   2
-       50                   7
-      350                 307
-      361                 308
-
-The first trick is that these numbers aren't stored, only the increments
-from one row to the next (this doesn't really work, but it's a start):
-
-    0, 1,  6, 1,  44, 5,  300, 300,  11, 1
-
-The second trick is that an unsigned byte can't hold negative values, or
-values larger than 255, so (a) there's a deep assumption that byte code
-offsets and their corresponding line #s both increase monotonically, and (b)
-if at least one column jumps by more than 255 from one row to the next, more
-than one pair is written to the table. In case #b, there's no way to know
-from looking at the table later how many were written.  That's the delicate
-part.  A user of c_lnotab desiring to find the source line number
-corresponding to a bytecode address A should do something like this
-
-    lineno = addr = 0
-    for addr_incr, line_incr in c_lnotab:
-        addr += addr_incr
-        if addr > A:
-            return lineno
-        lineno += line_incr
-
-In order for this to work, when the addr field increments by more than 255,
-the line # increment in each pair generated must be 0 until the remaining addr
-increment is < 256.  So, in the example above, com_set_lineno should not (as
-was actually done until 2.2) expand 300, 300 to 255, 255, 45, 45, but to
-255, 0, 45, 255, 0, 45.
+/* Use co_lnotab to compute the line number from a bytecode index, addrq.  See
+   lnotab_notes.txt for the details of the lnotab representation.
 */

 int
@@ -567,85 +527,10 @@ PyCode_Addr2Line(PyCodeObject *co, int addrq)
         return line;
 }

-/*
-   Check whether the current instruction is at the start of a line.
-
- */
-
-        /* The theory of SET_LINENO-less tracing.
-
-           In a nutshell, we use the co_lnotab field of the code object
-           to tell when execution has moved onto a different line.
-
-           As mentioned above, the basic idea is so set things up so
-           that
-
-                 *instr_lb <= frame->f_lasti < *instr_ub
-
-           is true so long as execution does not change lines.
-
-           This is all fairly simple.  Digging the information out of
-           co_lnotab takes some work, but is conceptually clear.
-
-           Somewhat harder to explain is why we don't *always* call the
-           line trace function when the above test fails.
-
-           Consider this code:
-
-           1: def f(a):
-           2:     if a:
-           3:        print 1
-           4:     else:
-           5:        print 2
-
-           which compiles to this:
-
-             2           0 LOAD_FAST                0 (a)
-                         3 JUMP_IF_FALSE            9 (to 15)
-                         6 POP_TOP
-
-             3           7 LOAD_CONST               1 (1)
-                        10 PRINT_ITEM
-                        11 PRINT_NEWLINE
-                        12 JUMP_FORWARD             6 (to 21)
-                   >>   15 POP_TOP
-
-             5          16 LOAD_CONST               2 (2)
-                        19 PRINT_ITEM
-                        20 PRINT_NEWLINE
-                   >>   21 LOAD_CONST               0 (None)
-                        24 RETURN_VALUE
-
-           If 'a' is false, execution will jump to instruction at offset
-           15 and the co_lnotab will claim that execution has moved to
-           line 3.  This is at best misleading.  In this case we could
-           associate the POP_TOP with line 4, but that doesn't make
-           sense in all cases (I think).
-
-           What we do is only call the line trace function if the co_lnotab
-           indicates we have jumped to the *start* of a line, i.e. if the
-           current instruction offset matches the offset given for the
-           start of a line by the co_lnotab.
-
-           This also takes care of the situation where 'a' is true.
-           Execution will jump from instruction offset 12 to offset 21.
-           Then the co_lnotab would imply that execution has moved to line
-           5, which is again misleading.
-
-           Why do we set f_lineno when tracing?  Well, consider the code
-           above when 'a' is true.  If stepping through this with 'n' in
-           pdb, you would stop at line 1 with a "call" type event, then
-           line events on lines 2 and 3, then a "return" type event -- but
-           you would be shown line 5 during this event.  This is a change
-           from the behaviour in 2.2 and before, and I've found it
-           confusing in practice.  By setting and using f_lineno when
-           tracing, one can report a line number different from that
-           suggested by f_lasti on this one occasion where it's desirable.
-        */
-
-
-int
-PyCode_CheckLineNumber(PyCodeObject* co, int lasti, PyAddrPair *bounds)
+/* Update *bounds to describe the first and one-past-the-last instructions in
+   the same line as lasti.  Return the number of that line. */
+int
+_PyCode_CheckLineNumber(PyCodeObject* co, int lasti, PyAddrPair *bounds)
 {
         int size, addr, line;
         unsigned char* p;
@@ -662,11 +547,9 @@ PyCode_CheckLineNumber(PyCodeObject* co, int lasti, PyAddrPair *bounds)
            instr_lb -- if we stored the matching value of p
            somwhere we could skip the first while loop. */

-        /* see comments in compile.c for the description of
+        /* See lnotab_notes.txt for the description of
            co_lnotab.  A point to remember: increments to p
-           should come in pairs -- although we don't care about
-           the line increments here, treating them as byte
-           increments gets confusing, to say the least. */
+           come in (addr, line) pairs. */

         bounds->ap_lower = 0;
         while (size > 0) {
@@ -679,13 +562,6 @@ PyCode_CheckLineNumber(PyCodeObject* co, int lasti, PyAddrPair *bounds)
                 --size;
         }

-        /* If lasti and addr don't match exactly, we don't want to
-           change the lineno slot on the frame or execute a trace
-           function.  Return -1 instead.
-        */
-        if (addr != lasti)
-                line = -1;
-
         if (size > 0) {
                 while (--size >= 0) {
                         addr += *p++;
124
Objects/lnotab_notes.txt
Normal file
@@ -0,0 +1,124 @@
All about co_lnotab, the line number table.

Code objects store a field named co_lnotab.  This is an array of unsigned bytes
disguised as a Python string.  It is used to map bytecode offsets to source code
line #s for tracebacks and to identify line number boundaries for line tracing.

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs.  The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0                   1
        6                   2
       50                   7
      350                 307
      361                 308

Instead of storing these numbers literally, we compress the list by storing only
the increments from one row to the next.  Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 300,  11, 1

The above doesn't really work, but it's a start. Note that an unsigned byte
can't hold negative values, or values larger than 255, and the above example
contains two such values. So we make two tweaks:

 (a) there's a deep assumption that byte code offsets and their corresponding
 line #s both increase monotonically, and
 (b) if at least one column jumps by more than 255 from one row to the next,
 more than one pair is written to the table. In case #b, there's no way to know
 from looking at the table later how many were written.  That's the delicate
 part.  A user of co_lnotab desiring to find the source line number
 corresponding to a bytecode address A should do something like this

    lineno = addr = 0
    for addr_incr, line_incr in co_lnotab:
        addr += addr_incr
        if addr > A:
            return lineno
        lineno += line_incr
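A runnable rendering of the lookup pseudocode above might look like the following. This is a sketch under stated assumptions: the table is passed as a Python 3 `bytes` object, the function name `addr_to_line` is made up for illustration, and, like the pseudocode, it starts counting lines from 0 rather than from a code object's first line number. The sample table covers only the first three rows of the example, where no increment exceeds 255.

```python
def addr_to_line(lnotab, target_addr):
    """Map a bytecode offset to a line number, following the notes' pseudocode."""
    lineno = addr = 0
    # Walk the table two bytes at a time: (addr_incr, line_incr).
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if addr > target_addr:
            return lineno
        lineno += line_incr
    # Ran off the end of the table: the offset belongs to the last line.
    return lineno


# Increments for the rows (0, 1), (6, 2), (50, 7) of the example table.
SMALL_TABLE = bytes([0, 1, 6, 1, 44, 5])
```

With this table, offsets 0 through 5 map to line 1, offsets 6 through 49 to line 2, and offset 50 onward to line 7.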

(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
when the addr field increments by more than 255, the line # increment in each
pair generated must be 0 until the remaining addr increment is < 256.  So, in
the example above, assemble_lnotab in compile.c should not (as was actually done
until 2.2) expand 300, 300 to
    255, 255, 45, 45,
but to
    255, 0, 45, 255, 0, 45.
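The splitting rule can be sketched as a small encoder. This illustrates the rule as stated in the notes and is not a transcription of assemble_lnotab: the name `encode_lnotab` is hypothetical, and it follows the notes' simplified example in starting both counters at zero.

```python
def encode_lnotab(rows):
    """Encode (bytecode offset, line number) rows as a bytes table of increments."""
    table = []
    prev_addr = prev_line = 0
    for addr, line in rows:
        d_addr, d_line = addr - prev_addr, line - prev_line
        if d_addr < 0 or d_line < 0:
            raise ValueError("offsets and line numbers must not decrease")
        # Spill an oversized offset increment first, with line increment 0,
        # so a lookup never advances the line before it reaches the offset.
        while d_addr > 255:
            table += [255, 0]
            d_addr -= 255
        # Then spill an oversized line increment; only the first such pair
        # carries the remaining offset increment.
        while d_line > 255:
            table += [d_addr, 255]
            d_addr = 0
            d_line -= 255
        table += [d_addr, d_line]
        prev_addr, prev_line = addr, line
    return bytes(table)


ROWS = [(0, 1), (6, 2), (50, 7), (350, 307), (361, 308)]
ENCODED = encode_lnotab(ROWS)
```

For the example rows, `list(ENCODED)` is `[0, 1, 6, 1, 44, 5, 255, 0, 45, 255, 0, 45, 11, 1]`: the `300, 300` step becomes the three pairs `(255, 0)`, `(45, 255)`, `(0, 45)`.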

The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing.  Tracing is handled by _PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes.  Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines.  That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line.  As long as the above expression is true,
maybe_call_line_trace() does not need to call _PyCode_CheckLineNumber().  Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice.  Even in that case, *instr_ub holds
the first index of the next line.
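The window computation described above can be sketched in Python. This is an approximation written for illustration, not the C function itself: `line_bounds`, its `first_line` parameter and the `TABLE` constant are names assumed here, and the table is the notes' earlier example with the `300, 300` step split into three pairs.

```python
import sys


def line_bounds(lnotab, lasti, first_line=0):
    """Return (line, lower, upper) so that lower <= lasti < upper on one line."""
    pairs = list(zip(lnotab[0::2], lnotab[1::2]))
    addr, line, lower = 0, first_line, 0
    i = 0
    # Consume every entry that starts at or before lasti.
    while i < len(pairs) and addr + pairs[i][0] <= lasti:
        addr_incr, line_incr = pairs[i]
        addr += addr_incr
        if line_incr:
            # Only a real line change moves the lower bound; a (255, 0)
            # spill-over pair stays inside the current line.
            lower = addr
        line += line_incr
        i += 1
    if i == len(pairs):
        # lasti is on the last line: no upper bound.
        return line, lower, sys.maxsize
    # Scan ahead to the next entry that actually changes the line.
    for addr_incr, line_incr in pairs[i:]:
        addr += addr_incr
        if line_incr:
            break
    return line, lower, addr


TABLE = bytes([0, 1, 6, 1, 44, 5, 255, 0, 45, 255, 0, 45, 11, 1])
```

Offsets 100 and 320 both yield `(7, 50, 350)`: the `(255, 0)` pair in the middle of line 7 does not end the window, which matches the remark that *instr_ub still holds the first index of the next line.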

However, we don't *always* want to call the line trace function when the above
test fails.

Consider this code:

1: def f(a):
2:    while a:
3:       print 1,
4:       break
5:    else:
6:       print 2,

which compiles to this:

  2           0 SETUP_LOOP              19 (to 22)
        >>    3 LOAD_FAST                0 (a)
              6 POP_JUMP_IF_FALSE       17

  3           9 LOAD_CONST               1 (1)
             12 PRINT_ITEM

  4          13 BREAK_LOOP
             14 JUMP_ABSOLUTE            3
        >>   17 POP_BLOCK

  6          18 LOAD_CONST               2 (2)
             21 PRINT_ITEM
        >>   22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab.  For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).
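The forward/backward rule can be modelled with a small simulation. The sketch below is an assumption-laden illustration, not the interpreter's code: `LineTracer` is a hypothetical class, it takes the line starts read off the disassembly above instead of a real co_lnotab, and it is fed offset sequences by hand rather than executing bytecode.

```python
import bisect


class LineTracer:
    """Decide, per executed offset, whether a 'line' event would fire."""

    def __init__(self, line_starts):
        # line_starts: sorted (offset, line) pairs; the first offset must be 0.
        self.offsets = [offset for offset, _ in line_starts]
        self.lines = [line for _, line in line_starts]
        self.instr_lb, self.instr_ub, self.instr_prev = 0, -1, -1
        self.f_lineno = None

    def _bounds(self, lasti):
        i = bisect.bisect_right(self.offsets, lasti) - 1
        has_next = i + 1 < len(self.offsets)
        upper = self.offsets[i + 1] if has_next else float("inf")
        return self.lines[i], self.offsets[i], upper

    def step(self, lasti):
        """Return the traced line number, or None if no event fires."""
        line, event = self.f_lineno, None
        # Left the current window: recompute it.
        if lasti < self.instr_lb or lasti >= self.instr_ub:
            line, self.instr_lb, self.instr_ub = self._bounds(lasti)
        # Trace at the start of a line, or on any backward jump.
        if lasti == self.instr_lb or lasti < self.instr_prev:
            self.f_lineno = event = line
        self.instr_prev = lasti
        return event


LINE_STARTS = [(0, 2), (9, 3), (13, 4), (18, 6)]


def traced_lines(executed_offsets):
    tracer = LineTracer(LINE_STARTS)
    events = [tracer.step(offset) for offset in executed_offsets]
    return [line for line in events if line is not None]
```

For the path taken when 'a' is false (offsets 0, 3, 6, 17, 18, 21, 22, 25), `traced_lines` reports lines 2 and 6 and skips the misleading line 4. A synthetic sequence that jumps backward, such as 0, 3, 6, 9, 12, 3, reports 2, 3, 2 even though offset 3 is not the start of a line.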

Why do we set f_lineno when tracing, and only just before calling the trace
function?  Well, consider the code above when 'a' is true.  If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event.  This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice.  By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.
@@ -3591,33 +3591,30 @@ _PyEval_CallTracing(PyObject *func, PyObject *args)
         return result;
 }

+/* See Objects/lnotab_notes.txt for a description of how tracing works. */
 static int
 maybe_call_line_trace(Py_tracefunc func, PyObject *obj,
                       PyFrameObject *frame, int *instr_lb, int *instr_ub,
                       int *instr_prev)
 {
         int result = 0;
+        int line = frame->f_lineno;

         /* If the last instruction executed isn't in the current
-           instruction window, reset the window.  If the last
-           instruction happens to fall at the start of a line or if it
-           represents a jump backwards, call the trace function.
+           instruction window, reset the window.
         */
-        if ((frame->f_lasti < *instr_lb || frame->f_lasti >= *instr_ub)) {
-                int line;
+        if (frame->f_lasti < *instr_lb || frame->f_lasti >= *instr_ub) {
                 PyAddrPair bounds;
-
-                line = PyCode_CheckLineNumber(frame->f_code, frame->f_lasti,
-                                              &bounds);
-                if (line >= 0) {
-                        frame->f_lineno = line;
-                        result = call_trace(func, obj, frame,
-                                            PyTrace_LINE, Py_None);
-                }
+                line = _PyCode_CheckLineNumber(frame->f_code, frame->f_lasti,
+                                               &bounds);
                 *instr_lb = bounds.ap_lower;
                 *instr_ub = bounds.ap_upper;
         }
-        else if (frame->f_lasti <= *instr_prev) {
+        /* If the last instruction falls at the start of a line or if
+           it represents a jump backwards, update the frame's line
+           number and call the trace function. */
+        if (frame->f_lasti == *instr_lb || frame->f_lasti < *instr_prev) {
+                frame->f_lineno = line;
                 result = call_trace(func, obj, frame, PyTrace_LINE, Py_None);
         }
         *instr_prev = frame->f_lasti;
@@ -1646,9 +1646,6 @@ compiler_for(struct compiler *c, stmt_ty s)
         VISIT(c, expr, s->v.For.iter);
         ADDOP(c, GET_ITER);
         compiler_use_next_block(c, start);
-        /* for expressions must be traced on each iteration,
-           so we need to set an extra line number. */
-        c->u->u_lineno_set = false;
         ADDOP_JREL(c, FOR_ITER, cleanup);
         VISIT(c, expr, s->v.For.target);
         VISIT_SEQ(c, stmt, s->v.For.body);
@@ -1694,9 +1691,6 @@ compiler_while(struct compiler *c, stmt_ty s)
         if (!compiler_push_fblock(c, LOOP, loop))
                 return 0;
         if (constant == -1) {
-                /* while expressions must be traced on each iteration,
-                   so we need to set an extra line number. */
-                c->u->u_lineno_set = false;
                 VISIT(c, expr, s->v.While.test);
                 ADDOP_JABS(c, POP_JUMP_IF_FALSE, anchor);
         }
@@ -3493,51 +3487,9 @@ blocksize(basicblock *b)
         return size;
 }

-/* All about a_lnotab.
-
-c_lnotab is an array of unsigned bytes disguised as a Python string.
-It is used to map bytecode offsets to source code line #s (when needed
-for tracebacks).
-
-The array is conceptually a list of
-    (bytecode offset increment, line number increment)
-pairs.  The details are important and delicate, best illustrated by example:
-
-    byte code offset    source code line number
-        0                   1
-        6                   2
-       50                   7
-      350                 307
-      361                 308
-
-The first trick is that these numbers aren't stored, only the increments
-from one row to the next (this doesn't really work, but it's a start):
-
-    0, 1,  6, 1,  44, 5,  300, 300,  11, 1
-
-The second trick is that an unsigned byte can't hold negative values, or
-values larger than 255, so (a) there's a deep assumption that byte code
-offsets and their corresponding line #s both increase monotonically, and (b)
-if at least one column jumps by more than 255 from one row to the next, more
-than one pair is written to the table. In case #b, there's no way to know
-from looking at the table later how many were written.  That's the delicate
-part.  A user of c_lnotab desiring to find the source line number
-corresponding to a bytecode address A should do something like this
-
-    lineno = addr = 0
-    for addr_incr, line_incr in c_lnotab:
-        addr += addr_incr
-        if addr > A:
-            return lineno
-        lineno += line_incr
-
-In order for this to work, when the addr field increments by more than 255,
-the line # increment in each pair generated must be 0 until the remaining addr
-increment is < 256.  So, in the example above, assemble_lnotab (it used
-to be called com_set_lineno) should not (as was actually done until 2.2)
-expand 300, 300 to 255, 255, 45, 45,
-            but to 255,   0, 45, 255, 0, 45.
-*/
+/* Appends a pair to the end of the line number table, a_lnotab, representing
+   the instruction's bytecode offset and line number.  See
+   Objects/lnotab_notes.txt for the description of the line number table. */

 static int
 assemble_lnotab(struct assembler *a, struct instr *i)