mirror of
https://github.com/python/cpython.git
synced 2025-07-07 19:35:27 +00:00
gh-119786: add code object doc, inline locations.md into it (#126832)
This commit is contained in:
parent
ca3ea9ad05
commit
4b12a6ff4a
6 changed files with 143 additions and 82 deletions
|
@ -24,9 +24,7 @@ Compiling Python Source Code
|
||||||
Runtime Objects
|
Runtime Objects
|
||||||
---
|
---
|
||||||
|
|
||||||
- [Code Objects (coming soon)](code_objects.md)
|
- [Code Objects](code_objects.md)
|
||||||
|
|
||||||
- [The Source Code Locations Table](locations.md)
|
|
||||||
|
|
||||||
- [Generators (coming soon)](generators.md)
|
- [Generators (coming soon)](generators.md)
|
||||||
|
|
||||||
|
|
|
@ -1,5 +1,139 @@
|
||||||
|
|
||||||
Code objects
|
# Code objects
|
||||||
============
|
|
||||||
|
|
||||||
Coming soon.
|
A `CodeObject` is a builtin Python type that represents a compiled executable,
|
||||||
|
such as a compiled function or class.
|
||||||
|
It contains a sequence of bytecode instructions along with its associated
|
||||||
|
metadata: data which is necessary to execute the bytecode instructions (such
|
||||||
|
as the values of the constants they access) or context information such as
|
||||||
|
the source code location, which is useful for debuggers and other tools.
|
||||||
|
|
||||||
|
Since 3.11, the final field of the `PyCodeObject` C struct is an array
|
||||||
|
of indeterminate length containing the bytecode, `code->co_code_adaptive`.
|
||||||
|
(In older versions the code object was a
|
||||||
|
[`bytes`](https://docs.python.org/dev/library/stdtypes.html#bytes)
|
||||||
|
object, `code->co_code`; this was changed to save an allocation and to
|
||||||
|
allow it to be mutated.)
|
||||||
|
|
||||||
|
Code objects are typically produced by the bytecode [compiler](compiler.md),
|
||||||
|
although they are often written to disk by one process and read back in by another.
|
||||||
|
The disk version of a code object is serialized using the
|
||||||
|
[marshal](https://docs.python.org/dev/library/marshal.html) protocol.
|
||||||
|
|
||||||
|
Code objects are nominally immutable.
|
||||||
|
Some fields (including `co_code_adaptive` and fields for runtime
|
||||||
|
information such as `_co_monitoring`) are mutable, but mutable fields are
|
||||||
|
not included when code objects are hashed or compared.
|
||||||
|
|
||||||
|
## Source code locations
|
||||||
|
|
||||||
|
Whenever an exception occurs, the interpreter adds a traceback entry to
|
||||||
|
the exception for the current frame, as well as each frame on the stack that
|
||||||
|
it unwinds.
|
||||||
|
The `tb_lineno` field of a traceback entry is (lazily) set to the line
|
||||||
|
number of the instruction that was executing in the frame at the time of
|
||||||
|
the exception.
|
||||||
|
This field is computed from the locations table, `co_linetable`, by the function
|
||||||
|
[`PyCode_Addr2Line`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Line).
|
||||||
|
Despite its name, `co_linetable` includes more than line numbers; it represents
|
||||||
|
a 4-number source location for every instruction, indicating the precise line
|
||||||
|
and column at which it begins and ends. This is a significant amount of data,
|
||||||
|
so a compact format is very important.
|
||||||
|
|
||||||
|
Note that traceback objects don't store all this information -- they store the start line
|
||||||
|
number, for backward compatibility, and the "last instruction" value.
|
||||||
|
The rest can be computed from the last instruction (`tb_lasti`) with the help of the
|
||||||
|
locations table. For Python code, there is a convenience method
|
||||||
|
(`codeobject.co_positions`)[https://docs.python.org/dev/reference/datamodel.html#codeobject.co_positions]
|
||||||
|
which returns an iterator of `({line}, {endline}, {column}, {endcolumn})` tuples,
|
||||||
|
one per instruction.
|
||||||
|
There is also `co_lines()` which returns an iterator of `({start}, {end}, {line})` tuples,
|
||||||
|
where `{start}` and `{end}` are bytecode offsets.
|
||||||
|
The latter is described by [`PEP 626`](https://peps.python.org/pep-0626/); it is more
|
||||||
|
compact, but doesn't return end line numbers or column offsets.
|
||||||
|
From C code, you need to call
|
||||||
|
[`PyCode_Addr2Location`](https://docs.python.org/dev/c-api/code.html#c.PyCode_Addr2Location).
|
||||||
|
|
||||||
|
As the locations table is only consulted when displaying a traceback and when
|
||||||
|
tracing (to pass the line number to the tracing function), lookup is not
|
||||||
|
performance critical.
|
||||||
|
In order to reduce the overhead during tracing, the mapping from instruction offset to
|
||||||
|
line number is cached in the ``_co_linearray`` field.
|
||||||
|
|
||||||
|
### Format of the locations table
|
||||||
|
|
||||||
|
The `co_linetable` bytes object of code objects contains a compact
|
||||||
|
representation of the source code positions of instructions, which are
|
||||||
|
returned by the `co_positions()` iterator.
|
||||||
|
|
||||||
|
> [!NOTE]
|
||||||
|
> `co_linetable` is not to be confused with `co_lnotab`.
|
||||||
|
> For backwards compatibility, `co_lnotab` exposes the format
|
||||||
|
> as it existed in Python 3.10 and lower: this older format
|
||||||
|
> stores only the start line for each instruction.
|
||||||
|
> It is lazily created from `co_linetable` when accessed.
|
||||||
|
> See [`Objects/lnotab_notes.txt`](../Objects/lnotab_notes.txt) for more details.
|
||||||
|
|
||||||
|
`co_linetable` consists of a sequence of location entries.
|
||||||
|
Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with the most significant bit unset.
|
||||||
|
|
||||||
|
Each entry contains the following information:
|
||||||
|
* The number of code units covered by this entry (length)
|
||||||
|
* The start line
|
||||||
|
* The end line
|
||||||
|
* The start column
|
||||||
|
* The end column
|
||||||
|
|
||||||
|
The first byte has the following format:
|
||||||
|
|
||||||
|
Bit 7 | Bits 3-6 | Bits 0-2
|
||||||
|
---- | ---- | ----
|
||||||
|
1 | Code | Length (in code units) - 1
|
||||||
|
|
||||||
|
The codes are enumerated in the `_PyCodeLocationInfoKind` enum.
|
||||||
|
|
||||||
|
## Variable-length integer encodings
|
||||||
|
|
||||||
|
Integers are often encoded using a variable-length integer encoding
|
||||||
|
|
||||||
|
### Unsigned integers (`varint`)
|
||||||
|
|
||||||
|
Unsigned integers are encoded in 6-bit chunks, least significant first.
|
||||||
|
Each chunk but the last has bit 6 set.
|
||||||
|
For example:
|
||||||
|
|
||||||
|
* 63 is encoded as `0x3f`
|
||||||
|
* 200 is encoded as `0x48`, `0x03`
|
||||||
|
|
||||||
|
### Signed integers (`svarint`)
|
||||||
|
|
||||||
|
Signed integers are encoded by converting them to unsigned integers, using the following function:
|
||||||
|
```Python
|
||||||
|
def convert(s):
|
||||||
|
if s < 0:
|
||||||
|
return ((-s)<<1) | 1
|
||||||
|
else:
|
||||||
|
return (s<<1)
|
||||||
|
```
|
||||||
|
|
||||||
|
*Location entries*
|
||||||
|
|
||||||
|
The meaning of the codes and the following bytes are as follows:
|
||||||
|
|
||||||
|
Code | Meaning | Start line | End line | Start column | End column
|
||||||
|
---- | ---- | ---- | ---- | ---- | ----
|
||||||
|
0-9 | Short form | Δ 0 | Δ 0 | See below | See below
|
||||||
|
10-12 | One line form | Δ (code - 10) | Δ 0 | unsigned byte | unsigned byte
|
||||||
|
13 | No column info | Δ svarint | Δ 0 | None | None
|
||||||
|
14 | Long form | Δ svarint | Δ varint | varint | varint
|
||||||
|
15 | No location | None | None | None | None
|
||||||
|
|
||||||
|
The Δ means the value is encoded as a delta from another value:
|
||||||
|
* Start line: Delta from the previous start line, or `co_firstlineno` for the first entry.
|
||||||
|
* End line: Delta from the start line
|
||||||
|
|
||||||
|
*The short forms*
|
||||||
|
|
||||||
|
Codes 0-9 are the short forms. The short form consists of two bytes, the second byte holding additional column information. The code is the start column divided by 8 (and rounded down).
|
||||||
|
* Start column: `(code*8) + ((second_byte>>4)&7)`
|
||||||
|
* End column: `start_column + (second_byte&15)`
|
||||||
|
|
|
@ -443,14 +443,12 @@ reference to the source code (filename, etc). All of this is implemented by
|
||||||
Code objects
|
Code objects
|
||||||
============
|
============
|
||||||
|
|
||||||
The result of `PyAST_CompileObject()` is a `PyCodeObject` which is defined in
|
The result of `_PyAST_Compile()` is a `PyCodeObject` which is defined in
|
||||||
[Include/cpython/code.h](../Include/cpython/code.h).
|
[Include/cpython/code.h](../Include/cpython/code.h).
|
||||||
And with that you now have executable Python bytecode!
|
And with that you now have executable Python bytecode!
|
||||||
|
|
||||||
The code objects (byte code) are executed in [Python/ceval.c](../Python/ceval.c).
|
The code objects (byte code) are executed in `_PyEval_EvalFrameDefault()`
|
||||||
This file will also need a new case statement for the new opcode in the big switch
|
in [Python/ceval.c](../Python/ceval.c).
|
||||||
statement in `_PyEval_EvalFrameDefault()`.
|
|
||||||
|
|
||||||
|
|
||||||
Important files
|
Important files
|
||||||
===============
|
===============
|
||||||
|
|
|
@ -16,7 +16,7 @@ from the instruction definitions in [Python/bytecodes.c](../Python/bytecodes.c)
|
||||||
which are written in [a DSL](../Tools/cases_generator/interpreter_definition.md)
|
which are written in [a DSL](../Tools/cases_generator/interpreter_definition.md)
|
||||||
developed for this purpose.
|
developed for this purpose.
|
||||||
|
|
||||||
Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_object.md),
|
Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_objects.md),
|
||||||
which contains the bytecode instructions along with static data that is required to execute them,
|
which contains the bytecode instructions along with static data that is required to execute them,
|
||||||
such as the consts list, variable names,
|
such as the consts list, variable names,
|
||||||
[exception table](exception_handling.md#format-of-the-exception-table), and so on.
|
[exception table](exception_handling.md#format-of-the-exception-table), and so on.
|
||||||
|
|
|
@ -1,69 +0,0 @@
|
||||||
# Locations table
|
|
||||||
|
|
||||||
The `co_linetable` bytes object of code objects contains a compact
|
|
||||||
representation of the source code positions of instructions, which are
|
|
||||||
returned by the `co_positions()` iterator.
|
|
||||||
|
|
||||||
`co_linetable` consists of a sequence of location entries.
|
|
||||||
Each entry starts with a byte with the most significant bit set, followed by zero or more bytes with most significant bit unset.
|
|
||||||
|
|
||||||
Each entry contains the following information:
|
|
||||||
* The number of code units covered by this entry (length)
|
|
||||||
* The start line
|
|
||||||
* The end line
|
|
||||||
* The start column
|
|
||||||
* The end column
|
|
||||||
|
|
||||||
The first byte has the following format:
|
|
||||||
|
|
||||||
Bit 7 | Bits 3-6 | Bits 0-2
|
|
||||||
---- | ---- | ----
|
|
||||||
1 | Code | Length (in code units) - 1
|
|
||||||
|
|
||||||
The codes are enumerated in the `_PyCodeLocationInfoKind` enum.
|
|
||||||
|
|
||||||
## Variable length integer encodings
|
|
||||||
|
|
||||||
Integers are often encoded using a variable length integer encoding
|
|
||||||
|
|
||||||
### Unsigned integers (varint)
|
|
||||||
|
|
||||||
Unsigned integers are encoded in 6 bit chunks, least significant first.
|
|
||||||
Each chunk but the last has bit 6 set.
|
|
||||||
For example:
|
|
||||||
|
|
||||||
* 63 is encoded as `0x3f`
|
|
||||||
* 200 is encoded as `0x48`, `0x03`
|
|
||||||
|
|
||||||
### Signed integers (svarint)
|
|
||||||
|
|
||||||
Signed integers are encoded by converting them to unsigned integers, using the following function:
|
|
||||||
```Python
|
|
||||||
def convert(s):
|
|
||||||
if s < 0:
|
|
||||||
return ((-s)<<1) | 1
|
|
||||||
else:
|
|
||||||
return (s<<1)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Location entries
|
|
||||||
|
|
||||||
The meaning of the codes and the following bytes are as follows:
|
|
||||||
|
|
||||||
Code | Meaning | Start line | End line | Start column | End column
|
|
||||||
---- | ---- | ---- | ---- | ---- | ----
|
|
||||||
0-9 | Short form | Δ 0 | Δ 0 | See below | See below
|
|
||||||
10-12 | One line form | Δ (code - 10) | Δ 0 | unsigned byte | unsigned byte
|
|
||||||
13 | No column info | Δ svarint | Δ 0 | None | None
|
|
||||||
14 | Long form | Δ svarint | Δ varint | varint | varint
|
|
||||||
15 | No location | None | None | None | None
|
|
||||||
|
|
||||||
The Δ means the value is encoded as a delta from another value:
|
|
||||||
* Start line: Delta from the previous start line, or `co_firstlineno` for the first entry.
|
|
||||||
* End line: Delta from the start line
|
|
||||||
|
|
||||||
### The short forms
|
|
||||||
|
|
||||||
Codes 0-9 are the short forms. The short form consists of two bytes, the second byte holding additional column information. The code is the start column divided by 8 (and rounded down).
|
|
||||||
* Start column: `(code*8) + ((second_byte>>4)&7)`
|
|
||||||
* End column: `start_column + (second_byte&15)`
|
|
|
@ -1,7 +1,7 @@
|
||||||
Description of the internal format of the line number table in Python 3.10
|
Description of the internal format of the line number table in Python 3.10
|
||||||
and earlier.
|
and earlier.
|
||||||
|
|
||||||
(For 3.11 onwards, see Objects/locations.md)
|
(For 3.11 onwards, see InternalDocs/code_objects.md)
|
||||||
|
|
||||||
Conceptually, the line number table consists of a sequence of triples:
|
Conceptually, the line number table consists of a sequence of triples:
|
||||||
start-offset (inclusive), end-offset (exclusive), line-number.
|
start-offset (inclusive), end-offset (exclusive), line-number.
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue