Mirror of https://github.com/python/cpython.git, synced 2025-07-07 19:35:27 +00:00
gh-119786: cleanup internal docs and fix internal links (#127485)

Parent: 1bc4f076d1
Commit: 04673d2f14

11 changed files with 152 additions and 148 deletions

@@ -1,4 +1,3 @@
 # CPython Internals Documentation

 The documentation in this folder is intended for CPython maintainers.

@@ -96,6 +96,7 @@ quality of specialization and keeping the overhead of specialization low.

 Specialized instructions must be fast. In order to be fast,
 specialized instructions should be tailored for a particular
 set of values that allows them to:

 1. Verify that the incoming value is part of that set with low overhead.
 2. Perform the operation quickly.

@@ -107,9 +108,11 @@ For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`

 dictionaries that have keys with the expected version.

 This can be tested quickly:

 * `globals->keys->dk_version == expected_version`

 and the operation can be performed quickly:

 * `value = entries[cache->index].me_value;`

 Because it is impossible to measure the performance of an instruction without

@@ -122,6 +125,7 @@ base instruction.

 ### Implementation of specialized instructions

 In general, specialized instructions should be implemented in two parts:

 1. A sequence of guards, each of the form
    `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
 2. The operation, which should ideally have no branches and
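The guard-then-operation split described above can be modeled in plain Python. This is an illustrative sketch only, not CPython source; the `KeysTable` class and function names are invented stand-ins for the C structures:

```python
class KeysTable:
    """Toy stand-in for a dict's keys object carrying a version tag."""
    def __init__(self, version, entries):
        self.dk_version = version
        self.entries = entries

def load_global_module(keys, cache_index, expected_version, deopt):
    # Guard, analogous to DEOPT_IF(keys->dk_version != expected_version, LOAD_GLOBAL)
    if keys.dk_version != expected_version:
        return deopt()
    # Operation: a single indexed load, no further branches
    return keys.entries[cache_index]

keys = KeysTable(version=7, entries=["spam", "eggs"])
load_global_module(keys, 1, 7, deopt=lambda: "slow path")  # fast path -> "eggs"
load_global_module(keys, 1, 8, deopt=lambda: "slow path")  # guard fails -> "slow path"
```

The point of the split is that the guard is the only branch; once it passes, the operation itself is straight-line code.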

@@ -32,7 +32,7 @@ Below is a checklist of things that may need to change.

   [`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
   [`Python/Python-ast.c`](../Python/Python-ast.c).

-* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
+* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
   This is where you would add a new type of comment or string literal, for example.

 * [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects

@@ -60,4 +60,4 @@ Below is a checklist of things that may need to change.

   to the tokenizer.

 * Documentation must be written! Specifically, one or more of the pages in
-  [`Doc/reference/`](../Doc/reference/) will need to be updated.
+  [`Doc/reference/`](../Doc/reference) will need to be updated.

@@ -1,4 +1,3 @@
 Compiler design
 ===============

@@ -7,8 +6,8 @@ Abstract

 In CPython, the compilation from source code to bytecode involves several steps:

-1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
-   and [Parser/tokenizer/](../Parser/tokenizer/).
+1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
+   and [Parser/tokenizer/](../Parser/tokenizer).
 2. Parse the stream of tokens into an Abstract Syntax Tree
    [Parser/parser.c](../Parser/parser.c).
 3. Transform AST into an instruction sequence
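The early stages of this pipeline can be observed from Python itself. A small illustrative sketch using the stdlib `ast` module and the built-in `compile()` (the source string is arbitrary):

```python
import ast

src = "x = 1 + 2"
tree = ast.parse(src)                    # steps 1-2: tokenize and parse into an AST
code = compile(tree, "<demo>", "exec")   # remaining steps: AST down to a code object

print(type(tree).__name__)    # Module
print("x" in code.co_names)   # True: 'x' is stored via the names table
```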

@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The

 `FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
 initializes the *name*, *args*, *body*, and *attributes* fields.

-See also
-[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
-by Thomas Kluyver.
+See also [Green Tree Snakes - The missing Python AST docs](
+https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
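The fields the `FunctionDef()` constructor initializes are easy to inspect from Python via the stdlib `ast` module (a small demonstration, not part of the diff):

```python
import ast

mod = ast.parse("def f(x):\n    return x")
fn = mod.body[0]

print(type(fn).__name__)   # FunctionDef
print(fn.name)             # f
print(len(fn.body))        # 1: the single return statement
```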
 Memory management
 =================

@@ -260,11 +258,11 @@ manually -- `generic`, `identifier` and `int`. These types are found in

 [Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
 Functions and macros for creating `asdl_xx_seq *` types are as follows:

-`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
+* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
   Allocate memory for an `asdl_generic_seq` of the specified length
-`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
+* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
   Allocate memory for an `asdl_identifier_seq` of the specified length
-`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
+* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
   Allocate memory for an `asdl_int_seq` of the specified length

 In addition to the three types mentioned above, some ASDL sequence types are

@@ -273,19 +271,19 @@ automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in

 Macros for using both manually defined and automatically generated ASDL
 sequence types are as follows:

-`asdl_seq_GET(asdl_xx_seq *, int)`
+* `asdl_seq_GET(asdl_xx_seq *, int)`:
   Get item held at a specific position in an `asdl_xx_seq`
-`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
+* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
   Set a specific index in an `asdl_xx_seq` to the specified value

 Untyped counterparts exist for some of the typed macros. These are useful
 when a function needs to manipulate a generic ASDL sequence:

-`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
+* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
   Get item held at a specific position in an `asdl_seq`
-`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
+* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
   Set a specific index in an `asdl_seq` to the specified value
-`asdl_seq_LEN(asdl_seq *)`
+* `asdl_seq_LEN(asdl_seq *)`:
   Return the length of an `asdl_seq` or `asdl_xx_seq`

 Note that typed macros and functions are recommended over their untyped

@@ -379,14 +377,14 @@ arguments to a node that used the '*' modifier).

 Emission of bytecode is handled by the following macros:

-* `ADDOP(struct compiler *, location, int)`
+* `ADDOP(struct compiler *, location, int)`:
   add a specified opcode
-* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
+* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
   like `ADDOP`, but also exits current scope; used for adding return value
   opcodes in lambdas and closures
-* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
+* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
   add an opcode that takes an integer argument
-* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
+* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
   add an opcode with the proper argument based on the position of the
   specified PyObject in PyObject sequence object, but with no handling of
   mangled names; used for when you

@@ -394,17 +392,17 @@ Emission of bytecode is handled by the following macros:

   parameters where name mangling is not possible and the scope of the
   name is known; *TYPE* is the name of the PyObject sequence
   (`names` or `varnames`)
-* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
+* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
   just like `ADDOP_O`, but steals a reference to PyObject
-* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
+* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
   just like `ADDOP_O`, but name mangling is also handled; used for
   attribute loading or importing based on name
-* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
+* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
   add the `LOAD_CONST` opcode with the proper argument based on the
   position of the specified PyObject in the consts table.
-* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
+* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
   just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
-* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
+* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
   create a jump to a basic block

 The `location` argument is a struct with the source location to be
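The effect of `ADDOP_LOAD_CONST` is visible from Python: the emitted `LOAD_CONST` takes its oparg as an index into the code object's consts table. An illustrative sketch using the stdlib `dis` module (the source string is arbitrary):

```python
import dis

code = compile('x = "spam"', "<demo>", "exec")
print("spam" in code.co_consts)   # True: the constant landed in the consts table

# The LOAD_CONST instruction's oparg indexes into co_consts
load = next(i for i in dis.get_instructions(code) if i.opname == "LOAD_CONST")
print(load.argval)                # spam
print(code.co_consts[load.arg])   # spam
```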

@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual

 bytecode. This includes transforming pseudo instructions into actual instructions,
 converting jump targets from logical labels to relative offsets, and
 construction of the [exception table](exception_handling.md) and
-[locations table](locations.md).
+[locations table](code_objects.md#source-code-locations).
 The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
 metadata, including the `consts` and `names` arrays, information about function
 reference to the source code (filename, etc). All of this is implemented by

@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).

 Important files
 ===============

-* [Parser/](../Parser/)
+* [Parser/](../Parser)

   * [Parser/Python.asdl](../Parser/Python.asdl):
     ASDL syntax file.

@@ -534,7 +532,7 @@ Important files

 * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
   A data structure representing a sequence of bytecode-like pseudo-instructions.

-* [Include/](../Include/)
+* [Include/](../Include)

   * [Include/cpython/code.h](../Include/cpython/code.h)
     : Header file for [Objects/codeobject.c](../Objects/codeobject.c);

@@ -570,7 +568,7 @@ Important files

   by
   [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).

-* [Objects/](../Objects/)
+* [Objects/](../Objects)

   * [Objects/codeobject.c](../Objects/codeobject.c)
     : Contains PyCodeObject-related code.

@@ -579,7 +577,7 @@ Important files

   : Contains the `frame_setlineno()` function which should determine whether it is allowed
     to make a jump between two points in a bytecode.

-* [Lib/](../Lib/)
+* [Lib/](../Lib)

   * [Lib/opcode.py](../Lib/opcode.py)
     : opcode utilities exposed to Python.

@@ -591,7 +589,7 @@ Important files

 Objects
 =======

-* [Locations](locations.md): Describes the location table
+* [Locations](code_objects.md#source-code-locations): Describes the location table
 * [Frames](frames.md): Describes frames and the frame stack
 * [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
 * [Exception Handling](exception_handling.md): Describes the exception table

@@ -107,13 +107,12 @@ Format of the exception table
 -----------------------------

 Conceptually, the exception table consists of a sequence of 5-tuples:

 1. `start-offset` (inclusive)
 2. `end-offset` (exclusive)
 3. `target`
 4. `stack-depth`
 5. `push-lasti` (boolean)

 All offsets and lengths are in code units, not bytes.

@@ -129,12 +128,13 @@ Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each

 It also happens that depth is generally quite small.

 So, we need to encode:

 ```
-`start` (up to 30 bits)
-`size` (up to 30 bits)
-`target` (up to 30 bits)
-`depth` (up to ~8 bits)
-`lasti` (1 bit)
+start (up to 30 bits)
+size (up to 30 bits)
+target (up to 30 bits)
+depth (up to ~8 bits)
+lasti (1 bit)
 ```

 We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set.

@@ -145,23 +145,26 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the

 In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.

 For example, the exception entry:

 ```
-`start`: 20
-`end`: 28
-`target`: 100
-`depth`: 3
-`lasti`: False
+start: 20
+end: 28
+target: 100
+depth: 3
+lasti: False
 ```

 is encoded by first converting to the more compact four value form:

 ```
-`start`: 20
-`size`: 8
-`target`: 100
-`depth<<1+lasti`: 6
+start: 20
+size: 8
+target: 100
+depth<<1+lasti: 6
 ```

 which is then encoded as:

 ```
 148 (MSB + 20 for start)
 8 (size)
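The scheme described above (6 data bits per byte, `0x80` as the S start-of-entry bit, `0x40` as the X extension bit) can be sketched as a small Python encoder. This is an illustration of the documented format, not the CPython implementation:

```python
def encode_varint(value, start_of_entry=False):
    """Encode one value as 6-bit chunks, most significant chunk first."""
    chunks = []
    while True:
        chunks.append(value & 0x3F)
        value >>= 6
        if value == 0:
            break
    chunks.reverse()
    out = [c | 0x40 for c in chunks[:-1]] + [chunks[-1]]  # X bit on all but the last byte
    if start_of_entry:
        out[0] |= 0x80                                    # S bit marks a new entry
    return out

def encode_entry(start, end, target, depth, lasti):
    return (encode_varint(start, start_of_entry=True)
            + encode_varint(end - start)                  # stored as size, not end
            + encode_varint(target)
            + encode_varint((depth << 1) + lasti))

print(encode_entry(20, 28, 100, 3, False))   # [148, 8, 65, 36, 6]
```

The first two bytes, 148 and 8, match the worked example above; `target` = 100 needs two bytes (65 carries the extension bit), and the combined `depth<<1 + lasti` is the final 6.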

@@ -27,6 +27,7 @@ objects, so are not allocated in the per-thread stack. See `PyGenObject` in

 ## Layout

 Each activation record is laid out as:

 * Specials
 * Locals
 * Stack

@@ -1,4 +1,3 @@
 Garbage collector design
 ========================

@@ -1,4 +1,3 @@
 Generators
 ==========

@@ -1,4 +1,3 @@
 The bytecode interpreter
 ========================

@@ -1,4 +1,3 @@
 Guide to the parser
 ===================

@@ -444,15 +443,15 @@ How to regenerate the parser

 Once you have made the changes to the grammar files, to regenerate the `C`
 parser (the one used by the interpreter) just execute:

-```
-make regen-pegen
+```shell
+$ make regen-pegen
 ```

 using the `Makefile` in the main directory. If you are on Windows you can
 use the Visual Studio project files to regenerate the parser or to execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
 ```

 The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c).

@@ -468,15 +467,15 @@ any modifications to this file (in order to implement new Pegen features) you will

 need to regenerate the meta-parser (the parser that parses the grammar files).
 To do so just execute:

-```
-make regen-pegen-metaparser
+```shell
+$ make regen-pegen-metaparser
 ```

 If you are on Windows you can use the Visual Studio project files
 to regenerate the parser or to execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
 ```

@@ -516,15 +515,15 @@ be found in the [`Grammar/Tokens`](../Grammar/Tokens)

 file. If you change this file to add new tokens, make sure to regenerate the
 files by executing:

-```
-make regen-token
+```shell
+$ make regen-token
 ```

 If you are on Windows you can use the Visual Studio project files to regenerate
 the tokens or to execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
 ```

 How tokens are generated and the rules governing this are completely up to the tokenizer

@@ -593,7 +592,7 @@ are always reserved words, even in positions where they make no sense

 meaning in context. Trying to use a hard keyword as a variable will always
 fail:

-```
+```pycon
 >>> class = 3
   File "<stdin>", line 1
     class = 3

@@ -609,7 +608,7 @@ fail:

 While soft keywords don't have this limitation if used in a context other than the
 one where they are defined as keywords:

-```
+```pycon
 >>> match = 45
 >>> foo(match="Yeah!")
 ```
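The hard/soft distinction can also be checked programmatically with the built-in `compile()` (a small demonstration sketch):

```python
# Hard keywords are rejected by the parser outright
try:
    compile("class = 3", "<demo>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)   # True

# Soft keywords parse fine outside their special context
compile("match = 45", "<demo>", "exec")   # no error
```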

@@ -621,7 +620,7 @@ argument names.

 You can get a list of all keywords defined in the grammar from Python:

-```
+```pycon
 >>> import keyword
 >>> keyword.kwlist
 ['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break',

@@ -632,7 +631,7 @@ You can get a list of all keywords defined in the grammar from Python:

 as well as soft keywords:

-```
+```pycon
 >>> import keyword
 >>> keyword.softkwlist
 ['_', 'case', 'match']

@@ -798,7 +797,7 @@ Check the contents of these files to know which is the best place for new

 tests, depending on the nature of the new feature you are adding.

 Tests for the parser generator itself can be found in the
-[test_peg_generator](../Lib/test_peg_generator) directory.
+[test_peg_generator](../Lib/test/test_peg_generator) directory.

 Debugging generated parsers

@@ -816,14 +815,14 @@ For this reason it is a good idea to experiment first by generating a Python

 parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator)
 directory on the CPython repository and manually call the parser generator by executing:

-```
+```shell
 $ python -m pegen python <PATH TO YOUR GRAMMAR FILE>
 ```

 This will generate a file called `parse.py` in the same directory that you
 can use to parse some input:

-```
+```shell
 $ python parse.py file_with_source_code_to_test.py
 ```

@@ -848,7 +847,7 @@ can be a bit hard to understand at first.

 To activate verbose mode you can add the `-d` flag when executing Python:

-```
+```shell
 $ python -d file_to_test.py
 ```

@@ -2,6 +2,7 @@

 *Interned* strings are conceptually part of an interpreter-global
 *set* of interned strings, meaning that:

 - no two interned strings have the same content (across an interpreter);
 - two interned strings can be safely compared using pointer equality
   (Python `is`).
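Both properties can be observed from Python with `sys.intern` (a small demonstration; the string contents are arbitrary):

```python
import sys

a = sys.intern("hello internals!")
b = sys.intern("".join(["hello", " internals!"]))  # equal content, built separately

print(a is b)   # True: interning maps equal content to a single object
```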

@@ -61,6 +62,7 @@ if it's interned and mortal it needs extra processing in

 The converse is not true: interned strings can be mortal.
 For mortal interned strings:

 - the 2 references from the interned dict (key & value) are excluded from
   their refcount
 - the deallocator (`unicode_dealloc`) removes the string from the interned dict

@@ -90,6 +92,7 @@ modify in place.

 The functions take ownership of (“steal”) the reference to their argument,
 and update the argument with a *new* reference.
 This means:

 - They're “reference neutral”.
 - They must not be called with a borrowed reference.