Mirror of https://github.com/python/cpython.git, synced 2025-07-07 19:35:27 +00:00
gh-119786: cleanup internal docs and fix internal links (#127485)
This commit is contained in: parent 1bc4f076d1, commit 04673d2f14.
11 changed files with 152 additions and 148 deletions.
@@ -1,4 +1,3 @@
-
# CPython Internals Documentation

The documentation in this folder is intended for CPython maintainers.

@@ -96,6 +96,7 @@ quality of specialization and keeping the overhead of specialization low.
Specialized instructions must be fast. In order to be fast,
specialized instructions should be tailored for a particular
set of values that allows them to:

1. Verify that the incoming value is part of that set with low overhead.
2. Perform the operation quickly.

@@ -107,9 +108,11 @@ For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`
dictionaries that have a keys object with the expected version.

This can be tested quickly:

* `globals->keys->dk_version == expected_version`

and the operation can be performed quickly:

* `value = entries[cache->index].me_value;`.

Because it is impossible to measure the performance of an instruction without

@@ -122,10 +125,11 @@ base instruction.
### Implementation of specialized instructions

In general, specialized instructions should be implemented in two parts:

1. A sequence of guards, each of the form
   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
2. The operation, which should ideally have no branches and
   a minimum number of dependent memory accesses.

In practice, the parts may overlap, as data required for guards
can be re-used in the operation.
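
To make this concrete, here is a hedged sketch of that two-part shape, using the guard and operation from the `LOAD_GLOBAL_MODULE` example above; the cache layout and surrounding macros are illustrative, not the actual definition in the CPython sources:

```c
/* Illustrative only -- not the real LOAD_GLOBAL_MODULE implementation. */
TARGET(LOAD_GLOBAL_MODULE) {
    PyDictObject *globals = (PyDictObject *)GLOBALS();
    /* Part 1: guards. Cheap checks; on failure, fall back to LOAD_GLOBAL. */
    DEOPT_IF(globals->ma_keys->dk_version != cache->version, LOAD_GLOBAL);
    /* Part 2: the operation. One dependent memory access, no branches. */
    PyObject *value = DK_ENTRIES(globals->ma_keys)[cache->index].me_value;
    DEOPT_IF(value == NULL, LOAD_GLOBAL);  /* key was deleted: deoptimize */
    PUSH(Py_NewRef(value));
    DISPATCH();
}
```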

@@ -32,7 +32,7 @@ Below is a checklist of things that may need to change.
  [`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
  [`Python/Python-ast.c`](../Python/Python-ast.c).

-* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
+* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
  This is where you would add a new type of comment or string literal, for example.

* [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects

@@ -60,4 +60,4 @@ Below is a checklist of things that may need to change.
  to the tokenizer.

* Documentation must be written! Specifically, one or more of the pages in
-  [`Doc/reference/`](../Doc/reference/) will need to be updated.
+  [`Doc/reference/`](../Doc/reference) will need to be updated.

@@ -1,4 +1,3 @@
-
Compiler design
===============

@@ -7,8 +6,8 @@ Abstract

In CPython, the compilation from source code to bytecode involves several steps:

-1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
-   and [Parser/tokenizer/](../Parser/tokenizer/).
+1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
+   and [Parser/tokenizer/](../Parser/tokenizer).
2. Parse the stream of tokens into an Abstract Syntax Tree
   [Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence

@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.
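
For orientation, a hedged, abridged sketch of what such a constructor does; the real generated function in [Python/Python-ast.c](../Python/Python-ast.c) takes the full field and attribute list and is named differently:

```c
/* Illustrative sketch of a generated AST constructor: arena-allocate the
 * node, set its kind, and fill in the fields. */
static stmt_ty
FunctionDef_sketch(identifier name, arguments_ty args, asdl_stmt_seq *body,
                   PyArena *arena)
{
    stmt_ty p = (stmt_ty)_PyArena_Malloc(arena, sizeof(*p));
    if (p == NULL) {
        return NULL;
    }
    p->kind = FunctionDef_kind;
    p->v.FunctionDef.name = name;
    p->v.FunctionDef.args = args;
    p->v.FunctionDef.body = body;
    return p;
}
```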

-See also
-[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
-by Thomas Kluyver.
+See also [Green Tree Snakes - The missing Python AST docs](
+https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.

Memory management
=================

@@ -260,12 +258,12 @@ manually -- `generic`, `identifier` and `int`. These types are found in
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:

-`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
-   Allocate memory for an `asdl_generic_seq` of the specified length
-`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
-   Allocate memory for an `asdl_identifier_seq` of the specified length
-`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
-   Allocate memory for an `asdl_int_seq` of the specified length
+* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_generic_seq` of the specified length
+* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_identifier_seq` of the specified length
+* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_int_seq` of the specified length

In addition to the three types mentioned above, some ASDL sequence types are
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in

@@ -273,20 +271,20 @@ automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:

-`asdl_seq_GET(asdl_xx_seq *, int)`
-   Get item held at a specific position in an `asdl_xx_seq`
-`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
-   Set a specific index in an `asdl_xx_seq` to the specified value
+* `asdl_seq_GET(asdl_xx_seq *, int)`:
+  Get item held at a specific position in an `asdl_xx_seq`
+* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
+  Set a specific index in an `asdl_xx_seq` to the specified value

Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:

-`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
-   Get item held at a specific position in an `asdl_seq`
-`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
-   Set a specific index in an `asdl_seq` to the specified value
-`asdl_seq_LEN(asdl_seq *)`
-   Return the length of an `asdl_seq` or `asdl_xx_seq`
+* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
+  Get item held at a specific position in an `asdl_seq`
+* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
+  Set a specific index in an `asdl_seq` to the specified value
+* `asdl_seq_LEN(asdl_seq *)`:
+  Return the length of an `asdl_seq` or `asdl_xx_seq`
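
As a usage sketch, assuming a `mod_ty` value `mod` whose `Module.body` field is an `asdl_stmt_seq *` (the surrounding names are illustrative):

```c
/* Walk a statement sequence with the typed macros described above. */
asdl_stmt_seq *body = mod->v.Module.body;
for (Py_ssize_t i = 0; i < asdl_seq_LEN(body); i++) {
    stmt_ty s = asdl_seq_GET(body, i);  /* typed access, checked in debug mode */
    /* ... process statement s ... */
}
```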

Note that typed macros and functions are recommended over their untyped
counterparts. Typed macros carry out checks in debug mode and aid

@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).

Emission of bytecode is handled by the following macros:

-* `ADDOP(struct compiler *, location, int)`
-  add a specified opcode
-* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
-  like `ADDOP`, but also exits current scope; used for adding return value
-  opcodes in lambdas and closures
-* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
-  add an opcode that takes an integer argument
-* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
-  add an opcode with the proper argument based on the position of the
-  specified PyObject in PyObject sequence object, but with no handling of
-  mangled names; used for when you
-  need to do named lookups of objects such as globals, consts, or
-  parameters where name mangling is not possible and the scope of the
-  name is known; *TYPE* is the name of PyObject sequence
-  (`names` or `varnames`)
-* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
-  just like `ADDOP_O`, but steals a reference to PyObject
-* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
-  just like `ADDOP_O`, but name mangling is also handled; used for
-  attribute loading or importing based on name
-* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
-  add the `LOAD_CONST` opcode with the proper argument based on the
-  position of the specified PyObject in the consts table.
-* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
-  just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
-* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
-  create a jump to a basic block
+* `ADDOP(struct compiler *, location, int)`:
+  add a specified opcode
+* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
+  like `ADDOP`, but also exits the current scope; used for adding return value
+  opcodes in lambdas and closures
+* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
+  add an opcode that takes an integer argument
+* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
+  add an opcode with the proper argument based on the position of the
+  specified PyObject in a PyObject sequence object, but with no handling of
+  mangled names; used when you need to do named lookups of objects such as
+  globals, consts, or parameters where name mangling is not possible and the
+  scope of the name is known; *TYPE* is the name of the PyObject sequence
+  (`names` or `varnames`)
+* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
+  just like `ADDOP_O`, but steals a reference to the PyObject
+* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
+  just like `ADDOP_O`, but name mangling is also handled; used for
+  attribute loading or importing based on name
+* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
+  add the `LOAD_CONST` opcode with the proper argument based on the
+  position of the specified PyObject in the consts table
+* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
+  just like `ADDOP_LOAD_CONST`, but steals a reference to the PyObject
+* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
+  create a jump to a basic block
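
A hedged sketch of these macros in use; the visitor name, the `LOC()` helper and the opcode choices are illustrative, not a real code-generation routine:

```c
/* Illustrative only: emit "load None, then discard it" for some node `e`. */
static int
codegen_example(struct compiler *c, expr_ty e)
{
    location loc = LOC(e);              /* source location for this node */
    ADDOP_LOAD_CONST(c, loc, Py_None);  /* LOAD_CONST via the consts table */
    ADDOP(c, loc, POP_TOP);             /* plain opcode, no argument */
    return SUCCESS;
}
```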

The `location` argument is a struct with the source location to be
associated with this instruction. It is typically extracted from an

@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
construction of the [exception table](exception_handling.md) and
-[locations table](locations.md).
+[locations table](code_objects.md#source-code-locations).
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about the function,
and a reference to the source code (filename, etc). All of this is implemented by

@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
Important files
===============

-* [Parser/](../Parser/)
+* [Parser/](../Parser)

  * [Parser/Python.asdl](../Parser/Python.asdl):
    ASDL syntax file.

@@ -534,7 +532,7 @@ Important files
  * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
    A data structure representing a sequence of bytecode-like pseudo-instructions.

-* [Include/](../Include/)
+* [Include/](../Include)

  * [Include/cpython/code.h](../Include/cpython/code.h)
    : Header file for [Objects/codeobject.c](../Objects/codeobject.c);

@@ -556,7 +554,7 @@ Important files
    : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).

  * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
    : Header for [Python/symtable.c](../Python/symtable.c).
    `struct symtable` and `PySTEntryObject` are defined here.

  * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)

@@ -570,7 +568,7 @@ Important files
    by
    [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).

-* [Objects/](../Objects/)
+* [Objects/](../Objects)

  * [Objects/codeobject.c](../Objects/codeobject.c)
    : Contains PyCodeObject-related code.

@@ -579,7 +577,7 @@ Important files
    : Contains the `frame_setlineno()` function which determines whether it is allowed
      to make a jump between two points in the bytecode.

-* [Lib/](../Lib/)
+* [Lib/](../Lib)

  * [Lib/opcode.py](../Lib/opcode.py)
    : opcode utilities exposed to Python.

@@ -591,7 +589,7 @@ Important files
Objects
=======

-* [Locations](locations.md): Describes the location table
+* [Locations](code_objects.md#source-code-locations): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table

@@ -87,10 +87,10 @@ offset of the raising instruction should be pushed to the stack.
Handling an exception, once an exception table entry is found, consists
of the following steps:

1. Pop values from the stack until it matches the stack depth for the handler.
2. If `lasti` is true, then push the offset that the exception was raised at.
3. Push the exception to the stack.
4. Jump to the target offset and resume execution.
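
In pseudo-C, the same four steps look roughly like this; the helper names are illustrative, and the real logic lives in [Python/ceval.c](../Python/ceval.c):

```c
/* Sketch of entering an exception handler, not the actual eval-loop code. */
while (stack_level() > handler->stack_depth) {
    Py_DECREF(pop());                 /* 1. unwind to the handler's depth  */
}
if (handler->push_lasti) {
    push(PyLong_FromLong(lasti));     /* 2. offset of raising instruction  */
}
push(exception);                      /* 3. the exception itself           */
jump_to(handler->target);             /* 4. resume execution at the target */
```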

Reraising Exceptions and `lasti`

@@ -107,13 +107,12 @@ Format of the exception table
-----------------------------

Conceptually, the exception table consists of a sequence of 5-tuples:
-```
1. `start-offset` (inclusive)
2. `end-offset` (exclusive)
3. `target`
4. `stack-depth`
5. `push-lasti` (boolean)
-```

All offsets and lengths are in code units, not bytes.

@@ -123,18 +122,19 @@ For it to be searchable quickly, we need to support binary search giving us log(n)
Binary search typically assumes fixed-size entries, but that is not necessary, as long as we can identify the start of an entry.

It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as:
`start, size, target, depth, push-lasti`.

Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes.
It also happens that depth is generally quite small.

So, we need to encode:

```
-`start` (up to 30 bits)
-`size` (up to 30 bits)
-`target` (up to 30 bits)
-`depth` (up to ~8 bits)
-`lasti` (1 bit)
+start (up to 30 bits)
+size (up to 30 bits)
+target (up to 30 bits)
+depth (up to ~8 bits)
+lasti (1 bit)
```

We need a marker for the start of the entry, so the first byte of an entry will have the most significant bit set.

@@ -145,29 +145,32 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the
In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.

For example, the exception entry:

```
-`start`: 20
-`end`: 28
-`target`: 100
-`depth`: 3
-`lasti`: False
+start: 20
+end: 28
+target: 100
+depth: 3
+lasti: False
```

is encoded by first converting to the more compact four value form:

```
-`start`: 20
-`size`: 8
-`target`: 100
-`depth<<1+lasti`: 6
+start: 20
+size: 8
+target: 100
+depth<<1+lasti: 6
```

which is then encoded as:

```
148 (MSB + 20 for start)
8 (size)
65 (Extend bit + 1)
36 (Remainder of target, 100 == (1<<6)+36)
6
```

for a total of five bytes.
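
A hedged, self-contained sketch of an encoder for this scheme; the real code lives in the compiler's assembler and is structured differently:

```c
#include <stdint.h>

/* Emit `value` as 6-bit chunks, most significant first. Each byte is
 * SXdddddd: S (0x80) marks the first byte of an entry, X (0x40) means
 * more bytes of this value follow. */
static int
encode_varint(uint8_t *out, int pos, unsigned int value, int starts_entry)
{
    uint8_t chunks[6];
    int n = 0;
    do {
        chunks[n++] = value & 0x3f;
        value >>= 6;
    } while (value);
    for (int i = n - 1; i >= 0; i--) {
        uint8_t b = chunks[i];
        if (i > 0)
            b |= 0x40;                      /* extend bit: more bytes follow */
        if (starts_entry && i == n - 1)
            b |= 0x80;                      /* start-of-entry marker */
        out[pos++] = b;
    }
    return pos;
}

/* One entry is the four values start, size, target, (depth<<1)|lasti.
 * For start=20, size=8, target=100, depth=3, lasti=0 this produces
 * 148, 8, 65, 36, 6 -- the five bytes worked through above. */
static int
encode_entry(uint8_t *out, int pos, unsigned int start, unsigned int size,
             unsigned int target, unsigned int depth, unsigned int lasti)
{
    pos = encode_varint(out, pos, start, 1);
    pos = encode_varint(out, pos, size, 0);
    pos = encode_varint(out, pos, target, 0);
    pos = encode_varint(out, pos, (depth << 1) | lasti, 0);
    return pos;
}
```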

@@ -27,6 +27,7 @@ objects, so are not allocated in the per-thread stack. See `PyGenObject` in
## Layout

Each activation record is laid out as:

* Specials
* Locals
* Stack
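
A hedged sketch of that layout as a C struct; the real type is `_PyInterpreterFrame` in [Include/internal/pycore_frame.h](../Include/internal/pycore_frame.h), whose actual fields differ and change between versions:

```c
/* Illustrative layout only -- not the real _PyInterpreterFrame. */
typedef struct {
    /* Specials: code object, globals, builtins, instruction pointer, ... */
    PyObject *f_executable;
    PyObject *f_globals;
    PyObject *f_builtins;
    /* Locals, followed contiguously by the evaluation stack. */
    PyObject *localsplus[1];   /* variable length */
} activation_record_sketch;
```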

@@ -1,4 +1,3 @@
-
Garbage collector design
========================

@@ -117,7 +116,7 @@ general, the collection of all objects tracked by GC is partitioned into disjoint
doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
often they've survived collection attempts. During collections, the generation(s) being collected
are further partitioned into, for example, sets of reachable and unreachable objects. Doubly linked lists
support moving an object from one partition to another, adding a new object, removing an object
entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
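
The constant cost is ordinary doubly-linked-list surgery; a self-contained sketch (the real `PyGC_Head` packs tag bits into these pointers, which this ignores):

```c
/* Move `node` from whatever partition holds it to the tail of `list`:
 * six pointer updates, independent of partition sizes. */
typedef struct gc_head {
    struct gc_head *next, *prev;
} gc_head;

static void
gc_move(gc_head *node, gc_head *list)
{
    node->prev->next = node->next;   /* unlink from current partition */
    node->next->prev = node->prev;
    node->prev = list->prev;         /* relink just before the head */
    node->next = list;
    node->prev->next = node;
    list->prev = node;
}
```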

With care, they also support iterating over a partition while objects are being added to - and

@@ -1,4 +1,3 @@
-
Generators
==========

@@ -1,4 +1,3 @@
-
The bytecode interpreter
========================

@@ -1,4 +1,3 @@
-
Guide to the parser
===================

@@ -444,15 +443,15 @@ How to regenerate the parser
Once you have made the changes to the grammar files, to regenerate the `C`
parser (the one used by the interpreter), just execute:

-```
-make regen-pegen
+```shell
+$ make regen-pegen
```

using the `Makefile` in the main directory. If you are on Windows you can
use the Visual Studio project files to regenerate the parser, or execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
```

The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c).

@@ -468,15 +467,15 @@ any modifications to this file (in order to implement new Pegen features) you will
need to regenerate the meta-parser (the parser that parses the grammar files).
To do so, just execute:

-```
-make regen-pegen-metaparser
+```shell
+$ make regen-pegen-metaparser
```

If you are on Windows you can use the Visual Studio project files
to regenerate the parser, or execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
```

@@ -516,15 +515,15 @@ be found in the [`Grammar/Tokens`](../Grammar/Tokens)
file. If you change this file to add new tokens, make sure to regenerate the
files by executing:

-```
-make regen-token
+```shell
+$ make regen-token
```

If you are on Windows you can use the Visual Studio project files to regenerate
the tokens, or execute:

-```
-./PCbuild/build.bat --regen
+```dos
+PCbuild/build.bat --regen
```

How tokens are generated and the rules governing this are completely up to the tokenizer

@@ -546,8 +545,8 @@ by default** except for rules with the special marker `memo` after the rule
name (and type, if present):

```
rule_name[type] (memo):
    ...
```

By selectively turning on memoization for a handful of rules, the parser becomes

@@ -593,25 +592,25 @@ are always reserved words, even in positions where they make no sense
meaning in context. Trying to use a hard keyword as a variable will always
fail:

-```
+```pycon
>>> class = 3
  File "<stdin>", line 1
    class = 3
          ^
SyntaxError: invalid syntax
>>> foo(class=3)
  File "<stdin>", line 1
    foo(class=3)
        ^^^^^
SyntaxError: invalid syntax
```

While soft keywords don't have this limitation if used in a context other than the
one where they are defined as keywords:

-```
+```pycon
>>> match = 45
>>> foo(match="Yeah!")
```

The `match` and `case` keywords are soft keywords, so that they are

@@ -621,21 +620,21 @@ argument names.

You can get a list of all keywords defined in the grammar from Python:

-```
+```pycon
>>> import keyword
>>> keyword.kwlist
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break',
'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for',
'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or',
'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
```

as well as soft keywords:

-```
+```pycon
>>> import keyword
>>> keyword.softkwlist
['_', 'case', 'match']
```

> [!CAUTION]

@@ -736,7 +735,7 @@ displayed when the error is reported.
> rule or not. For example:

```
<valid python code> $ 42
```

should trigger the syntax error in the `$` character. If your rule is not correctly defined this

@@ -744,7 +743,7 @@ won't happen. As another example, suppose that you try to define a rule to match
`print` statements in order to create a better error message and you define it as:

```
invalid_print: "print" expression
```

This will **seem** to work because the parser will correctly parse `print(something)` because it is valid

@@ -756,7 +755,7 @@ will be reported there instead of the `$` character.
Generating AST objects
----------------------

The output of the C parser used by CPython, which is generated from the
[grammar file](../Grammar/python.gram), is a Python AST object (using C
structures). This means that the actions in the grammar file generate AST
objects when they succeed. Constructing these objects can be quite cumbersome

@@ -798,7 +797,7 @@ Check the contents of these files to know which is the best place for new
tests, depending on the nature of the new feature you are adding.

Tests for the parser generator itself can be found in the
-[test_peg_generator](../Lib/test_peg_generator) directory.
+[test_peg_generator](../Lib/test/test_peg_generator) directory.

Debugging generated parsers

@@ -816,15 +815,15 @@ For this reason it is a good idea to experiment first by generating a Python
parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator)
directory on the CPython repository and manually call the parser generator by executing:

-```
+```shell
$ python -m pegen python <PATH TO YOUR GRAMMAR FILE>
```

This will generate a file called `parse.py` in the same directory that you
can use to parse some input:

-```
+```shell
$ python parse.py file_with_source_code_to_test.py
```

As the generated `parse.py` file is just Python code, you can modify it

@@ -848,8 +847,8 @@ can be a bit hard to understand at first.

To activate verbose mode you can add the `-d` flag when executing Python:

-```
+```shell
$ python -d file_to_test.py
```

This will print **a lot** of output to `stderr` so it is probably better to dump

@@ -857,7 +856,7 @@ it to a file for further analysis. The output consists of trace lines with the
following structure:

```
<indentation> ('>'|'-'|'+'|'!') <rule_name>[<token_location>]: <alternative> ...
```

Every line is indented by a different amount (`<indentation>`) depending on how

@@ -2,6 +2,7 @@

*Interned* strings are conceptually part of an interpreter-global
*set* of interned strings, meaning that:

- no two interned strings have the same content (across an interpreter);
- two interned strings can be safely compared using pointer equality
  (Python `is`).
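
This set property is what makes interning pay off: identity implies equality, so a pointer comparison can stand in for a full string comparison. A hedged sketch:

```c
/* Valid only when both arguments are known to be interned. */
static int
same_identifier(PyObject *a, PyObject *b)
{
    return a == b;   /* same pointer <=> same contents, by uniqueness */
}
```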

@@ -61,6 +62,7 @@ if it's interned and mortal it needs extra processing in

The converse is not true: interned strings can be mortal.
For mortal interned strings:

- the 2 references from the interned dict (key & value) are excluded from
  their refcount
- the deallocator (`unicode_dealloc`) removes the string from the interned dict

@@ -90,6 +92,7 @@ modify in place.
The functions take ownership of (“steal”) the reference to their argument,
and update the argument with a *new* reference.
This means:

- They're “reference neutral”.
- They must not be called with a borrowed reference.
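
For example, the public interning entry point has exactly this shape; a minimal usage sketch (error handling abridged):

```c
/* PyUnicode_InternInPlace steals the reference in *s and stores back a
 * new reference to the canonical interned string. */
PyObject *s = PyUnicode_FromString("example");
if (s != NULL) {
    PyUnicode_InternInPlace(&s);   /* s is now the interned string */
}
```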