

mpage 821759d631 gh-126211: Exclude preprocessor directives from statements containing escaping calls (#126213). The cases generator inserts code to save and restore the stack pointer around statements that contain escaping calls. To find the beginning of such statements, we would walk backwards from the escaping call until we encountered a token that was treated as a statement terminator. This set of terminators should include preprocessor directives.	2024-11-01 08:53:03 -07:00
_typing_backports.py	gh-104504: cases generator: Add `--warn-unreachable` to the mypy config (#108112)	2023-08-21 00:40:41 +01:00
analyzer.py	gh-126211: Exclude preprocessor directives from statements containing escaping calls (#126213)	2024-11-01 08:53:03 -07:00
cwriter.py	GH-119866: Spill the stack around escaping calls. (GH-124392)	2024-10-07 14:56:39 +01:00
generators_common.py	gh-118423: Add `INSTRUCTION_SIZE` macro to code generator (GH-125467)	2024-10-29 17:25:05 +00:00
interpreter_definition.md	gh-118423: Add `INSTRUCTION_SIZE` macro to code generator (GH-125467)	2024-10-29 17:25:05 +00:00
lexer.py	GH-119866: Spill the stack around escaping calls. (GH-124392)	2024-10-07 14:56:39 +01:00
mypy.ini	GH-111485: Separate out parsing, analysis and code-gen phases of tier 1 code generator (GH-112299)	2023-12-07 12:49:40 +00:00
opcode_id_generator.py	GH-122390: Replace `_Py_GetbaseOpcode` with `_Py_GetBaseCodeUnit` (GH-122942)	2024-08-13 14:22:57 +01:00
opcode_metadata_generator.py	gh-124285: Fix bug where bool() is called multiple times for the same part of a boolean expression (#124394)	2024-09-25 15:51:25 +01:00
optimizer_generator.py	GH-119866: Spill the stack around escaping calls. (GH-124392)	2024-10-07 14:56:39 +01:00
parser.py	gh-120417: Remove unused imports in cases_generator (#120622)	2024-06-17 21:58:56 +02:00
parsing.py	gh-124285: Fix bug where bool() is called multiple times for the same part of a boolean expression (#124394)	2024-09-25 15:51:25 +01:00
plexer.py	gh-106812: Refactor cases_generator to allow uops with array stack effects (#107564)	2023-08-04 09:35:56 -07:00
py_metadata_generator.py	GH-120024: Tidy up case generator code a bit. (GH-122780)	2024-08-08 10:57:59 +01:00
README.md	Rename tier 2 redundancy eliminator to optimizer (#115888)	2024-02-26 08:42:53 -08:00
stack.py	GH-119866: Spill the stack around escaping calls. (GH-124392)	2024-10-07 14:56:39 +01:00
target_generator.py	GH-120024: Tidy up case generator code a bit. (GH-122780)	2024-08-08 10:57:59 +01:00
tier1_generator.py	gh-118423: Add `INSTRUCTION_SIZE` macro to code generator (GH-125467)	2024-10-29 17:25:05 +00:00
tier2_generator.py	GH-125515: Reduce number of compiler warnings in generated code (GH-125697)	2024-10-28 10:30:31 +00:00
uop_id_generator.py	gh-120417: Remove unused imports in cases_generator (#120622)	2024-06-17 21:58:56 +02:00
uop_metadata_generator.py	GH-116422: Tier2 hot/cold splitting (GH-116813)	2024-03-26 09:35:11 +00:00
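The gh-126211 commit message describes a backwards walk from an escaping call to the start of its statement. The following is a minimal sketch of that walk. It assumes that tokens carry a string `kind` and that a preprocessor directive has its own kind. `Token`, `TERMINATORS` and `find_stmt_start` are hypothetical names, not the API of analyzer.py.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str  # hypothetical kind names, e.g. "IDENTIFIER", "SEMI", "CMACRO"
    text: str


# Token kinds treated as ending the previous statement. "CMACRO" stands for a
# preprocessor directive such as "#ifdef Py_DEBUG"; per the commit message,
# directives belong in this set.
TERMINATORS = frozenset({"SEMI", "LBRACE", "RBRACE", "CMACRO"})


def find_stmt_start(tokens: list[Token], call_idx: int,
                    terminators: frozenset[str] = TERMINATORS) -> int:
    """Walk backwards from the escaping call at tokens[call_idx] and return the
    index of the first token after the nearest statement terminator."""
    idx = call_idx
    while idx > 0 and tokens[idx - 1].kind not in terminators:
        idx -= 1
    return idx


toks = [
    Token("LBRACE", "{"),
    Token("CMACRO", "#ifdef Py_DEBUG"),
    Token("IDENTIFIER", "res"), Token("EQUALS", "="),
    Token("IDENTIFIER", "escaping_call"), Token("LPAREN", "("),
    Token("RPAREN", ")"), Token("SEMI", ";"),
]
print(toks[find_stmt_start(toks, 4)].text)                            # res
print(toks[find_stmt_start(toks, 4, TERMINATORS - {"CMACRO"})].text)  # #ifdef Py_DEBUG
```

The second call drops directives from the terminator set. The walk then runs past `res` and treats `#ifdef Py_DEBUG` as part of the statement, which is the kind of mistake the commit describes fixing.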

README.md

Tooling to generate interpreters

Documentation for the instruction definitions in Python/bytecodes.c ("the DSL") is in interpreter_definition.md in this directory.

What's currently here:

analyzer.py: code for converting the AST generated by the Parser into a higher-level structure that is easier to work with
lexer.py: lexer for C, originally written by Mark Shannon
plexer.py: OO interface on top of lexer.py; main class: PLexer
parsing.py: Parser for instruction definition DSL; main class: Parser
parser.py: helper for interactions with parsing.py
tierN_generator.py: a couple of driver scripts to read Python/bytecodes.c and write Python/generated_cases.c.h (and several other files)
optimizer_generator.py: reads Python/bytecodes.c and Python/optimizer_bytecodes.c and writes Python/optimizer_cases.c.h
stack.py: code to handle generalized stack effects
cwriter.py: code which understands tokens and how to format C code; main class: CWriter
generators_common.py: helpers for generators
opcode_id_generator.py: generates a list of opcodes and writes them to Include/opcode_ids.h
opcode_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_opcode_metadata.h
py_metadata_generator.py: reads the instruction definitions and writes the metadata to Lib/_opcode_metadata.py
target_generator.py: generates targets for computed goto dispatch and writes them to Python/opcode_targets.h
uop_id_generator.py: generates a list of uop IDs and writes them to Include/internal/pycore_uop_ids.h
uop_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_uop_metadata.h
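In a "generalized" stack effect, which stack.py handles, an instruction may pop or push a variable number of items. In the DSL's `(inputs -- outputs)` notation, an array item such as `values[oparg]` stands for `oparg` stack entries. The sketch below only computes the symbolic net change in stack depth. `StackItem`, `parse_side` and `net_effect` are simplified stand-ins named for illustration, not the classes in stack.py or analyzer.py.

```python
from dataclasses import dataclass


@dataclass
class StackItem:
    """One named stack entry (hypothetical, simplified)."""
    name: str
    size: str = ""  # "" means a single item; otherwise a C expression such as "oparg"


def parse_side(side: str) -> list[StackItem]:
    """Parse one side of a simplified effect, e.g. "callable, args[oparg]"."""
    items = []
    for part in side.split(","):
        part = part.strip()
        if not part:
            continue
        if "[" in part:
            name, size = part.rstrip("]").split("[", 1)
            items.append(StackItem(name.strip(), size.strip()))
        else:
            items.append(StackItem(part))
    return items


def net_effect(spec: str) -> str:
    """Return the net stack-depth change (pushed minus popped) as a C expression."""
    before, after = spec.split("--")
    inputs, outputs = parse_side(before), parse_side(after)
    # Fixed-size items contribute a constant; array items contribute symbolic terms.
    fixed = len([i for i in outputs if not i.size]) - len([i for i in inputs if not i.size])
    terms = [str(fixed)]
    terms += [f"+ ({i.size})" for i in outputs if i.size]
    terms += [f"- ({i.size})" for i in inputs if i.size]
    return " ".join(terms)


print(net_effect("left, right -- res"))                          # -1
print(net_effect("values[oparg] -- list"))                        # 1 - (oparg)
print(net_effect("callable, self_or_null, args[oparg] -- res"))   # -1 - (oparg)
```

The result is kept as a string expression because `oparg` is only known at run time. The generator therefore has to emit C arithmetic rather than a constant.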

Note that there is some dummy C code at the top and bottom of Python/bytecodes.c to fool text editors like VS Code into believing this is valid C code.
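One way to keep that dummy C code away from the parser is to parse only the text between two marker comments. The sketch below works that way. The marker strings are assumptions modelled on comments found in Python/bytecodes.c, and `extract_definitions` is a hypothetical helper, not part of the generator's API.

```python
BEGIN_MARKER = "// BEGIN BYTECODES //"  # assumed marker text
END_MARKER = "// END BYTECODES //"      # assumed marker text


def extract_definitions(src: str) -> tuple[str, int]:
    """Return the text between the markers and the 1-based line number on
    which that text starts, so errors can still point into the full file."""
    start = src.find(BEGIN_MARKER)
    end = src.find(END_MARKER)
    if start < 0 or end < start:
        raise ValueError("BYTECODES markers missing or out of order")
    start += len(BEGIN_MARKER)
    first_line = src.count("\n", 0, start) + 1
    return src[start:end], first_line


src = (
    '#include "Python.h"\n'          # dummy C for editors
    "// BEGIN BYTECODES //\n"
    "inst(NOP, (--)) {\n"
    "}\n"
    "// END BYTECODES //\n"
    "static void dummy(void) {}\n"   # more dummy C
)
body, line = extract_definitions(src)
print(line)  # 2
```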

A bit about the parser

The Parser class uses a pretty standard recursive descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method either returns an AST node (a Node instance) or None, or raises SyntaxError (showing the error in the C source).
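The following toy parser sketches the scheme just described. It tokenizes the whole input first. Each parsing method returns a node or None, and the caller rewinds and tries the next alternative whenever it gets None. A method raises SyntaxError once a prefix has committed to one alternative. The grammar, the regex tokenizer and the names `ToyParser`, `Assign` and `Call` are hypothetical; this is not the real Parser/PLexer API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Node:
    """Base class for AST nodes (hypothetical)."""


@dataclass
class Assign(Node):
    target: str
    value: str


@dataclass
class Call(Node):
    func: str


class ToyParser:
    def __init__(self, src: str) -> None:
        # Tokenize the entire input up front: identifiers or single
        # non-space characters. No C preprocessor is run.
        self.tokens = re.findall(r"[A-Za-z_]\w*|\S", src)
        self.pos = 0

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text: str) -> bool:
        if self.peek() == text:
            self.pos += 1
            return True
        return False

    def name(self) -> str | None:
        tok = self.peek()
        if tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok):
            self.pos += 1
            return tok
        return None

    def statement(self) -> Node | None:
        # Try each alternative; rewind to `start` whenever one returns None.
        start = self.pos
        for alternative in (self.assign, self.call):
            node = alternative()
            if node is not None:
                return node
            self.pos = start
        return None

    def assign(self) -> Assign | None:
        target = self.name()
        if target is None or not self.expect("="):
            return None  # not an assignment; let the caller backtrack
        value = self.name()
        if value is None or not self.expect(";"):
            # After "NAME =" no other alternative can match: fail hard.
            raise SyntaxError(f"expected 'NAME ;' at token {self.pos}")
        return Assign(target, value)

    def call(self) -> Call | None:
        func = self.name()
        if func is None or not self.expect("("):
            return None
        if not (self.expect(")") and self.expect(";")):
            raise SyntaxError(f"expected ') ;' at token {self.pos}")
        return Call(func)


print(ToyParser("x = y;").statement())  # Assign(target='x', value='y')
print(ToyParser("f();").statement())    # Call(func='f')
```

Both alternatives start with a name, so `f();` first fails as an assignment, the position is rewound, and it then succeeds as a call. Backtracking is cheap here because the token list already exists in full.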

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. Parsing methods may also raise SyntaxError, which is unrecoverable. When a parsing method returns None, a different parsing method may still return a valid AST after backtracking.
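Here is a minimal sketch of the position-reset behavior of @contextual, under the assumption that the parser keeps its position in a `pos` attribute. The decorator body, `TokenStream` and `pair` are hypothetical illustrations, not the code in parsing.py.

```python
from __future__ import annotations

import functools


def contextual(method):
    """Decorator: if the wrapped parsing method returns None, rewind self.pos
    to where it was before the call. Other results pass through unchanged."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        saved = self.pos
        result = method(self, *args, **kwargs)
        if result is None:
            self.pos = saved
        return result
    return wrapper


class TokenStream:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def advance(self) -> str | None:
        if self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    @contextual
    def pair(self) -> tuple[str, str] | None:
        # Matches "NAME = NAME". It consumes up to three tokens even when it
        # fails; the decorator, not this method, undoes that on failure.
        a, eq, b = self.advance(), self.advance(), self.advance()
        if a and eq == "=" and b:
            return (a, b)
        return None


s = TokenStream(["x", "+", "y"])
print(s.pair(), s.pos)  # None 0
```

Putting the reset in a decorator means individual grammar methods can consume tokens freely and simply return None on failure. The caller is then always left at the position where the attempt began.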

Neither the lexer nor the parsers are complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.