cpython/Tools/cases_generator
Guido van Rossum 70185de1ab
gh-102654: Insert #line directives in generated_cases.c.h (#102669)
This behavior is optional, because in some extreme cases it
may just make debugging harder. The tool defaults it to off,
but it is on in Makefile.pre.in.

Also note that this makes diffs to generated_cases.c.h noisier,
since whenever you insert or delete a line in bytecodes.c,
all subsequent #line directives will change.
2023-03-15 08:37:36 -07:00
generate_cases.py gh-102654: Insert #line directives in generated_cases.c.h (#102669) 2023-03-15 08:37:36 -07:00
interpreter_definition.md gh-98831: Move DSL documentation here from ideas repo (#101629) 2023-02-06 21:03:58 -08:00
lexer.py gh-102021 : Allow multiple input files for interpreter loop generator (#102022) 2023-03-03 20:59:21 -08:00
parser.py gh-102021 : Allow multiple input files for interpreter loop generator (#102022) 2023-03-03 20:59:21 -08:00
plexer.py GH-98831: Refactor and fix cases generator (#99526) 2022-11-17 17:06:07 -08:00
README.md gh-98831: Move DSL documentation here from ideas repo (#101629) 2023-02-06 21:03:58 -08:00
test_generator.py gh-98831: Modernize CALL_FUNCTION_EX (#101627) 2023-02-07 20:03:22 -08:00

Tooling to generate interpreters

Documentation for the instruction definitions in Python/bytecodes.c ("the DSL") is in interpreter_definition.md.

What's currently here:

  • lexer.py: lexer for C, originally written by Mark Shannon
  • plexer.py: OO interface on top of lexer.py; main class: PLexer
  • parser.py: parser for the instruction definition DSL; main class: Parser
  • generate_cases.py: driver script to read Python/bytecodes.c and write Python/generated_cases.c.h
  • test_generator.py: tests; these must be run manually with pytest

Note that there is some dummy C code at the top and bottom of Python/bytecodes.c to fool text editors like VS Code into believing this is valid C code.

A bit about the parser

The Parser class uses a fairly standard recursive-descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method either returns an AST node (a Node instance), returns None, or raises SyntaxError (showing the error in the C source).
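As a rough illustration of tokenizing everything up front, here is a minimal sketch. It is not the code in lexer.py or plexer.py: the names Token, TOKEN_RE, tokenize and MiniLexer, and the tiny token set, are assumptions made for this example only. It shows why a position that can be saved and restored is cheap when the whole token list already exists, and how a SyntaxError can carry a source location.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str
    line: int
    column: int


# Hypothetical, deliberately tiny token set (not the real C lexer).
TOKEN_RE = re.compile(
    r"(?P<IDENT>[A-Za-z_]\w*)"
    r"|(?P<NUMBER>\d+)"
    r"|(?P<OP>[(){};,=+\-*/])"
    r"|(?P<WS>\s+)"
    r"|(?P<BAD>.)"
)


def tokenize(source):
    """Turn the whole input into a list of tokens before any parsing."""
    tokens = []
    line, line_start = 1, 0
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        column = m.start() - line_start
        if kind == "BAD":
            raise SyntaxError(
                f"unexpected character {text!r} at line {line}, column {column}"
            )
        if kind == "WS":
            # Whitespace produces no token but may advance the line counter.
            newlines = text.count("\n")
            if newlines:
                line += newlines
                line_start = m.start() + text.rfind("\n") + 1
            continue
        tokens.append(Token(kind, text, line, column))
    return tokens


class MiniLexer:
    """Holds the full token list plus a cursor that can be saved and restored."""

    def __init__(self, source):
        self.tokens = tokenize(source)
        self.pos = 0

    def getpos(self):
        return self.pos

    def setpos(self, pos):
        self.pos = pos

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is not None:
            self.pos += 1
        return tok
```

Because the cursor is just an index into a list, restoring an earlier position costs nothing, which is what makes unlimited backtracking practical in this style of parser.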

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. When a parsing method returns None, it is therefore possible that, after backtracking, a different parsing method returns a valid AST. A SyntaxError raised by a parsing method, by contrast, is unrecoverable.
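The backtracking behaviour described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the decorator from parser.py: the contextual function here, the MiniParser class, its toy grammar, and the use of plain strings as tokens are all hypothetical.

```python
import functools
from dataclasses import dataclass


@dataclass
class Node:
    kind: str
    value: str


def contextual(func):
    """Restore the input position if the wrapped method returns None."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        start = self.pos
        result = func(self, *args, **kwargs)
        if result is None:
            self.pos = start  # backtrack; an exception would propagate instead
        return result

    return wrapper


class MiniParser:
    """Toy parser over a list of string tokens (assumption for this sketch)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def next(self):
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    def expect(self, text):
        # Consume the next token only if it matches exactly.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    @contextual
    def call(self):
        # call: NAME "(" ")"
        name = self.next()
        if name is not None and name.isidentifier():
            if self.expect("("):
                if self.expect(")"):
                    return Node("call", name)
                # Committed to a call after "(", so this is a hard error.
                raise SyntaxError(f"expected ')' after '{name}('")
        return None

    @contextual
    def name(self):
        tok = self.next()
        if tok is not None and tok.isidentifier():
            return Node("name", tok)
        return None

    def expression(self):
        # Try alternatives in order; a None result has already been rewound.
        return self.call() or self.name()
```

With the tokens `["foo", ";"]`, `call()` consumes `foo`, fails to find `(`, and returns None, so the decorator rewinds the position and `name()` can then match `foo` from the start. With `["foo", "(", "x"]`, the SyntaxError propagates and no alternative is tried.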

Neither the lexer nor the parsers are complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.