erg/doc/EN/compiler/phases/10_codegen.md
2024-05-20 22:45:42 +09:00

Code Generation

By default, Erg scripts are compiled to pyc files and executed, so they run as Python bytecode rather than as Python source. The pyc file is generated from the HIR after it has been desugared (phase 8) and linked with its dependencies (phase 9).

Generation is handled by PyCodeGenerator, a structure that takes the HIR and returns a CodeObj. The CodeObj corresponds to Python's code object and holds the instruction sequence to execute, the objects in the static area, and other metadata. From the Python interpreter's perspective, a code object represents a scope, and the code object for the top-level scope contains all the information necessary for execution. The CodeObj is serialized to a binary format by the dump_as_pyc method and written to a pyc file.
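To make the pyc layout concrete, here is a minimal Python sketch of how a code object can be written to and read back from pyc-style bytes. It assumes CPython 3.7 or later (a 16-byte header followed by a marshalled code object) and uses Python's own compile to obtain a code object; the helper names dump_as_pyc_sketch and load_pyc_sketch are hypothetical and are not Erg's dump_as_pyc.

```python
import importlib.util
import marshal
import struct


def dump_as_pyc_sketch(code, mtime=0, source_size=0):
    """Serialize a code object the way CPython lays out a pyc (3.7+)."""
    header = importlib.util.MAGIC_NUMBER            # 4 bytes: interpreter version tag
    header += struct.pack("<I", 0)                   # 4 bytes: flags (0 = timestamp-based)
    header += struct.pack("<I", mtime & 0xFFFFFFFF)  # 4 bytes: source modification time
    header += struct.pack("<I", source_size & 0xFFFFFFFF)  # 4 bytes: source size
    return header + marshal.dumps(code)


def load_pyc_sketch(data):
    """Check the magic number, skip the 16-byte header, and unmarshal the code object."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was produced for a different Python version")
    return marshal.loads(data[16:])


# Stand-in for the top-level scope: here it is produced by Python's compiler,
# whereas Erg would produce it from the HIR.
top_level = compile("x = 1 + 2\nresult = x * 10", "<sketch>", "exec")
pyc_bytes = dump_as_pyc_sketch(top_level)

namespace = {}
exec(load_pyc_sketch(pyc_bytes), namespace)
```

The instruction sequence, constants, and names described above are visible on the code object as top_level.co_code, top_level.co_consts, and top_level.co_names.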

Features Not Present in Python

Erg Runtime

Erg runs on the Python interpreter, but there are various semantic differences from Python. Some features are implemented by the compiler desugaring them into lower-level features, but some can only be implemented at runtime.

One example is methods that Python's built-in types lack: Python has neither a Nat type nor a times! method. Such methods are implemented by new types that wrap Python's built-in types.

These types are provided by the Erg runtime. The generated bytecode first imports _erg_std_prelude.py, a module that re-exports the runtime's types and functions.
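As an illustration of the wrapping approach, the following is a minimal Python sketch of a Nat-like type built on int. It is an assumption about the general shape, not the code shipped in the Erg runtime; since `!` is not valid in a Python identifier, the method is named times here.

```python
class Nat(int):
    """Hypothetical wrapper: a non-negative integer with an extra method."""

    def __new__(cls, value):
        if value < 0:
            raise ValueError("Nat must be non-negative")
        return super().__new__(cls, value)

    def times(self, procedure):
        # Stand-in for Erg's times!: call the procedure self times.
        for _ in range(self):
            procedure()


calls = []
Nat(3).times(lambda: calls.append("called"))
```

Because Nat subclasses int, a value of this type can still be passed anywhere a Python int is expected.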

Record

Records are implemented using Python's namedtuple.
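For readers unfamiliar with namedtuple, here is a minimal Python sketch of what a record such as {x = 1; y = 2} could look like at runtime. The type name Record and the field names are illustrative assumptions, not names taken from the generated code.

```python
from collections import namedtuple

# Hypothetical runtime shape of an Erg record with fields x and y.
Record = namedtuple("Record", ["x", "y"])

point = Record(x=1, y=2)
total = point.x + point.y  # fields are read by attribute access
```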

Trait

Traits are implemented as Python's ABC (Abstract Base Classes). However, Erg's traits have little meaning at runtime.
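A minimal Python sketch of the idea, assuming a hypothetical trait Show with a single required method show; the names are illustrative and do not come from the Erg runtime.

```python
from abc import ABC, abstractmethod


class Show(ABC):
    """Hypothetical trait: implementors must provide show()."""

    @abstractmethod
    def show(self):
        ...


class Point(Show):
    def __init__(self, x):
        self.x = x

    def show(self):
        return f"Point({self.x})"


shown = Point(1).show()
```

Since the Erg compiler has already verified that a class implements its traits, the ABC mainly records the relationship, which fits the remark that traits have little meaning at runtime.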

match

Pattern matching is mostly reduced to a combination of type checks and assignment operations. This is done relatively early in the compilation process.

i, [j, *k] = 1, [2, 3, 4]

_0 = 1, [2, 3, 4]
i = _0[0]
_1 = _0[1]
j = _1[0]
k = _1[1:]

However, some checks are deferred until runtime.

x: Int or Str
match x:
    i: Int -> ...
    s: Str -> ...

This pattern match requires a runtime check, which is performed by in_operator.

Exhaustiveness is checked at compile time, so the final branch needs no check of its own. The example above is therefore desugared as follows.

if in_operator(x, Int):
    ...
else:
    ...
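The following Python sketch shows how such a runtime check could behave. It assumes, for illustration only, that the check reduces to isinstance for plain classes; the function is named in_operator_sketch to make clear that it is not the runtime's in_operator, which may cover more cases.

```python
def in_operator_sketch(value, type_):
    # Assumption: for plain Python classes the check is an isinstance test.
    return isinstance(value, type_)


def describe(x):
    # Mirrors the desugared match: one runtime check, then a plain else,
    # because exhaustiveness was already established at compile time.
    if in_operator_sketch(x, int):
        return f"Int branch: {x}"
    else:
        return f"Str branch: {x}"


int_result = describe(1)
str_result = describe("a")
```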

Control-flow Functions

Functions that correspond to Python control flow, such as for! and if!, are compiled differently depending on whether they can be optimized. Usually they can, and each call is reduced to dedicated bytecode instructions.

for! [a, b], i =>
    ...

LOAD_NAME 0(a)
LOAD_NAME 1(b)
BUILD_LIST 2
GET_ITER
FOR_ITER ...
STORE_NAME 2(i)
...
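For comparison, Python's dis module can show the instructions CPython itself emits for an equivalent for loop. This sketch only inspects CPython's output; the exact listing depends on the interpreter version and will not match the Erg listing instruction for instruction (some versions build a tuple instead of a list, for example).

```python
import dis

# Python counterpart of `for! [a, b], i => ...`
source = "a, b = 1, 2\nfor i in [a, b]:\n    pass\n"
code = compile(source, "<sketch>", "exec")

# Collect opcode names so the loop instructions can be checked programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Print the full, version-specific listing.
dis.dis(code)
```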

This is more efficient than function calls. However, there are cases where optimization cannot be performed, as shown below.

f! = [for!, ...].choice!()

f! [1, 2], i =>
    ...

In such cases the control-flow function must exist as an actual function object at runtime. These functions are defined in _erg_control.py.
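A minimal Python sketch of what a control-flow function looks like once it is an ordinary function object. The names for_ and for_reversed_ are hypothetical stand-ins, and the body is an assumption rather than the contents of _erg_control.py.

```python
import random


def for_(iterable, body):
    # Function form of a for loop: call body once per element.
    for item in iterable:
        body(item)


def for_reversed_(iterable, body):
    # Second hypothetical candidate, so that the choice below is a real one.
    for item in reversed(list(iterable)):
        body(item)


# The callee is only known at runtime, so the call cannot be inlined
# into loop instructions and must go through a normal function call.
chosen = random.choice([for_, for_reversed_])

seen = []
chosen([1, 2], lambda i: seen.append(i))
```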