erg/doc/EN/compiler/architecture.md
2023-06-19 11:28:38 +09:00

4 KiB

Architecture of ergc

1. Scan an Erg script (.er) and generate a TokenStream

src: erg_parser/lex.rs

  • parser/lexer/Lexer generates TokenStream (this is an iterator of Token, TokenStream can be generated by Lexer::collect())
    • Lexer is constructed from Lexer::new or Lexer::from_str, where Lexer::new reads the code from a file or command option.
    • Lexer can generate tokens sequentially as an iterator; if you want to get a TokenStream all at once, use Lexer::lex.
    • Lexer outputs LexErrors as errors, but LexError does not have enough information to display itself. If you want to display the error, use the LexerRunner to convert the error.
    • LexerRunner can also be used if you want to use Lexer as standalone; Lexer is just an iterator and does not implement the Runnable trait.
      • Runnable is implemented by LexerRunner, ParserRunner, Compiler, and DummyVM.

2. Convert TokenStream -> AST

src: erg_parser/parse.rs

  • Parser, like Lexer, has two constructors, Parser::new and Parser::from_str, and Parser::parse will give the AST.
  • AST is the wrapper type of Vec<Expr>. It is for "Abstract Syntax Tree".

2.1 Desugaring AST

src: erg_parser/desugar.rs

  • expand nested vars (Desugarer::desugar_nest_vars_pattern)
  • desugar multiple pattern definition syntax (Desugarer::desugar_multiple_pattern_def)

2.2 Reordering & Linking AST

src: erg_compiler/link_ast.rs

  • link class methods to class definitions
    • method definitions are allowed outside of the class definition file
    • current implementation is incomplete, only in the same file

3. Convert AST -> HIR

(main) src: erg_compiler/lower.rs

3.1 Name Resolution

In the current implementation it is done during type checking.

  • All ASTs (including imported modules) are scanned for name resolution before type inference.
  • In addition to performing cycle checking and reordering, a context is created for type inference (however, most of the information on variables registered in this context is not yet finalized).

3.2 import resolution

  • When import is called, a new thread is created for analysis.
  • JoinHandle is stored in SharedCompilerResource and is joined when the module is needed.
  • Unused modules may not be joined, but currently all such modules are also analyzed.

3.3 Type checking & inference

src: erg_compiler/lower.rs

  • HIR has every variable's type information. It is for "High-level Intermediate Representation".
  • ASTLowerer can be constructed in the same way as Parser and Lexer.
  • ASTLowerer::lower will output either CompleteArtifact or IncompleteArtifact. Both has HIR and LowerWarnings. IncompleteArtifact also has LowerErrors.
  • ASTLowerer is owned by Compiler. Unlike other structures (Lexer, Parser, etc.), ASTLowerer handles code contexts and is not a one-time disposable.
  • For type inference algorithms, see inference.md.

4. Check side-effects

src: erg_compiler/effectcheck.rs

5. Check ownerships

src: erg_compiler/ownercheck.rs

6. Optimize HIR

src: erg_compiler/optimize.rs

  • Eliminate dead code (unused variables, imports, etc.)

src: erg_compiler/link_hir.rs

  • Load all modules, resolve dependencies, and combine into a single HIR

8. Desugar HIR

src: erg_compiler/desugar_hir.rs

Convert parts that are not consistent with Python syntax

  • Convert class member variables to functions

8. Generate Bytecode (CodeObj) from HIR

src: erg_compiler/codegen.rs