mirror of
https://github.com/erg-lang/erg.git
synced 2025-09-28 20:14:45 +00:00
2.7 KiB
2.7 KiB
Architecture of ergc
1. Scan an Erg script (.er) and generate a TokenStream
(parser/lex.rs)
- parser/lexer/Lexer generates
TokenStream
(this is an iterator of Token, TokenStream can be generated by lexer.collect())Lexer
is constructed fromLexer::new
orLexer::from_str
, whereLexer::new
reads the code from a file or command option.Lexer
can generate tokens sequentially as an iterator; if you want to get aTokenStream
all at once, useLexer::lex
.Lexer
outputsLexError
s as errors, butLexError
does not have enough information to display itself. If you want to display the error, use theLexerRunner
to convert the error.LexerRunner
can also be used if you want to useLexer
as standalone;Lexer
is just an iterator and does not implement theRunnable
trait.Runnable
is implemented byLexerRunner
,ParserRunner
,Compiler
, andVirtualMachine
.
2. Convert TokenStream
-> AST
(parser/parse.rs)
Parser
, likeLexer
, has two constructors,Parser::new
andParser::from_str
, andParser::parse
will give theAST
.AST
is the wrapper type ofVec<Expr>
. It is for "Abstract Syntax Tree".
2.5 Desugaring AST
- expand nested vars (
Desugarer::desugar_nest_vars_pattern
) - desugar multiple pattern definition syntax (
Desugarer::desugar_multiple_pattern_def
)
3. Type checking & inference, Convert AST
-> HIR
(compiler/lower.rs)
HIR
has every variable's type information. It is for "High-level Intermediate Representation".HIR
only holds the type of the variable, but that's enough. In extreme cases, this is because Erg has only conversion (or operator) applications. If we know the type of the conversion, we have already know the type of the object of the argument.ASTLowerer
can be constructed in the same way asParser
andLexer
.ASTLowerer::lower
will output a tuple ofHIR
andCompileWarnings
if no errors occur.ASTLowerer
is owned byCompiler
. Unlike conventional structures,ASTLowerer
handles code contexts and is not a one-time disposable.- If the result of type inference is incomplete (if there is an unknown type variable), an error will occur during name resolution.
4. Check side-effects (compiler/effectcheck.rs)
4. Check ownerships (compiler/memcheck.rs)
5. Generate Bytecode (CodeObj
) from HIR
(compiler/codegen.rs)
- From the type information of the expression, name resolution of the quantified subroutines will be performed.
(6. (Future plans) Convert Bytecode -> LLVM IR)
- Bytecode is stack-based, whereas LLVM IR is register-based. There will be several more layers of intermediate processes for this conversion process.