Architecture of ergc
1. Scan an Erg script (.er) and generate a TokenStream (parser/lex.rs)
- parser/lexer/Lexer generates TokenStream (this is an iterator of Token; a TokenStream can be generated by Lexer::collect())
- Lexer is constructed from Lexer::new or Lexer::from_str, where Lexer::new reads the code from a file or command option.
- Lexer can generate tokens sequentially as an iterator; if you want to get a TokenStream all at once, use Lexer::lex.
- Lexer outputs LexErrors as errors, but LexError does not have enough information to display itself. If you want to display the error, use the LexerRunner to convert the error.
- LexerRunner can also be used if you want to use Lexer as standalone; Lexer is just an iterator and does not implement the Runnable trait.
- Runnable is implemented by LexerRunner, ParserRunner, Compiler, and DummyVM.
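The iterator-style design described above can be illustrated with a toy model. This is only a sketch: the Token, LexError, and Lexer types below, the read_while helper, and the render_error function are hypothetical stand-ins written for this example and are not the actual erg_parser API. It shows a lexer that yields tokens one at a time, a lex function that collects the whole stream at once, and a runner-style helper that has access to the source text and can therefore render an error that cannot display itself.

```rust
// Toy model only: these types are hypothetical, not the real erg_parser API.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(String),
    Symbol(char),
}

// A bare error that only knows where it happened, so it cannot display itself.
#[derive(Debug, PartialEq)]
struct LexError {
    pos: usize,
}

struct Lexer {
    chars: Vec<char>,
    pos: usize,
}

impl Lexer {
    fn from_str(src: &str) -> Self {
        Lexer { chars: src.chars().collect(), pos: 0 }
    }

    // Consume characters while `pred` holds and return them as a String.
    fn read_while(&mut self, pred: fn(char) -> bool) -> String {
        let start = self.pos;
        while self.pos < self.chars.len() && pred(self.chars[self.pos]) {
            self.pos += 1;
        }
        self.chars[start..self.pos].iter().collect()
    }
}

impl Iterator for Lexer {
    type Item = Result<Token, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip whitespace between tokens.
        while self.pos < self.chars.len() && self.chars[self.pos].is_whitespace() {
            self.pos += 1;
        }
        // End of input ends the iteration.
        let c = *self.chars.get(self.pos)?;
        if c.is_alphabetic() || c == '_' {
            Some(Ok(Token::Ident(self.read_while(|c| c.is_alphanumeric() || c == '_'))))
        } else if c.is_ascii_digit() {
            Some(Ok(Token::Number(self.read_while(|c| c.is_ascii_digit()))))
        } else if "=+-*/()".contains(c) {
            self.pos += 1;
            Some(Ok(Token::Symbol(c)))
        } else {
            let err = LexError { pos: self.pos };
            self.pos += 1; // step past the bad character so iteration can continue
            Some(Err(err))
        }
    }
}

// Collect the whole stream at once; collecting into a Result stops at the first error.
fn lex(src: &str) -> Result<Vec<Token>, LexError> {
    Lexer::from_str(src).collect()
}

// Runner-style helper: it has the source text, so it can render the error.
fn render_error(src: &str, err: &LexError) -> String {
    format!("{}\n{}^ unexpected character", src, " ".repeat(err.pos))
}
```

Keeping the lexer a plain iterator lets callers pull tokens lazily, while the error rendering lives in a separate layer that owns the source text.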
2. Convert TokenStream -> AST (parser/parse.rs)
- Parser, like Lexer, has two constructors, Parser::new and Parser::from_str, and Parser::parse returns the AST.
- AST is a wrapper type around Vec<Expr>. The name stands for "Abstract Syntax Tree".
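To make the "wrapper around Vec<Expr>" idea concrete, here is a minimal sketch of a parser that turns a token list into such a wrapper. All names (Token, Expr, Ast, ParseError, and this Parser with its token-list constructor) are hypothetical and far simpler than the real erg_parser types; the grammar is limited to integer literals, names, and `name = expr` definitions.

```rust
// Hypothetical toy types; the real Erg AST is far richer.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Int(i64),
    Equal,
    Newline,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Name(String),
    Def { name: String, body: Box<Expr> },
}

// The AST is modeled as a thin wrapper around the list of top-level expressions.
#[derive(Debug, PartialEq)]
struct Ast(Vec<Expr>);

// Index of the token (or end of input) where parsing failed.
#[derive(Debug, PartialEq)]
struct ParseError {
    index: usize,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    // Parse every top-level expression, skipping blank lines.
    fn parse(&mut self) -> Result<Ast, ParseError> {
        let mut exprs = Vec::new();
        while self.pos < self.tokens.len() {
            if self.tokens[self.pos] == Token::Newline {
                self.pos += 1;
                continue;
            }
            exprs.push(self.parse_expr()?);
        }
        Ok(Ast(exprs))
    }

    fn parse_expr(&mut self) -> Result<Expr, ParseError> {
        let index = self.pos;
        let tok = self.tokens.get(index).cloned().ok_or(ParseError { index })?;
        self.pos += 1;
        match tok {
            Token::Int(n) => Ok(Expr::Int(n)),
            Token::Ident(name) => {
                // `name = expr` is a definition; a lone name is a reference.
                if self.tokens.get(self.pos) == Some(&Token::Equal) {
                    self.pos += 1;
                    let body = Box::new(self.parse_expr()?);
                    Ok(Expr::Def { name, body })
                } else {
                    Ok(Expr::Name(name))
                }
            }
            Token::Equal | Token::Newline => Err(ParseError { index }),
        }
    }
}
```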
2.5 Desugaring AST
- expand nested vars (Desugarer::desugar_nest_vars_pattern)
- desugar multiple pattern definition syntax (Desugarer::desugar_multiple_pattern_def)
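As a rough illustration of the first step, the sketch below assumes that "expanding nested vars" means rewriting a nested destructuring pattern into flat, single-variable definitions that index into the right-hand side. That reading, the Pattern type, and the desugar_nested function are assumptions made for this example, not a description of what Desugarer actually does internally.

```rust
// Hypothetical pattern type: either a single variable or a list of sub-patterns.
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Var(String),
    List(Vec<Pattern>),
}

// Flatten `pattern = rhs` into (variable name, accessor expression) pairs.
// The accessor is kept as a plain string to keep the sketch short.
fn desugar_nested(pattern: &Pattern, rhs: &str, out: &mut Vec<(String, String)>) {
    match pattern {
        // A plain variable binds directly to the current accessor.
        Pattern::Var(name) => out.push((name.clone(), rhs.to_string())),
        // A list pattern binds each element to an indexed accessor, recursively.
        Pattern::List(items) => {
            for (i, item) in items.iter().enumerate() {
                desugar_nested(item, &format!("{}[{}]", rhs, i), out);
            }
        }
    }
}
```

Under this assumption, a pattern such as `[i, [j, k]] = l` would become three flat definitions: `i = l[0]`, `j = l[1][0]`, and `k = l[1][1]`.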
3. Type checking & inference, convert AST -> HIR (compiler/lower.rs)
- HIR has every variable's type information. The name stands for "High-level Intermediate Representation".
- ASTLowerer can be constructed in the same way as Parser and Lexer.
- ASTLowerer::lower will output a tuple of HIR and CompileWarnings if no errors occur.
- ASTLowerer is owned by Compiler. Unlike conventional structures, ASTLowerer handles code contexts and is not a single-use, disposable object.
- If the result of type inference is incomplete (if there is an unknown type variable), an error will occur during name resolution.
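The two properties above (lowering returns typed output plus warnings, and the lowerer keeps its context between calls) can be sketched with a toy model. Everything here is hypothetical: Type, Expr, HirExpr, LowerError, Warning, and Lowerer are invented for this example, type "inference" is reduced to reading literal types, and the Redefined warning is only a placeholder for whatever the real CompileWarnings contain.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Str,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Str(String),
    Name(String),
    Def { name: String, body: Box<Expr> },
}

// The HIR mirrors the AST, but every variable carries its resolved type.
#[derive(Debug, PartialEq)]
enum HirExpr {
    Int(i64),
    Str(String),
    Name { name: String, ty: Type },
    Def { name: String, ty: Type, body: Box<HirExpr> },
}

impl HirExpr {
    fn ty(&self) -> Type {
        match self {
            HirExpr::Int(_) => Type::Int,
            HirExpr::Str(_) => Type::Str,
            HirExpr::Name { ty, .. } | HirExpr::Def { ty, .. } => ty.clone(),
        }
    }
}

#[derive(Debug, PartialEq)]
enum LowerError {
    UnknownName(String),
}

#[derive(Debug, PartialEq)]
enum Warning {
    Redefined(String),
}

// The context lives in the lowerer, so it survives across `lower` calls.
#[derive(Default)]
struct Lowerer {
    ctx: HashMap<String, Type>,
}

impl Lowerer {
    // On success, return the typed tree together with any warnings.
    fn lower(&mut self, ast: Vec<Expr>) -> Result<(Vec<HirExpr>, Vec<Warning>), LowerError> {
        let mut warnings = Vec::new();
        let mut hir = Vec::new();
        for expr in ast {
            hir.push(self.lower_expr(expr, &mut warnings)?);
        }
        Ok((hir, warnings))
    }

    fn lower_expr(&mut self, expr: Expr, warnings: &mut Vec<Warning>) -> Result<HirExpr, LowerError> {
        match expr {
            Expr::Int(n) => Ok(HirExpr::Int(n)),
            Expr::Str(s) => Ok(HirExpr::Str(s)),
            // Name resolution fails if the context has no type for the name.
            Expr::Name(name) => match self.ctx.get(&name) {
                Some(ty) => Ok(HirExpr::Name { ty: ty.clone(), name }),
                None => Err(LowerError::UnknownName(name)),
            },
            Expr::Def { name, body } => {
                let body = self.lower_expr(*body, warnings)?;
                let ty = body.ty();
                if self.ctx.insert(name.clone(), ty.clone()).is_some() {
                    warnings.push(Warning::Redefined(name.clone()));
                }
                Ok(HirExpr::Def { name, ty, body: Box::new(body) })
            }
        }
    }
}
```

Because the context is a field rather than a local variable, a name defined in one `lower` call can be resolved in a later one, which is the behavior a REPL-style driver would rely on.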
4. Check side-effects (compiler/effectcheck.rs)
4. Check ownership (compiler/memcheck.rs)
5. Generate Bytecode (CodeObj) from HIR (compiler/codegen.rs)
(6. (Future plans) Convert Bytecode -> LLVM IR)
- Bytecode is stack-based, whereas LLVM IR is register-based. This conversion will need several more intermediate layers.
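
The gap between the two models can be seen in a small sketch that rewrites stack operations as register instructions by simulating the operand stack with virtual register numbers. The StackOp and RegInstr types and the to_registers function are hypothetical; they model neither the actual bytecode nor LLVM IR, and a real conversion would also have to handle control flow, calls, and types.

```rust
// Hypothetical stack machine: operands are implicit, taken from the stack.
#[derive(Debug, Clone, PartialEq)]
enum StackOp {
    Push(i64),
    Add,
    Mul,
}

// Hypothetical register machine: every operand and result is named explicitly.
#[derive(Debug, PartialEq)]
enum RegInstr {
    Const { dst: usize, value: i64 },
    Add { dst: usize, lhs: usize, rhs: usize },
    Mul { dst: usize, lhs: usize, rhs: usize },
}

// Returns the register instructions and the register holding the final value,
// or None if the stack underflows or does not end with exactly one value.
fn to_registers(ops: &[StackOp]) -> Option<(Vec<RegInstr>, usize)> {
    let mut stack: Vec<usize> = Vec::new(); // simulated stack of register numbers
    let mut out = Vec::new();
    let mut next = 0; // next fresh register; each one is assigned only once
    for op in ops {
        let dst = next;
        next += 1;
        match op {
            StackOp::Push(v) => out.push(RegInstr::Const { dst, value: *v }),
            StackOp::Add | StackOp::Mul => {
                // The right operand was pushed last, so it is popped first.
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                out.push(match op {
                    StackOp::Add => RegInstr::Add { dst, lhs, rhs },
                    _ => RegInstr::Mul { dst, lhs, rhs },
                });
            }
        }
        stack.push(dst);
    }
    if stack.len() == 1 {
        Some((out, stack[0]))
    } else {
        None
    }
}
```

Even in this tiny case the translator has to recover which value each instruction consumes, information that the stack form leaves implicit.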