mirror of
https://github.com/roc-lang/roc.git
synced 2025-09-28 22:34:45 +00:00
Start working on new compiler design documentation
This commit is contained in:
parent
24c7bded35
commit
33771ca2cf
2 changed files with 162 additions and 142 deletions
159
crates/compiler/DESIGN.md
Normal file
@ -0,0 +1,159 @@
# Compiler Design

The current Roc compiler is designed as a pipelining compiler parallelizable
across Roc modules.

Roc's compilation pipeline consists of a few major components, which form the
table of contents for this document.

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Parsing](#parsing)
- [Canonicalization](#canonicalization)
  - [Symbol Resolution](#symbol-resolution)
  - [Type-alias normalization](#type-alias-normalization)
  - [Closure naming](#closure-naming)
- [Constraint Generation](#constraint-generation)
  - [(Mutually-)recursive definitions](#mutually-recursive-definitions)
- [Type Solving](#type-solving)
  - [Unification](#unification)
  - [Type Inference](#type-inference)
  - [Recursive Types](#recursive-types)
  - [Lambda Sets](#lambda-sets)
  - [Ability Collection](#ability-collection)
  - [Ability Specialization](#ability-specialization)
  - [Ability Derivation](#ability-derivation)
  - [Exhaustiveness Checking](#exhaustiveness-checking)
  - [Debugging](#debugging)
- [IR Generation](#ir-generation)
  - [Memory Layouts](#memory-layouts)
  - [Compiling Calls](#compiling-calls)
  - [Decision Trees](#decision-trees)
  - [Tail-call Optimization](#tail-call-optimization)
  - [Reference-count insertion](#reference-count-insertion)
  - [Reusing Memory Allocations](#reusing-memory-allocations)
  - [Debugging](#debugging-1)
- [LLVM Code Generator](#llvm-code-generator)
  - [Morphic Analysis](#morphic-analysis)
  - [C ABI](#c-abi)
  - [Test Harness](#test-harness)
  - [Debugging](#debugging-2)
- [WASM Code Generator](#wasm-code-generator)
  - [WASM Interpreter](#wasm-interpreter)
  - [Debugging](#debugging-3)
- [Dev Code Generator](#dev-code-generator)
  - [Debugging](#debugging-4)
- [Builtins](#builtins)
- [Compiler Driver](#compiler-driver)
  - [Caching types](#caching-types)
- [Repl](#repl)
- [`test` and `dbg`](#test-and-dbg)
- [Formatter](#formatter)
- [Glue](#glue)
- [Active areas of research / help wanted](#active-areas-of-research--help-wanted)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Parsing

Roc's parsers are designed as [combinators](https://en.wikipedia.org/wiki/Parser_combinator).
A list of Roc's parse AST and combinators can be found in [the root parse
file](./parse/src/parser.rs).

Combinators enable parsing to compose as functions would - for example, the
`one_of` combinator supports attempting multiple parsing strategies, and
succeeding on the first one; the `and_then` combinator chains two parsers
together, failing if either parser in the sequence fails.
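
To make the composition idea concrete, here is a minimal sketch of the two combinators named above. It is an illustration under simplifying assumptions, not the code in `parser.rs`: the `ParseResult` type, the `keyword` helper, the string-slice input, and the two-argument form of `one_of` are all hypothetical stand-ins chosen to keep the example short.

```rust
// Hypothetical, simplified result type: the parsed value plus the remaining
// input on success, or an error message on failure.
type ParseResult<'a, T> = Result<(T, &'a str), String>;

/// Hypothetical helper: parse one exact keyword from the front of the input.
fn keyword<'a>(word: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'static str> {
    move |input: &'a str| match input.strip_prefix(word) {
        Some(rest) => Ok((word, rest)),
        None => Err(format!("expected `{}`", word)),
    }
}

/// Try `first`; if it fails, try `second` on the same input.
/// (Simplified to two alternatives for this sketch.)
fn one_of<'a, T>(
    first: impl Fn(&'a str) -> ParseResult<'a, T>,
    second: impl Fn(&'a str) -> ParseResult<'a, T>,
) -> impl Fn(&'a str) -> ParseResult<'a, T> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}

/// Run `first`, then run `second` on whatever input `first` left over.
/// The whole parser fails if either step fails.
fn and_then<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Ok(((a, b), rest))
    }
}
```

With these definitions, `one_of(keyword("if"), keyword("when"))` accepts input starting with either keyword, while `and_then(keyword("if"), keyword(" then"))` only succeeds when both pieces appear in sequence.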

Since Roc is an indentation-sensitive language, parsing must be diligent about
handling indentation and de-indentation levels. Most parsing functions take a
`min_indent` parameter that specifies the minimum indentation of the scope an
expression should be parsed in. Generally, failing to reach `min_indent`
indicates that an expression has ended (but perhaps too early).
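
The role of `min_indent` can be sketched with a toy, line-based example. This is an assumption-laden illustration only: the real parser works on a byte stream with column tracking, and the `indent_of` and `take_block` functions below are hypothetical names invented for this sketch.

```rust
/// Count the leading spaces of a line (a stand-in for real column tracking).
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Split `lines` into the lines that belong to the current block and the
/// lines that follow it. The block ends at the first non-blank line whose
/// indentation falls below `min_indent`.
fn take_block<'a, 'b>(
    lines: &'a [&'b str],
    min_indent: usize,
) -> (&'a [&'b str], &'a [&'b str]) {
    let end = lines
        .iter()
        .position(|line| !line.trim().is_empty() && indent_of(line) < min_indent)
        .unwrap_or(lines.len());
    lines.split_at(end)
}
```

Here a line indented less than `min_indent` is treated as the signal that the enclosing expression has ended; whether that end is legitimate or premature is left to the caller, mirroring the "but perhaps too early" caveat above.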

## Canonicalization

### Symbol Resolution

### Type-alias normalization

### Closure naming

## Constraint Generation

### (Mutually-)recursive definitions

## Type Solving

### Unification

### Type Inference

### Recursive Types

### Lambda Sets

### Ability Collection

### Ability Specialization

### Ability Derivation

### Exhaustiveness Checking

### Debugging

## IR Generation

### Memory Layouts

### Compiling Calls

### Decision Trees

### Tail-call Optimization

### Reference-count insertion

### Reusing Memory Allocations

### Debugging

## LLVM Code Generator

### Morphic Analysis

### C ABI

### Test Harness

### Debugging

## WASM Code Generator

### WASM Interpreter

### Debugging

## Dev Code Generator

### Debugging

## Builtins

## Compiler Driver

### Caching types

## Repl

## `test` and `dbg`

## Formatter

## Glue

## Active areas of research / help wanted
@ -1,147 +1,8 @@
|
||||||
# The Roc Compiler
|
# The Roc Compiler
|
||||||
|
|
||||||
Here's how the compiler is laid out.
|
For an overview of the design and architecture of the compiler, see
|
||||||
|
[DESIGN.md](./DESIGN.md). If you want to dive into the
|
||||||
## Parsing
|
implementation or get some tips on debugging the compiler, see below
|
||||||
|
|
||||||
The main goal of parsing is to take a plain old String (such as the contents of a .roc source file read from the filesystem) and translate that String into an `Expr` value.
|
|
||||||
|
|
||||||
`Expr` is an `enum` defined in the `expr` module. An `Expr` represents a Roc expression.
|
|
||||||
|
|
||||||
For example, parsing would translate this string...
|
|
||||||
|
|
||||||
"1 + 2"
|
|
||||||
|
|
||||||
...into this `Expr` value:
|
|
||||||
|
|
||||||
BinOp(Int(1), Plus, Int(2))
|
|
||||||
|
|
||||||
> Technically it would be `Box::new(Int(1))` and `Box::new(Int(2))`, but that's beside the point for now.
|
|
||||||
|
|
||||||
This `Expr` representation of the expression is useful for things like:
|
|
||||||
|
|
||||||
- Checking that all variables are declared before they're used
|
|
||||||
- Type checking
|
|
||||||
|
|
||||||
> As of this writing, the compiler doesn't do any of those things yet. They'll be added later!
|
|
||||||
|
|
||||||
Since the parser is only concerned with translating String values into Expr values, it will happily translate syntactically valid strings into expressions that won't work at runtime.
|
|
||||||
|
|
||||||
For example, parsing will translate this string:
|
|
||||||
|
|
||||||
not "foo", "bar"
|
|
||||||
|
|
||||||
...into this `Expr`:
|
|
||||||
|
|
||||||
CallByName("not", vec!["foo", "bar"])
|
|
||||||
|
|
||||||
Now we may know that `not` takes a `Bool` and returns another `Bool`, but the parser doesn't know that.
|
|
||||||
|
|
||||||
The parser only knows how to translate a `String` into an `Expr`; it's the job of other parts of the compiler to figure out if `Expr` values have problems like type mismatches and non-exhaustive patterns.
|
|
||||||
|
|
||||||
That said, the parser can still run into syntax errors. This won't parse:
|
|
||||||
|
|
||||||
if then 5 then else then
|
|
||||||
|
|
||||||
This is gibberish to the parser, so it will produce an error rather than an `Expr`.
|
|
||||||
|
|
||||||
Roc's parser is implemented using the [`marwes/combine`](http://github.com/marwes/combine-language/) crate.
|
|
||||||
|
|
||||||
## Evaluating
|
|
||||||
|
|
||||||
One of the useful things we can do with an `Expr` is to evaluate it.
|
|
||||||
|
|
||||||
The process of evaluation is basically to transform an `Expr` into the simplest `Expr` we can that's still equivalent to the original.
|
|
||||||
|
|
||||||
For example, let's say we had this code:
|
|
||||||
|
|
||||||
"1 + 8 - 3"
|
|
||||||
|
|
||||||
The parser will translate this into the following `Expr`:
|
|
||||||
|
|
||||||
BinOp(
|
|
||||||
Int(1),
|
|
||||||
Plus,
|
|
||||||
BinOp(Int(8), Minus, Int(3))
|
|
||||||
)
|
|
||||||
|
|
||||||
The `eval` function will take this `Expr` and translate it into this much simpler `Expr`:
|
|
||||||
|
|
||||||
Int(6)
|
|
||||||
|
|
||||||
At this point it's become so simple that we can display it to the end user as the number `6`. So running `parse` and then `eval` on the original Roc string of `1 + 8 - 3` will result in displaying `6` as the final output.
|
|
||||||
|
|
||||||
> The `expr` module includes an `impl fmt::Display for Expr` that takes care of translating `Int(6)` into `6`, `Char('x')` into `'x'`, and so on.
|
|
||||||
|
|
||||||
`eval` accomplishes this by doing a `match` on an `Expr` and resolving every operation it encounters. For example, when it first sees this:
|
|
||||||
|
|
||||||
BinOp(
|
|
||||||
Int(1),
|
|
||||||
Plus,
|
|
||||||
BinOp(Int(8), Minus, Int(3))
|
|
||||||
)
|
|
||||||
|
|
||||||
The first thing it does is to call `eval` on the `Expr` values on either side of the `Plus`. That results in:
|
|
||||||
|
|
||||||
1. Calling `eval` on `Int(1)`, which returns `Int(1)` since it can't be reduced any further.
|
|
||||||
2. Calling `eval` on `BinOp(Int(8), Minus, Int(3))`, which in fact can be reduced further.
|
|
||||||
|
|
||||||
Since the second call to `eval` will match on another `BinOp`, it's once again going to recursively call `eval` on both of its `Expr` values. Since those are both `Int` values, though, their `eval` calls will return them right away without doing anything else.
|
|
||||||
|
|
||||||
Now that it's evaluated the expressions on either side of the `Minus`, `eval` will look at the particular operator being applied to those expressions (in this case, a minus operator) and check to see if the expressions it was given work with that operation.
|
|
||||||
|
|
||||||
> Remember, this `Expr` value potentially came directly from the parser. `eval` can't be sure any type checking has been done on it!
|
|
||||||
|
|
||||||
If `eval` detects a non-numeric `Expr` value (that is, the `Expr` is not `Int` or `Frac`) on either side of the `Minus`, then it will immediately give an error and halt the evaluation. This sort of runtime type error is common to dynamic languages, and you can think of `eval` as being a dynamic evaluation of Roc code that hasn't necessarily been type-checked.
|
|
||||||
|
|
||||||
Assuming there's no type problem, `eval` can go ahead and run the Rust code of `8 - 3` and store the result in an `Int` expr.
|
|
||||||
|
|
||||||
That concludes our original recursive call to `eval`, after which point we'll be evaluating this expression:
|
|
||||||
|
|
||||||
BinOp(
|
|
||||||
Int(1),
|
|
||||||
Plus,
|
|
||||||
Int(5)
|
|
||||||
)
|
|
||||||
|
|
||||||
This will work the same way as `Minus` did, and will reduce down to `Int(6)`.
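
The walkthrough above can be condensed into a small sketch. It is a simplified illustration, not the original `eval`: the `Expr` and `Operator` definitions here are assumed from the examples in this section (only `Int`, a string variant, and `BinOp` are modeled; `Frac` is omitted), and the error type is a plain `String` for brevity.

```rust
// Assumed, cut-down versions of the types used in the examples above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Plus,
    Minus,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Str(String),
    BinOp(Box<Expr>, Operator, Box<Expr>),
}

/// Reduce an expression to the simplest equivalent expression, or report a
/// runtime type error if an operator is applied to non-numeric operands.
fn eval(expr: Expr) -> Result<Expr, String> {
    match expr {
        Expr::BinOp(left, op, right) => {
            // Evaluate both sides first; this is the recursive step.
            let left = eval(*left)?;
            let right = eval(*right)?;
            // Then check that the operands work with the operator.
            match (left, op, right) {
                (Expr::Int(a), Operator::Plus, Expr::Int(b)) => Ok(Expr::Int(a + b)),
                (Expr::Int(a), Operator::Minus, Expr::Int(b)) => Ok(Expr::Int(a - b)),
                (left, op, right) => Err(format!(
                    "type mismatch: cannot apply {:?} to {:?} and {:?}",
                    op, left, right
                )),
            }
        }
        // Anything else cannot be reduced further, so return it unchanged.
        other => Ok(other),
    }
}
```

Applied to the `BinOp(Int(1), Plus, BinOp(Int(8), Minus, Int(3)))` tree from this section, the inner `Minus` reduces to `Int(5)` first and the outer `Plus` then reduces to `Int(6)`; a string on either side of an operator produces an error instead.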
|
|
||||||
|
|
||||||
## Optimization philosophy
|
|
||||||
|
|
||||||
Focus on optimizations which are only safe in the absence of side effects, and leave the rest to LLVM.
|
|
||||||
|
|
||||||
This focus may lead to some optimizations becoming transitively in scope. For example, some deforestation
|
|
||||||
examples in the MSR paper benefit from multiple rounds of interleaved deforestation, beta-reduction, and inlining.
|
|
||||||
To get those benefits, we'd have to do some inlining and beta-reduction that we could otherwise leave to LLVM's
|
|
||||||
inlining and constant propagation/folding.
|
|
||||||
|
|
||||||
Even if we're doing those things, it may still make sense to have LLVM do a pass for them as well, since
|
|
||||||
early LLVM optimization passes may unlock later opportunities for inlining and constant propagation/folding.
|
|
||||||
|
|
||||||
## Inlining
|
|
||||||
|
|
||||||
If a function is called exactly once (it's a helper function), presumably we always want to inline it.
|
|
||||||
If a function is "small enough" it's probably worth inlining too.
|
|
||||||
|
|
||||||
## Fusion
|
|
||||||
|
|
||||||
<https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/deforestation-short-cut.pdf>
|
|
||||||
|
|
||||||
Basic approach:
|
|
||||||
|
|
||||||
Do list stuff using `build` passing Cons Nil (like a cons list) and then do foldr/build substitution/reduction.
|
|
||||||
Afterwards, we can do a separate pass to flatten nested Cons structures into properly initialized RRBTs.
|
|
||||||
This way we get both deforestation and efficient RRBT construction. Should work for the other collection types too.
|
|
||||||
|
|
||||||
It looks like we need to do some amount of inlining and beta reductions on the Roc side, rather than
|
|
||||||
leaving all of those to LLVM.
|
|
||||||
|
|
||||||
Advanced approach:
|
|
||||||
|
|
||||||
Express operations like map and filter in terms of toStream and fromStream, to unlock more deforestation.
|
|
||||||
More info here:
|
|
||||||
|
|
||||||
<https://wiki.haskell.org/GHC_optimisations#Fusion>
|
|
||||||
|
|
||||||
## Getting started with the code
|
## Getting started with the code
|
||||||
|
|
||||||
|
|