Previously, we assumed that a union layout always occupied at least one
full 64-bit word when generating an LLVM type for it. For small tag
unions, like `[ Ok i8, Err ]`, this need not be the case; indeed, a tag
union like that is actually only 2 bytes wide - 1 byte for the "i8" data,
and one byte for the tag kind.
This led to a discrepancy between what the layout IR and the generated
LLVM code assumed about the size of tag unions. In the case above, the
layout IR would assume the tag data is 2 bytes wide and that the tag id
sits 1 byte into the data. But the LLVM code would generate a type whose
first 64 bits were reserved for the "i8" data, with the tag kind only
stored after that 64-bit boundary.
Usually, just running the LLVM-emitted code would not present a problem.
But it becomes one when we use the layout IR to inspect the result of
LLVM-run code, in particular when we try to look up the tag ID, as the
repl does. This patch fixes that issue.
Note that this bug did not present itself in `test_gen` previously
because the data that most tests check against is stored at the front of
the representation.
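To make the mismatch concrete, here is a standalone Rust sketch
contrasting the two views of `[ Ok i8, Err ]`. It is not the compiler's
actual code generation; the struct names, field names, and tag numbering
are made up for illustration.
```
// Illustrative only: two ways of laying out `[ Ok i8, Err ]` in memory.

use std::mem::size_of;

// What the layout IR expects: 1 byte of payload followed by a 1-byte tag id,
// 2 bytes in total.
#[allow(dead_code)]
#[repr(C)]
struct SmallTagUnion {
    payload: u8, // the `i8` data carried by `Ok`
    tag_id: u8,  // which tag this is (hypothetical numbering: 0 = Ok, 1 = Err)
}

// What the old LLVM lowering effectively assumed: the payload rounded up to
// a full 64-bit word before the tag id.
#[allow(dead_code)]
#[repr(C)]
struct WordSizedTagUnion {
    payload: u64, // the `i8` data occupying a whole word
    tag_id: u8,
}

fn main() {
    // Prints 2 versus 16 (9 bytes of fields padded out to the u64 alignment).
    // Reading the tag id 1 byte into the data, as the layout IR does, lands
    // inside the padded payload of the second layout - which is exactly what
    // went wrong when the repl inspected LLVM-produced values.
    println!("layout IR view: {} bytes", size_of::<SmallTagUnion>());
    println!("old LLVM view:  {} bytes", size_of::<WordSizedTagUnion>());
}
```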
Closes #2149
This work is related to restricting tag union sizes in input positions.
As an example, for something like
```
\x -> when x is
    A M -> X
    A N -> X
    A _ -> X
```
we'd like to infer `[A [M, N]*]` rather than the `[A [M, N]*]*` we
infer today. The difference is that the former type tells us we only
accept `A`s, but the argument of the `A` can be `M`, `N`, or anything
else (hence the `*` on the inner union, which comes from the `_` pattern).
So what's the idea? It's an encoding of the "must have"/"might have"
design discussed in https://github.com/rtfeldman/roc/issues/1758. Let's
take our example above and walk through unification of each branch.
Suppose `x` starts off as a flex var `t`.
```
\x -> when x is
    A M -> X
```
Now we introduce a new kind of constraint called a "presence"
constraint. It says "t has at least [A [M]]". I'll notate this as `t +=
[A [M]]`. When `t` is free as it is here, this is equivalent to `t ~
[A [M]]`.
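As a rough sketch of what this adds to the constraint language (the
shapes below are hypothetical, not the compiler's actual types), a
presence constraint can sit next to the usual equality constraint, and
when the left-hand side is still a flex var the solver can treat it as
plain unification:
```
// Hypothetical constraint/type shapes, for illustration only; these are not
// the compiler's actual definitions.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Var(u32);

#[derive(Clone, Debug)]
enum Type {
    // An unbound (flex) type variable, like `t` before the first branch.
    Flex(Var),
    // A tag union: tag names with their payload types, plus an extension
    // that is either closed (`[M, N]`) or an open variable (`[M, N]a`).
    TagUnion {
        tags: Vec<(String, Vec<Type>)>,
        ext: Option<Var>,
    },
}

#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Constraint {
    // `t ~ u`: ordinary unification.
    Eq(Type, Type),
    // `t += u`: `t` must contain at least the tags of `u` (presence).
    Present(Type, Type),
}

fn main() {
    // The first branch gives `t += [A [M]]`; since `t` is still flex, the
    // solver can treat this exactly like `t ~ [A [M]]`.
    let inner = Type::TagUnion {
        tags: vec![("M".to_string(), vec![])],
        ext: None,
    };
    let rhs = Type::TagUnion {
        tags: vec![("A".to_string(), vec![inner])],
        ext: None,
    };
    let first_branch = Constraint::Present(Type::Flex(Var(0)), rhs);
    println!("{:?}", first_branch);
}
```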
```
\x -> when x is
    ...
    A N -> X
```
At this branch we introduce the presence constraint `[A [M]] += [A [N]]`.
Notice that there are two tag unions we care about resolving here - one is
the toplevel one that says "I have an `A ...` inside of me", and the
other is the tag union that is the type argument to `A`. They are distinct
and at different depths.
For the toplevel one, we first figure out whether the number of tags in
the union needs to expand. It does not - we're hoping to resolve the type
`[A [M, N]]`, which only has `A` in the toplevel union. So we don't
need to do anything extra there, other than merge the nested tag
unions.
We recurse on the shared tags, and now we have the presence constraint
`[M] += [N]`. At this point it's important to remember that the left and
right hand types are backed by type variables, so this is really
something like `t11 [M] += t12 [N]`, where `[M]` and `[N]` are just what
we know the variables `t11` and `t12` to be at this moment. So how do we
solve for `t11 [M, N]` from here? Well, we can encode this constraint as
a type variable definition and a unification constraint we already know
how to solve:
```
New definition: t11 [M]a (a fresh)
New constraint: a ~ t12 [N]
```
That's it; upon unification, `t11 [M, N]` falls out.
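Here is a minimal sketch of that encoding step over toy, payload-free
tag unions. It is only an illustration under those assumptions, not the
real solver, which works over a mutable store of type variables:
```
// Illustrative only: a toy version of the encoding, over payload-free tags.

#[derive(Clone, Copy, Debug)]
struct Var(u32);

#[derive(Clone, Debug)]
struct TagUnion {
    tags: Vec<String>,  // enough to model `[M]` and `[N]`
    ext: Option<Var>,   // `None` = closed, `Some(a)` = open with extension `a`
}

#[derive(Debug)]
enum Constraint {
    // `a ~ u`: ordinary unification of a variable with a tag union.
    Eq(Var, TagUnion),
}

// Turn the presence constraint `target += incoming` into
//   1. a new definition: `target` gains a fresh extension variable `a`, and
//   2. a new constraint: `a ~ incoming`,
// which the existing unifier already knows how to solve.
fn encode_presence(target: &mut TagUnion, incoming: TagUnion, fresh: &mut u32) -> Constraint {
    *fresh += 1;
    let a = Var(*fresh);
    target.ext = Some(a);       // new definition: `t11 [M]a`
    Constraint::Eq(a, incoming) // new constraint: `a ~ t12 [N]`
}

fn main() {
    // `t11 [M] += t12 [N]` from the second branch:
    let mut t11 = TagUnion { tags: vec!["M".to_string()], ext: None };
    let t12 = TagUnion { tags: vec!["N".to_string()], ext: None };
    let mut fresh = 0;

    let constraint = encode_presence(&mut t11, t12, &mut fresh);

    // Prints the redefined `t11` (now `[M]a`) and the constraint `a ~ [N]`;
    // once the unifier solves it, `t11 [M, N]` falls out.
    println!("new definition: {:?}", t11);
    println!("new constraint: {:?}", constraint);
}
```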
Okay, last step.
```
\x -> when x is
    ...
    A _ -> X
```
We now have `[A [M, N]] += [A a]`, where `a` is a fresh unbound
variable. Again nothing has to happen on the toplevel. We walk down and
find `t11 [M, N] += t21 a`. This is actually called an "open constraint"; we
differentiate it at the time we generate constraints because it follows
syntactically from the presence of an `_`, but it's semantically
equivalent to the presence constraint `t11 [M, N] += t21 a`. It's just
called opening because literally the only way `t11 [M, N] += t21 a` can
be true is if we set `t11` to `a`. Well, actually, we assume `a` is a tag
union, so we just make `t11` the open tag union `[M, N]a`. Since `a` is
unbound, it eventually becomes a wildcard, and so `[M, N]*` falls out.
Also, once we open a tag union with an open constraint, we never close
it again.
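In the same toy shapes (again just a sketch), opening is the degenerate
case where the incoming side is a bare unbound variable: no tags are
added, the variable simply becomes the union's extension, and an
already-open union keeps the extension it has:
```
// Illustrative only: opening a toy tag union with an unbound variable.

#[derive(Clone, Copy, Debug)]
struct Var(u32);

#[derive(Clone, Debug)]
struct TagUnion {
    tags: Vec<String>,
    ext: Option<Var>, // `None` = closed, `Some(a)` = open with extension `a`
}

// `target += a` for a bare unbound variable `a`: nothing is added to the tag
// set, we just record `a` as the union's extension. A union opened this way
// is never closed again, so an existing extension is left in place.
fn open(target: &mut TagUnion, unbound: Var) {
    if target.ext.is_none() {
        target.ext = Some(unbound);
    }
}

fn main() {
    // `t11 [M, N] += t21 a` from the `A _ -> X` branch:
    let mut t11 = TagUnion {
        tags: vec!["M".to_string(), "N".to_string()],
        ext: None,
    };
    open(&mut t11, Var(21));

    // `t11` is now `[M, N]a`; since `a` stays unbound it generalizes to a
    // wildcard, and the inferred type falls out as `[M, N]*`.
    println!("{:?}", t11);
}
```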
That's it. The rest falls out recursively. This gives us a really easy
way to encode these ordering constraints in the unification-based system
we have today with minimal additional intervention. We do have to patch
variables in-place sometimes, and the additive nature of these
constraints feels a bit out of place relative to unification, but it
seems to work well.
Resolves #1758
This was referenced in the `List` documentation and in the
[tutorial](./TUTORIAL.md), but wasn't actually implemented prior to this
commit!
Part of #2227