Presently, while generalizing type variables, we check whether each
variable introduced at a scope is redundant (that is, whether it is not
the root of some unified set of variables). If a variable is redundant,
its rank is not adjusted. I believe the current logic to be the following:
- Each root of a unification tree will be introduced at some point,
exactly once. Its point of introduction will determine the rank of the
tree it's the root of.
- If a variable is redundant, all of its redundant usages must be at the
same rank (assuming let generalization proceeds correctly),
so there is no need to adjust their rank as well.
- As such, there is no need to adjust the rank of redundant variables,
as a performance optimization.
I believe this to be a hold-over from the original version of the solver
derived from the elm-compiler.
In our implementation, however, rank adjustment is very cheap: thanks to
the SoA layout, ranks are likely already in cache, because we have just
adjusted variables at this point.
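To make the check described above concrete, here is a minimal Rust sketch of a structure-of-arrays variable table with a redundancy test. All names (`VarTable`, `link`, `adjust_ranks`) are hypothetical and not taken from the solver, and "adjusting" a rank is simplified to overwriting it; the real pass also walks type structure.

```rust
/// Hypothetical structure-of-arrays (SoA) table of type variables.
/// Variable `v` is described by `parent[v]` and `rank[v]`.
struct VarTable {
    parent: Vec<usize>, // union-find link; a root points at itself
    rank: Vec<u32>,     // rank (scope depth) recorded for the variable
}

impl VarTable {
    fn new() -> Self {
        VarTable { parent: Vec::new(), rank: Vec::new() }
    }

    /// Introduce a fresh variable at the given rank.
    fn fresh(&mut self, rank: u32) -> usize {
        let v = self.parent.len();
        self.parent.push(v);
        self.rank.push(rank);
        v
    }

    /// Follow parent links to the root (no path compression, for brevity).
    fn find(&self, mut v: usize) -> usize {
        while self.parent[v] != v {
            v = self.parent[v];
        }
        v
    }

    /// Link the tree containing `child` under the tree containing `root`.
    fn link(&mut self, child: usize, root: usize) {
        let (c, r) = (self.find(child), self.find(root));
        if c != r {
            self.parent[c] = r;
        }
    }

    /// A variable is redundant if it is not the root of its tree.
    fn is_redundant(&self, v: usize) -> bool {
        self.parent[v] != v
    }

    /// Simplified model of the current behavior: only roots get a new rank.
    /// Returns how many introduced variables were skipped as redundant.
    fn adjust_ranks(&mut self, introduced: &[usize], new_rank: u32) -> usize {
        let mut skipped = 0;
        for &v in introduced {
            if self.is_redundant(v) {
                skipped += 1;
                continue;
            }
            self.rank[v] = new_rank;
        }
        skipped
    }
}
```

Because `parent` and `rank` are separate contiguous arrays, the extra write for a redundant variable would touch memory that the surrounding loop is already visiting, which is the intuition behind the "very cheap" claim.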
However, there is a larger problem here - ranks must be adjusted for
redundant variables as we begin to support weakened type variables.
The motivating case is
```
\x -> when x is
_x -> Green
```
We would like this code to be generalized as `* -> [Green]*`. `when`
expressions have each branch solved via let-bindings; in particular, for
the singleton branch we introduce `_x` of the appropriate type and solve
the body as `[Green]*`.
Today, `[Green]*` would be generalized in the context of the inner scope
that binds `_x`, which means it is generalized in the body `\x -> ...`
as a whole.
However, with weakening, we do not want this behavior! In particular, we
do not want to actually generalize `_x` in the context of the branch
body. Doing so means you could write things like
```
main = \{} -> when Red is
x ->
y : [Red]
y = x
z : [Red, Green]
z = x
{y, z}
```
which is exactly the kind of spurious generalization that the weakening
design is trying to avoid.
So, we want to introduce `[Green]*` at the rank of the body `\x -> ...`;
let's call this `rank_body`, and let's say `[Green]*` is introduced as
`branch_var`. Let's say the return type variable is `ret_var`.
Now we must be careful. If after unification `ret_var ~ branch_var` we have that
`branch_var` becomes the root, then despite `ret_var` (and `branch_var`) being at
`rank_body` (which is also the rank that will be promoted to generalization),
the tree given by `branch_var` won't be generalized, because `ret_var` will be
seen as redundant! In fact it is, because `branch_var` was introduced
previously, but that doesn't matter - we want the variable to be
generalized at the level of the outer let-binding `main = \{} -> ...`.
This problem is not unique to when-branches; for example we can observe
the same symptom with
```
main = \{} ->
x = Green
x
```
where we'd like `x` to not be generalized inside the body of
`main`, but have it be generalized relative to the body of `main` (that
is, `main` should have signature `{} -> [Green]*`, but you cannot use `x`
itself polymorphically inside the body of `main`).
As such, the easiest solution as far as I can see, in the presence of
weakening, is to allow rank-adjustment and generalization of redundant
variables if they are permitted to be generalized relative to a lower
scope.
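The `ret_var`/`branch_var` situation and the proposed fix can be modeled with a toy union-find. This is only a sketch under stated assumptions: `Vars`, `unify`, `generalize`, the `include_redundant` flag, and the `generalized` bit vector are all hypothetical names invented for illustration, and the model ignores type structure entirely.

```rust
/// Toy model: each variable has a union-find parent, a rank, and a
/// "generalized" bit that is only meaningful on roots.
struct Vars {
    parent: Vec<usize>,
    rank: Vec<u32>,
    generalized: Vec<bool>,
}

impl Vars {
    fn new() -> Self {
        Vars { parent: vec![], rank: vec![], generalized: vec![] }
    }

    fn fresh(&mut self, rank: u32) -> usize {
        let v = self.parent.len();
        self.parent.push(v);
        self.rank.push(rank);
        self.generalized.push(false);
        v
    }

    fn find(&self, mut v: usize) -> usize {
        while self.parent[v] != v {
            v = self.parent[v];
        }
        v
    }

    /// Unify two variables; `root`'s tree becomes the representative and
    /// takes the minimum of the two ranks.
    fn unify(&mut self, child: usize, root: usize) {
        let (c, r) = (self.find(child), self.find(root));
        if c == r {
            return;
        }
        let min = self.rank[c].min(self.rank[r]);
        self.parent[c] = r;
        self.rank[r] = min;
    }

    /// Generalize the variables introduced at `scope_rank`.
    /// With `include_redundant == false`, non-root variables are skipped.
    /// With `true`, we look through to the root, but still only generalize
    /// trees whose rank equals the rank of the scope being generalized.
    fn generalize(&mut self, introduced: &[usize], scope_rank: u32, include_redundant: bool) {
        for &v in introduced {
            let root = self.find(v);
            if root != v && !include_redundant {
                continue;
            }
            if self.rank[root] == scope_rank {
                self.generalized[root] = true;
            }
        }
    }

    fn is_generalized(&self, v: usize) -> bool {
        self.generalized[self.find(v)]
    }
}
```

In this model, if `branch_var` and `ret_var` are both created at `rank_body`, `branch_var` becomes the root, and only `ret_var` is in the introduced list, then the tree is generalized only when `include_redundant` is set. A tree whose root was lowered to an outer rank by `unify` is left alone in either mode.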
This should preserve soundness; the main soundness concern in
rank-based let generalization is making sure something like
```
\x ->
y = \z -> x z
y
```
has type `(a -> b) -> (a -> b)` and not e.g. `(a -> b) -> (c -> d)` due
to `x` being instantiated at a higher rank in `y = ...` than it
actually is. Note that this change cannot affect this case at all, since
we are still doing the rank-adjustment pass at higher ranks, unifying
lower-ranked variables to their minimum relative rank, and introduction
only happens in the lower-ranked scopes.
The `LambdaSet` struct is frequently used independently to examine how a
lambda set should be packed or unpacked. However, it is also often
converted into a full layout via `Layout::LambdaSet(LambdaSet)` to be a
part of function arguments, for example.
In preparing to intern all layouts, we need a way to cheaply go from a
`lambda_set` to an interned `Layout::LambdaSet(lambda_set)`, since this
is a very common operation. The proposed solution is to keep the wrapped
layout cached on `LambdaSet` itself, which this PR does.
The tricky bit of inserting a lambda set is that we need to fill in the
interned `full_layout` only after the lambda set is inserted, but we
don't want to allocate a new interned slot if the same lambda set layout
has already been inserted with a different `full_layout` slot.
For example, if we insert `LambdaSet { set : [A] }` twice in two
different threads, we want both insertions to map to the same
`full_layout`.
So we need to check whether an interned representation with a
`full_layout` exists before we allocate a new `full_layout` and insert a
fresh lambda set.
So,
- check if the "normalized" lambda set (with a void `full_layout` slot) maps to an
already-inserted lambda set, either in a thread-local cache or globally
- if so, use that one immediately
- otherwise, allocate a new (global) slot, intern the lambda set, and then fill the slot in
- save the interned layout and lambda set mapping thread-locally
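The steps above can be sketched roughly as follows. This is not the PR's implementation: `InLayout`, `VOID`, `GlobalInterner`, `ThreadInterner`, and `insert_lambda_set` are hypothetical names, the lambda set is reduced to a list of tag names, and a single `Mutex` stands in for whatever synchronization the real interner uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical handle to an interned layout slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct InLayout(usize);

/// Placeholder used in the "normalized" form, before a slot is known.
const VOID: InLayout = InLayout(usize::MAX);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct LambdaSet {
    set: Vec<&'static str>,
    full_layout: InLayout, // cached slot of `Layout::LambdaSet(self)`
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Layout {
    LambdaSet(LambdaSet),
}

#[derive(Default)]
struct GlobalInterner {
    by_normalized: HashMap<LambdaSet, LambdaSet>, // normalized -> interned
    slots: Vec<Option<Layout>>,
}

struct ThreadInterner {
    global: Arc<Mutex<GlobalInterner>>,
    local: HashMap<LambdaSet, LambdaSet>, // thread-local cache
}

impl ThreadInterner {
    fn new(global: Arc<Mutex<GlobalInterner>>) -> Self {
        ThreadInterner { global, local: HashMap::new() }
    }

    fn insert_lambda_set(&mut self, set: Vec<&'static str>) -> LambdaSet {
        let normalized = LambdaSet { set, full_layout: VOID };

        // 1. Thread-local hit: use it immediately, no locking.
        if let Some(found) = self.local.get(&normalized) {
            return found.clone();
        }

        // 2. Global lookup, under the lock so two threads cannot both allocate.
        let mut global = self.global.lock().unwrap();
        let existing = global.by_normalized.get(&normalized).cloned();
        let interned = match existing {
            Some(found) => found,
            None => {
                // 3. Allocate an empty slot, build the lambda set that
                //    points at it, then fill the slot in.
                let slot = InLayout(global.slots.len());
                global.slots.push(None);
                let lambda_set = LambdaSet { set: normalized.set.clone(), full_layout: slot };
                global.slots[slot.0] = Some(Layout::LambdaSet(lambda_set.clone()));
                global.by_normalized.insert(normalized.clone(), lambda_set.clone());
                lambda_set
            }
        };
        drop(global);

        // 4. Remember the mapping thread-locally.
        self.local.insert(normalized, interned.clone());
        interned
    }
}
```

Keying both maps on the normalized form is what lets two threads inserting `[A]` agree on one `full_layout`, since the real slot is not known at lookup time.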
* Unify parsing of string literals and scalar literals, to (e.g.) ensure escapes are handled uniformly. Notably, this makes unicode escapes valid in scalar literals.
* Add a variety of custom error messages about specific failure cases of parsing string/scalar literals. For example, if we're expecting a string (e.g. a package name in the header) and the user tried using single quotes, give a clear message about that.
* Fix formatting of unicode escapes (they previously used `{}`, now correctly use `()` to match Roc strings).
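
As an illustration of the first point, here is a minimal Rust sketch in which string and scalar literals share one escape-handling routine, so a `\u(...)` escape works in both. It is not the parser's actual code: the function names, the `LiteralError` variants, and the exact set of single-character escapes are assumptions made for the example.

```rust
#[derive(Debug, PartialEq)]
enum LiteralError {
    UnknownEscape(char),
    BadUnicodeEscape,
    UnterminatedEscape,
    ScalarNotSingleChar,
}

/// Shared by both literal kinds: resolve escapes in the text between the quotes.
fn unescape(body: &str) -> Result<String, LiteralError> {
    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('\\') => out.push('\\'),
            Some('"') => out.push('"'),
            Some('\'') => out.push('\''),
            Some('u') => out.push(unicode_escape(&mut chars)?),
            Some(other) => return Err(LiteralError::UnknownEscape(other)),
            None => return Err(LiteralError::UnterminatedEscape),
        }
    }
    Ok(out)
}

/// Parse the `(HEX)` part of a `\u(HEX)` escape.
fn unicode_escape(chars: &mut std::str::Chars<'_>) -> Result<char, LiteralError> {
    if chars.next() != Some('(') {
        return Err(LiteralError::BadUnicodeEscape);
    }
    let mut hex = String::new();
    loop {
        match chars.next() {
            Some(')') => break,
            Some(h) if h.is_ascii_hexdigit() => hex.push(h),
            _ => return Err(LiteralError::BadUnicodeEscape),
        }
    }
    let code = u32::from_str_radix(&hex, 16).map_err(|_| LiteralError::BadUnicodeEscape)?;
    char::from_u32(code).ok_or(LiteralError::BadUnicodeEscape)
}

/// A string literal is just the unescaped body.
fn parse_string_literal(body: &str) -> Result<String, LiteralError> {
    unescape(body)
}

/// A scalar literal is the same thing, restricted to exactly one scalar value.
fn parse_scalar_literal(body: &str) -> Result<char, LiteralError> {
    let text = unescape(body)?;
    let mut chars = text.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Ok(c),
        _ => Err(LiteralError::ScalarNotSingleChar),
    }
}
```

Routing both literal kinds through `unescape` means any escape added or fixed for strings applies to scalars automatically, and each failure mode has its own error variant to hang a specific message on.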