Presence constraints for tag union types

This work is related to restricting tag union sizes in input positions. As an example, for something like

```
\x -> when x is
    A M -> X
    A N -> X
    A _ -> X
```

we'd like to infer `[A [M, N]* ]` rather than the `[A [M, N]* ]*` we infer today. Notice the difference is that the former type tells us we only accept `A`s, but the argument of the `A` can be `M`, `N` or anything else (hence the `_`).

So what's the idea? It's an encoding of the "must have"/"might have" design discussed in https://github.com/rtfeldman/roc/issues/1758. Let's take our example above and walk through unification of each branch.

Suppose `x` starts off as a flex var `t`.

```
\x -> when x is
    A M -> X
```

Now we introduce a new kind of constraint called a "presence" constraint. It says "t has at least [A [M]]". I'll notate this as `t += [A [M]]`. When `t` is free as it is here, this is equivalent to `t ~ [A [M]]`.

```
\x -> when x is
    ...
    A N -> X
```

At this branch we introduce the presence constraint `[A [M]] += [A [N]]`. Notice that there are two tag unions we care about resolving here - one is the toplevel one that says "I have an `A ...` inside of me", and the other one is the tag union that's the tyarg to `A`. They are distinct and at different depths.

For the toplevel one, we first figure out if the number of tags in the union needs to expand. It does not - we're hoping to resolve the type `[A [M, N]]`, which only has `A` in the toplevel union. So, we don't need to do anything extra there, other than merge the nested tag unions.

We recurse on the shared tags, and now we have the presence constraint `[M] += [N]`. At this point it's important to remember that the left and right hand types are backed by type variables, so this is really something like `t11 [M] += t12 [N]`, where `[M]` and `[N]` are just what we know the variables `t11` and `t12` to be at this moment. So how do we solve for `t11 [M, N]` from here?
Well, we can encode this constraint as a type variable definition and a unification constraint we already know how to solve:

```
New definition: t11 [M]a (a fresh)
New constraint: a ~ t12 [N]
```

That's it; upon unification, `t11 [M, N]` falls out.

Okay, last step.

```
\x -> when x is
    ...
    A _ -> X
```

We now have `[A [M, N]] += [A a]`, where `a` is a fresh unbound variable. Again nothing has to happen on the toplevel. We walk down and find `t11 [M, N] += t21 a`. This is actually called an "open constraint"; we differentiate it at the time we generate constraints because it follows syntactically from the presence of an `_`, but it's semantically equivalent to the presence constraint `t11 [M, N] += t21 a`. It's just called opening because literally the only way `t11 [M, N] += t21 a` can be true is if we set `t11 a`. Well, actually, we assume `a` is a tag union, so we just make `t11` the open tag union `[M, N]a`. Since `a` is unbound, this eventually becomes a wildcard and hence falls out `[M, N]*`. Also, once we open a tag union with an open constraint, we never close it again.

That's it. The rest falls out recursively. This gives us a really easy way to encode these ordering constraints in the unification-based system we have today with minimal additional intervention. We do have to patch variables in-place sometimes, and the additive nature of these constraints feels a bit out of place relative to unification, but it seems to work well.

Resolves #1758
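
For intuition only, the three-branch walkthrough above can be replayed on a toy model. This is a sketch and not the solver's actual representation: the real implementation works on type variables in `Subs` and goes through `unify`, whereas the `Union` struct and the `tag`, `wildcard`, `present`, `render` and `infer_example` names below are hypothetical and exist only for this illustration. A free variable is approximated by an empty union, and the fresh variable introduced by `_` is approximated by an empty open union.

```rust
use std::collections::BTreeMap;

/// Toy tag union (hypothetical, not the compiler's type): each tag maps to
/// the union carried as its payload; `open` marks a union that ends in `*`.
#[derive(Debug, Clone, Default, PartialEq)]
struct Union {
    tags: BTreeMap<String, Union>,
    open: bool,
}

impl Union {
    /// Builds the closed single-tag union `[name payload]`.
    fn tag(name: &str, payload: Union) -> Union {
        let mut tags = BTreeMap::new();
        tags.insert(name.to_string(), payload);
        Union { tags, open: false }
    }

    /// Stand-in for the fresh unbound variable that `_` introduces.
    fn wildcard() -> Union {
        Union { tags: BTreeMap::new(), open: true }
    }

    /// Toy version of `self += other`: add missing tags, recurse on shared
    /// tags, and open the union if `other` is open. Nothing ever resets
    /// `open`, so an opened union stays open.
    fn present(&mut self, other: &Union) {
        for (name, payload) in &other.tags {
            self.tags.entry(name.clone()).or_default().present(payload);
        }
        self.open |= other.open;
    }

    /// Renders the union in the notation used in the message above.
    fn render(&self) -> String {
        let tags: Vec<String> = self
            .tags
            .iter()
            .map(|(name, payload)| {
                if payload.tags.is_empty() && !payload.open {
                    name.clone() // tag without a payload, e.g. `M`
                } else {
                    format!("{} {}", name, payload.render())
                }
            })
            .collect();
        format!("[{}]{}", tags.join(", "), if self.open { "*" } else { "" })
    }
}

/// Replays the branches `A M`, `A N` and `A _` in order.
fn infer_example() -> Union {
    let mut t = Union::default(); // free variable, modeled as an empty union
    t.present(&Union::tag("A", Union::tag("M", Union::default())));
    t.present(&Union::tag("A", Union::tag("N", Union::default())));
    t.present(&Union::tag("A", Union::wildcard()));
    t
}

fn main() {
    println!("{}", infer_example().render()); // prints [A [M, N]*]
}
```

The toplevel union never gains a tag other than `A` and is never opened, which is why the result stays closed at the top while the payload of `A` ends up open.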
2025-09-28 14:24:45 +00:00 · 2021-12-20 18:58:51 -06:00 · 2021-12-20 18:58:51 -06:00 · b97ff380e3
commit b97ff380e3
parent b87d091660
9 changed files with 512 additions and 93 deletions
--- a/compiler/solve/src/solve.rs
+++ b/compiler/solve/src/solve.rs
@@ -1,4 +1,5 @@
 use roc_can::constraint::Constraint::{self, *};
+use roc_can::constraint::PresenceConstraint;
 use roc_can::expected::{Expected, PExpected};
 use roc_collections::all::MutMap;
 use roc_module::ident::TagName;
@@ -205,7 +206,7 @@ fn solve(
                expectation.get_type_ref(),
            );

-            match unify(subs, actual, expected) {
+            match unify(subs, actual, expected, false) {
                Success(vars) => {
                    introduce(subs, rank, pools, &vars);

@@ -240,7 +241,7 @@ fn solve(
            let actual = type_to_var(subs, rank, pools, cached_aliases, source);
            let target = *target;

-            match unify(subs, actual, target) {
+            match unify(subs, actual, target, false) {
                Success(vars) => {
                    introduce(subs, rank, pools, &vars);

@@ -294,7 +295,7 @@ fn solve(
                        cached_aliases,
                        expectation.get_type_ref(),
                    );
-                    match unify(subs, actual, expected) {
+                    match unify(subs, actual, expected, false) {
                        Success(vars) => {
                            introduce(subs, rank, pools, &vars);

@@ -349,7 +350,7 @@ fn solve(

            state
        }
-        Pattern(region, category, typ, expectation) => {
+        Pattern(region, category, typ, expectation, presence) => {
            let actual = type_to_var(subs, rank, pools, cached_aliases, typ);
            let expected = type_to_var(
                subs,
@@ -359,7 +360,7 @@ fn solve(
                expectation.get_type_ref(),
            );

-            match unify(subs, actual, expected) {
+            match unify(subs, actual, expected, *presence) {
                Success(vars) => {
                    introduce(subs, rank, pools, &vars);

@@ -622,6 +623,69 @@ fn solve(
                }
            }
        }
+        Present(typ, constr) => {
+            let actual = type_to_var(subs, rank, pools, cached_aliases, typ);
+            match constr {
+                PresenceConstraint::IsOpen => {
+                    let mut new_desc = subs.get(actual);
+                    match new_desc.content {
+                        Content::Structure(FlatType::TagUnion(tags, _)) => {
+                            let new_ext = subs.fresh_unnamed_flex_var();
+                            let new_union = Content::Structure(FlatType::TagUnion(tags, new_ext));
+                            new_desc.content = new_union;
+                            subs.set(actual, new_desc);
+                            state
+                        }
+                        _ => {
+                            // Today, an "open" constraint doesn't affect any types
+                            // other than tag unions. Recursive tag unions are constructed
+                            // at a later time (during occurs checks after tag unions are
+                            // resolved), so that's not handled here either.
+                            // NB: Handle record types here if we add presence constraints
+                            // to their type inference as well.
+                            state
+                        }
+                    }
+                }
+                PresenceConstraint::IncludesTag(tag_name, tys) => {
+                    let tag_ty = Type::TagUnion(
+                        vec![(tag_name.clone(), tys.clone())],
+                        Box::new(Type::EmptyTagUnion),
+                    );
+                    let includes = type_to_var(subs, rank, pools, cached_aliases, &tag_ty);
+
+                    match unify(subs, actual, includes, true) {
+                        Success(vars) => {
+                            introduce(subs, rank, pools, &vars);
+
+                            state
+                        }
+                        Failure(vars, actual_type, expected_type) => {
+                            introduce(subs, rank, pools, &vars);
+
+                            // TODO: do we need a better error type here?
+                            let problem = TypeError::BadExpr(
+                                Region::zero(),
+                                Category::When,
+                                actual_type,
+                                Expected::NoExpectation(expected_type),
+                            );
+
+                            problems.push(problem);
+
+                            state
+                        }
+                        BadType(vars, problem) => {
+                            introduce(subs, rank, pools, &vars);
+
+                            problems.push(TypeError::BadType(problem));
+
+                            state
+                        }
+                    }
+                }
+            }
+        }
    }
 }

@@ -1437,6 +1501,24 @@ fn adjust_rank_content(

                TagUnion(tags, ext_var) => {
                    let mut rank = adjust_rank(subs, young_mark, visit_mark, group_rank, *ext_var);
+                    // For performance reasons, we only keep one representation of empty tag unions
+                    // in subs. That representation exists at rank 0, which we don't always want to
+                    // reflect the whole tag union as, because doing so may over-generalize free
+                    // type variables.
+                    // Normally this is not a problem because of the loop below that maximizes the
+                    // rank from nested types in the union. But suppose we have the simple tag
+                    // union
+                    //   [ Z ]{}
+                    // there are no nested types in the tags, and the empty tag union is at rank 0,
+                    // so we promote the tag union to rank 0. Now if we introduce the presence
+                    // constraint
+                    //   [ Z ]{} += [ S a ]
+                    // we'll wind up with [ Z, S a ]{}, but it will be at rank 0, and "a" will get
+                    // over-generalized. Really, the empty tag union should be introduced at
+                    // whatever current group rank we're at, and so that's how we encode it here.
+                    if *ext_var == Variable::EMPTY_TAG_UNION && rank.is_none() {
+                        rank = group_rank;
+                    }

                    for (_, index) in tags.iter_all() {
                        let slice = subs[index];