Commit graph

127 commits

Author SHA1 Message Date
Ruud van Asseldonk
c104f06b8a Simplify the parser by adding an Eof token
I learned about this in one of Matklad's posts, it's a nice idea that
simplifies things a bit.
2024-08-24 12:30:59 +02:00
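A minimal sketch of the idea behind this commit, not the actual RCL parser: if the token list always ends in an `Eof` sentinel, `peek` can return a plain token instead of an `Option`, and the "ran out of input" case becomes one more match arm. The `Token` and `Parser` types here are hypothetical stand-ins.

```rust
/// Hypothetical token type; the real lexer's tokens are not shown in this log.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Token {
    Ident,
    Comma,
    /// Sentinel that marks the end of the input.
    Eof,
}

/// Hypothetical parser state: a token list plus a cursor into it.
pub struct Parser {
    tokens: Vec<Token>,
    cursor: usize,
}

impl Parser {
    /// Append an `Eof` sentinel so the token list is never empty.
    pub fn new(mut tokens: Vec<Token>) -> Parser {
        tokens.push(Token::Eof);
        Parser { tokens, cursor: 0 }
    }

    /// Peek at the current token. At the end this keeps returning `Eof`,
    /// so callers match on a `Token` directly instead of an `Option<Token>`.
    pub fn peek(&self) -> Token {
        self.tokens.get(self.cursor).copied().unwrap_or(Token::Eof)
    }

    /// Return the current token and advance, but never past the sentinel.
    pub fn consume(&mut self) -> Token {
        let token = self.peek();
        if token != Token::Eof {
            self.cursor += 1;
        }
        token
    }
}
```

With this shape, an "unexpected end of input" error falls out of the same code path that reports any other unexpected token.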
Ruud van Asseldonk
816a00745f Optionally allow a colon after "else" in grammar
When I initially converted from the "if-then-else" keyword syntax to the
colon-based one, during development I put the colon in, then later I
removed it again, but I was still unsure. Now, after having used the
syntax for some time, my feeling is that there should be a colon after
all. Nim got it right. So put it back.

Because it's easy to stay backwards compatible here, make the colon
optional. We can make it mandatory at some point in the future, but even
making the autoformatter put it there is probably a strong enough push.
2024-07-31 21:46:23 +02:00
Ruud van Asseldonk
933712cb8c Add parse error help for certain keywords
Now that full expressions are no longer allowed in some places, the
changelog mentions how to fix it, but we can actually put the help
straight into the code. And then it's helpful in all cases where you
try to use an expression and the keyword would be valid if you added
parentheses.
2024-06-16 23:16:01 +02:00
Ruud van Asseldonk
1828e566b6 Use expr_op for collection in for loops too
This change is similar to the one for conditionals. Like them, it stems
from the formatter printing a raw space before the collection, which is
problematic for non-code prefixes. Only here it did not surface as a non-
idempotency in the formatter because no non-code is allowed before the
collection. Still I think if you want to write this:

    [for x in let y = [1, 2, 3]; y: y * 2]

Then instead you should write this:

    [for x in (let y = [1, 2, 3]; y): y * 2]

It's much clearer and it creates fewer problems with formatting.
2024-06-16 22:54:24 +02:00
Ruud van Asseldonk
43fd48d106 Restrict what expressions can be conditions
The fuzzer now discovered a non-idempotency in the formatter, in case
there is a non-code prefix for the condition. This has something to do
with the space between "if" and the condition: there is no separator
there, so whatever follows goes on the same line, which is usually not
the case.

For evaluation everything works fine, but how do you format this? We can
try to repair it, but it's hard. A solution that sidesteps all this is
to restrict what kind of expressions we can have after an if. Just don't
allow statements and ifs there. We don't lose any expressivity, if you
want that it still works, just put parens around it. With parens it is
also possible to format the expression properly, e.g.

    if (
      // Comments are fine, everything is indented here, etc.
      condition
    ):
      then_value
    else
      else_value

Oh, and unrelated, I think I am convinced that I want the colon after
"else" back. But let's do that in a follow-up. Or maybe it can be
optional but the formatter always puts it there?

The implementation of this forces propagating spans in more places,
which ended up being a drive-by fix for one place where spans were
computed incorrectly. This fix shows up in the golden tests.
2024-06-16 22:49:36 +02:00
Ruud van Asseldonk
c16274de42 Remove SpanPrefixedExpr from parser
Hmm, at this point it's not unanimously more elegant. When parsing a
sequence of things that may have trailing non-code, it was nice to parse
the non-code and then look at the separator. Now we have to peek over it
instead. Alternatively, I could parse it, and have a way to pass it in
to parse_expr as "seed non-code", but that is also a bit clumsy. For now
the "peek past" will do.
2024-06-16 22:49:31 +02:00
Ruud van Asseldonk
48cae78393 Remove now-redundant cases of Prefixed<Expr>
Wow! My change to make a statement expr be the top-level one that
optionally includes a prefix was a great discovery! Everything becomes
much simpler now! No need to store those prefixes separately everywhere
any more! I should have done this much earlier. And all of the golden
tests still pass with this change, it's almost magical. Also a good
demonstration of how something that ends up looking simple may not be
simple to discover.

I suspect that a similar transformation is possible with what is
currently Prefixed<Seq>, I'll look into that next.
2024-06-16 22:49:27 +02:00
Ruud van Asseldonk
a903e7610e Parse statements as a list
This changes the CST to keep a list of statements instead of making it
a degenerate tree that nests deeper for every statement. The primary
reason for doing this is to enable better pretty-printing by formatting
either the entire chain as wide or tall, and not breaking up a chain of
let bindings or assertions where some are on a line and some are not.

This is quite a deep change in one sense, but the code changes ended
up being smaller than I expected. It also ends up enabling non-code
prefixes in more places, so I think this is a good change in general.
I need to audit the parser and CST because I think there are now a few
places where a prefix is stored separately that would now be parsed into
an Expr::Statements node instead.

But what surprises me most, after I got everything to compile, all the
golden tests still pass aside from a few reformattings, and the new
format looks universally better than the old one. Wow! I think I really
discovered the "right" way to implement this!
2024-06-16 22:44:40 +02:00
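To illustrate the difference between the two shapes this commit describes, here is a rough sketch with a hypothetical, heavily simplified CST. Only the name `Expr::Statements` comes from the message; every other name is an assumption. `flatten` turns the degenerate nested tree into the list form, which a formatter could then lay out as one wide or tall group.

```rust
/// Hypothetical statement node; the real CST is not shown in this log.
#[derive(Debug, PartialEq)]
pub enum Stmt {
    Let { name: String, value: Expr },
    Assert { condition: Expr },
}

/// Hypothetical expression node with both shapes side by side.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Var(String),
    /// Old shape: every statement wraps the rest of the chain.
    Nested { stmt: Box<Stmt>, body: Box<Expr> },
    /// New shape: all statements in one list, followed by the final body.
    Statements { stmts: Vec<Stmt>, body: Box<Expr> },
}

/// Walk down a nested chain and collect its statements into one list.
pub fn flatten(expr: Expr) -> Expr {
    let mut stmts = Vec::new();
    let mut current = expr;
    loop {
        match current {
            Expr::Nested { stmt, body } => {
                stmts.push(*stmt);
                current = *body;
            }
            other => {
                // No statements seen: the expression was not a chain at all.
                if stmts.is_empty() {
                    return other;
                }
                return Expr::Statements { stmts, body: Box::new(other) };
            }
        }
    }
}
```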
Ruud van Asseldonk
b2b63f08f1 Implement magic trailing comma for types too
This also resolves one to-do! This List type was a good idea after all,
now we have a place to store the suffix.
2024-06-15 21:55:38 +02:00
Ruud van Asseldonk
054d998e33 Record in CST whether trailing commas are present
I want to implement Python/Black's magic trailing comma, so a first step
is to store whether it's there. This makes my elements/suffix tuple even
larger, so let's make that a proper struct. That is a bit of an invasive
change unfortunately.
2024-06-15 21:55:38 +02:00
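A sketch of what "make that a proper struct" could look like. The field names and the `NonCode` placeholder are assumptions, not the real CST definitions. The layout rule shown is the one Black documents for its magic trailing comma: a trailing comma in the source forces one element per line. How RCL's formatter actually uses the flag is not stated in this log.

```rust
/// Hypothetical stand-in for comments and blank lines kept in the CST.
#[derive(Debug, PartialEq)]
pub struct NonCode(pub String);

/// A delimited list as it appeared in the source (illustrative field names).
#[derive(Debug, PartialEq)]
pub struct List<T> {
    pub elements: Vec<T>,
    /// Non-code between the last element and the closing delimiter.
    pub suffix: Vec<NonCode>,
    /// Whether the source had a comma after the last element.
    pub trailing_comma: bool,
}

impl<T> List<T> {
    /// Black-style rule: a trailing comma in the source forces tall layout.
    pub fn force_tall(&self) -> bool {
        self.trailing_comma
    }
}
```

Named fields keep call sites readable as the record grows, which a three-element tuple does not.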
Ruud van Asseldonk
c21485bc90 Make chained expressions first-class in the CST
We do this to enable formatting a chain of field access in its entirety
as wide or tall. In the formatter, individual field expressions no
longer create groups, there is one group for the chain.

For now we do keep the tree structure (that in practice degenerates to
a singly linked list) in the AST, because it would be an invasive change
to make that a list too, and I'm not sure that is worth it. (It would
help somewhat to avoid stack overflow, but it may complicate the
typechecker and evaluator.) So the abstractor translates the list back
into a tree.
2024-06-15 15:06:12 +02:00
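The "translate the list back into a tree" step can be sketched as a left fold. The types below are hypothetical simplifications (a chain of field names on a variable), not the real CST or AST.

```rust
/// Hypothetical CST node: a base expression followed by a flat list of fields.
pub struct CstChain {
    pub base: String,
    pub fields: Vec<String>,
}

/// Hypothetical AST node: field access stays a nested tree.
#[derive(Debug, PartialEq)]
pub enum Ast {
    Var(String),
    Field { inner: Box<Ast>, name: String },
}

/// Fold the flat chain into a left-nested tree, so `a.b.c` becomes
/// `Field(Field(Var(a), b), c)`.
pub fn abstract_chain(chain: CstChain) -> Ast {
    chain
        .fields
        .into_iter()
        .fold(Ast::Var(chain.base), |inner, name| Ast::Field {
            inner: Box::new(inner),
            name,
        })
}
```

The fold itself is iterative, but the resulting tree is as deep as the chain is long, which is the stack overflow concern the message mentions for later recursive passes.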
Ruud van Asseldonk
9dbee67306 Record entire span of functions, not just =>
We have this span available in the typechecker now. It's nicer to report
the entire thing in error messages than just the arrow.
2024-02-24 21:43:39 +01:00
Ruud van Asseldonk
9f1eadb45c Include spans necessary for error reporting
Now we can properly highlight the dict keys and if-then-else bodies.
2024-02-24 21:43:39 +01:00
Ruud van Asseldonk
acf9b9d2a4 Record entire span for type expressions
For functions and generic types, I want to highlight the entire thing
as the expected type, not just the type constructor or the "->" arrow.
2024-02-24 21:43:39 +01:00
Ruud van Asseldonk
8dec29873b Port all typechecks to the new TypeReq machinery
This finishes ("finishes") the refactor to move to the new expectation
system. Some things are still not implemented, but at least the
old-style checks are now completely replaced.
2024-02-24 21:43:39 +01:00
Ruud van Asseldonk
3114724428 Update expected output for static arity error
This also changes some of the spans that the errors get reported to, in
a way that I am now happy with. Errors in let bindings should be
blamed on the value span, not on the identifier.
2024-02-24 21:43:38 +01:00
Ruud van Asseldonk
ebcf60fc2d Slightly simplify span handling in parser
We don't need a stack of spans, the begin can live on the native call
stack.
2024-02-24 21:43:38 +01:00
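A small sketch of the technique, under the assumption that spans are byte ranges. `SpanParser` and its methods are made up for illustration. Each parse function records its start offset in a local variable, so nested calls each keep their own `begin` on the call stack and no explicit stack of spans is needed.

```rust
/// Byte range in the source (an assumed representation).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical parser over raw bytes, just enough to show span tracking.
pub struct SpanParser<'a> {
    input: &'a [u8],
    cursor: usize,
}

impl<'a> SpanParser<'a> {
    pub fn new(input: &'a str) -> Self {
        SpanParser { input: input.as_bytes(), cursor: 0 }
    }

    /// Consume one expected byte, or fail without advancing.
    fn expect(&mut self, byte: u8) -> Option<()> {
        if self.input.get(self.cursor) == Some(&byte) {
            self.cursor += 1;
            Some(())
        } else {
            None
        }
    }

    /// Parse a run of ASCII digits and return its span.
    pub fn parse_number(&mut self) -> Option<Span> {
        // The start offset is an ordinary local variable.
        let begin = self.cursor;
        while self.cursor < self.input.len() && self.input[self.cursor].is_ascii_digit() {
            self.cursor += 1;
        }
        if self.cursor == begin {
            None
        } else {
            Some(Span { start: begin, end: self.cursor })
        }
    }

    /// Parse `[` number `]`, returning the outer span and the inner span.
    pub fn parse_bracketed(&mut self) -> Option<(Span, Span)> {
        // This `begin` survives the nested call because it lives in this frame.
        let begin = self.cursor;
        self.expect(b'[')?;
        let inner = self.parse_number()?;
        self.expect(b']')?;
        Some((Span { start: begin, end: self.cursor }, inner))
    }
}
```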
Ruud van Asseldonk
d92be8e0f4 Add spans to BinOp 2024-02-24 21:43:38 +01:00
Ruud van Asseldonk
0855969b79 Define a type lattice after all
I didn't want to go down the path of unifying all the elements of a list
type, because I fear it will be expensive when loading large json
documents with many big and deep dicts. Inferring those will lead to
huge types, and we can't even share the Rc instance if they are inferred
all the time.

BUT, let's not worry about performance right now. Setting that concern
aside, it is very tempting to just do the inference. And then a lattice
naturally falls out. Maybe in the end I end up removing some of this
again, but just having the lattice will probably lead to a better
design.

Then an open question: I have meet, which is enough for primitive types
and for ??variant types, (I never remember, is it covariant or
contravariant?), which List and Dict and Set are in RCL, because you can
only get stuff out, not put stuff in. Function return types also behave
like that. But then for function arguments I need join. And join of two
incompatible things is Void. Which makes sense in a statically typed
setting. If I do e.g.

    let fs = [
      // (Int) -> Int
      x => x + 1,
      // (Bool) -> Bool
      x => not x
    ]

Then in general an element of 'fs' is a function that I cannot call
because the input would need to be both Int and Bool. But at runtime, I
don't call a function "in general", I call a particular one, which can
be fine. How to type that? I think that if any of the arguments
collapsed to Void, we need to instead make it Dynamic, and check at
runtime. But that is the complete opposite of what the lattice does! It
feels inelegant and like a hack. So I'm not sure about that yet, but
let's see when we get to it, maybe I'm missing something obvious and
I'll have a better idea then.
2024-02-24 21:43:38 +01:00
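To make the problem in this message concrete, here is a toy lattice. It is a sketch under several assumptions, not the typechecker from this commit: `Void` is taken as the bottom element, `Dynamic` as the top, functions take a single argument, and the two operations get the neutral names `upper_bound` and `lower_bound`. Which one is called meet and which join depends on how the lattice is oriented; the operation the message calls "join" (incompatible things collapse to Void) corresponds to `lower_bound` here.

```rust
/// Toy type representation (hypothetical, much smaller than a real one).
#[derive(Clone, Debug, PartialEq)]
pub enum Type {
    /// Assumed bottom: no value has this type.
    Void,
    Int,
    Bool,
    /// Covariant in the element type.
    List(Box<Type>),
    /// Single-argument function: (argument, result).
    Function(Box<Type>, Box<Type>),
    /// Assumed top: any value, checked at runtime.
    Dynamic,
}

/// Most specific type that both `a` and `b` fit into.
pub fn upper_bound(a: &Type, b: &Type) -> Type {
    match (a, b) {
        (Type::Void, other) | (other, Type::Void) => other.clone(),
        (Type::List(x), Type::List(y)) => Type::List(Box::new(upper_bound(x, y))),
        // Arguments are contravariant, so they go the opposite way.
        (Type::Function(a1, r1), Type::Function(a2, r2)) => Type::Function(
            Box::new(lower_bound(a1, a2)),
            Box::new(upper_bound(r1, r2)),
        ),
        _ if a == b => a.clone(),
        _ => Type::Dynamic,
    }
}

/// Most general type that fits into both `a` and `b`.
pub fn lower_bound(a: &Type, b: &Type) -> Type {
    match (a, b) {
        (Type::Dynamic, other) | (other, Type::Dynamic) => other.clone(),
        (Type::List(x), Type::List(y)) => Type::List(Box::new(lower_bound(x, y))),
        (Type::Function(a1, r1), Type::Function(a2, r2)) => Type::Function(
            Box::new(upper_bound(a1, a2)),
            Box::new(lower_bound(r1, r2)),
        ),
        _ if a == b => a.clone(),
        _ => Type::Void,
    }
}
```

In this toy model, combining `(Int) -> Int` with `(Bool) -> Bool` through `upper_bound` yields a function whose argument type is `Void`, which is exactly the "function that I cannot call" situation from the `fs` example.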
Ruud van Asseldonk
1cd625fa42 Exclude some regions from code coverage
Saying that a Debug impl is not covered is counterproductive. It is not
expected to be covered when code is correct. So exclude these regions,
because a drop from 100% coverage to <100% stands out a lot more than a
drop from 97.8% to 97.2% when I do add one line that is not covered but
might be an important edge case.
2024-02-22 20:57:12 +01:00
Ruud van Asseldonk
ac737174fc Make the parser not eat some comments
There were some cases where non-code before a closing delimiter could be
eaten, because we consumed it before discovering the closing delimiter.
This happens in the real world, mostly when you have a collection and
you comment out the last element. To fix this, put the non-code in the
CST and emit it when formatting.
2024-02-01 19:40:12 +01:00
Ruud van Asseldonk
dd626efa99 Disallow prefixes on type annotations
I had a prefixed type at the top-level let previously, but it leads to
a bad case that breaks idempotency in the formatter (see parent commit).
It is possible to remediate this while preserving prefixes, but that
would add complexity, and why would you even want to have comments
between the : and the type? I allowed the prefixes in type lists, in
generic instantiations and in function types, because there you might
want to document the types, e.g.

    let f: (
      // The number of widgets.
      Int,
      // The maximum widget serial number.
      Int,
    ) -> Int = (n, m) => ...

But between the : and the type ... then you should just put the comment
above the let. So we simplify the CST, and fix a bug!
2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
9db6d683ee Correct index out of bounds in parse error
This bug was introduced with the addition of type annotations, and
discovered by the fuzzer.
2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
f74617c2d5 Bring types into the AST
The abstractor for it is not yet implemented, and evaluation will panic
if it encounters a type signature.
2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
7b1e0f9860 Add friendly errors for parsing function types 2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
05c8c9f959 Expand the type parser a bit, adjust grammar
We can require '(' for function types, and not allow them anywhere else.
That makes the grammar slightly easier to parse.
2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
8e7458e75b Prepare for parsing type application and functions 2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
d0b73747db Define initial skeleton for type parser 2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
8f284bf5ab Add nodes for types to the CST
Just some initial ones, not exhaustive yet.
2024-01-30 21:34:56 +01:00
Ruud van Asseldonk
be7d9dc637 Remove ':' after 'else' after all
After sleeping on it for a night, let's remove unnecessary syntax noise.
Add a friendly error in case a user writes the colon after all.
2023-12-29 13:23:54 +01:00
Ruud van Asseldonk
75672fb35b Remove 'then' keyword from the lexer
It is now unused, people can use it as an identifier if they like.
2023-12-28 23:58:05 +01:00
Ruud van Asseldonk
3aabf087a0 Change if-then-else syntax to use : instead of then
After having used RCL in practice for a few months, I think I'm leaning
towards this. It resolves an awkward way of formatting the if-then-else
multi-line, and it's more consistent with 'if' inside a comprehension.
It also makes the syntax resemble Python more for the multi-line case.

What was holding me back previously is that I think the colons on a
single line look kind of awkward, e.g.

    let x = if cond: true-val else: false-val;

But I expect I will get used to it. What pushed me over the edge was
that Nim uses this syntax. Though Nim is maybe not the best
justification to cite here, from my very limited experience using it,
it looks like a kitchen sink of syntax, which is a valid point in the
design space (Perl and Raku have fans too after all) but the opposite
of what I want RCL to be.

But I do think this change brings more consistency and regularity. RCL
is more colon/delimiter based, and less keyword-based. No 'begin end'
but '[]', and no 'then' but ':' fits with that.
2023-12-28 22:54:27 +01:00
Ruud van Asseldonk
f0d4710db6 Attribute trace spans to the message, not keyword
I find this reads a bit more naturally, and it aligns better with
assertions too, which highlight the expression span, not the keyword.
2023-12-02 14:20:44 +01:00
Ruud van Asseldonk
96b17f1974 Add goldens for group_by misuse errors
The errors are not great, confusing even. I think I'll not report on the
details of why the call failed, and just report _that_ it failed.
2023-11-30 21:10:01 +01:00
Ruud van Asseldonk
71982f4f76 Ensure blanks are allowed before => in lambda
Caught through the coverage report. The case of a comment before the =>
is not covered though, because it is a syntax error.
2023-11-30 19:45:26 +01:00
Ruud van Asseldonk
46a033eb11 Do not allow comments on a lambda body
It makes things difficult to format. Instead, you should place the
comment before the lambda. Or put parens around the lambda. It's a bit
unfortunate, but let's go with it for now.
2023-11-30 00:20:03 +01:00
Ruud van Asseldonk
3fdc362f1f Add more tests for formatting lambdas
I'm not so happy with the way the hanging works, but to get rid of that,
I think I will have to remove the ability to add comments on the body.
Or maybe change the way prefixed_expr gets converted into a doc.
2023-11-30 00:20:03 +01:00
Ruud van Asseldonk
3cf65f248f Rename Lambda to Function
User-facing, I think it is a bit friendlier to call them "function" in
error messages and such. And if I do that, I should be consistent and
call them functions everywhere.
2023-11-30 00:20:03 +01:00
Ruud van Asseldonk
5bb88b03de Implement lambda values
This is hairy, I need to keep the environment alive, and also keep the
AST alive. Now the AST bleeds into runtime values, and stuff like Env
now needs to be Ord for values to be Ord, etc.
2023-11-30 00:20:03 +01:00
Ruud van Asseldonk
f9bdec13a9 Define lambdas in the grammar
This was a bit of an adventure, to get the grammar right. See also a
discussion of the ideas in ideas/lambdas.rcl added in this commit. For
now I will plow through with the syntax that I like best, let's see if
it works out, we can always change it later.
2023-11-30 00:20:03 +01:00
Ruud van Asseldonk
89283097a1 Move expr_import inwards in the grammar
Previously, with expr_import at the same level as statement-like
expressions, it was not part of expr_op, and inside a sequence, only
expr_op is used, which meant that you couldn't put imports in a
sequence, at least not without parentheses. By pushing expr_import more
inwards to be part of expr_op, it is now allowed inside sequences.

Also add a test for this.
2023-11-28 18:53:51 +01:00
Ruud van Asseldonk
168e1fa9e9 Implement unary negation of integers
And rename boolean negation to "not" instead of "neg", for clarity.
2023-11-22 20:03:36 +01:00
Ruud van Asseldonk
3a7eafe797 Implement division operator 2023-11-22 19:55:27 +01:00
Ruud van Asseldonk
8878b99e31 Implement binary minus operator 2023-11-22 19:36:05 +01:00
Ruud van Asseldonk
9fe89fc3b5 Implement indexing for lists 2023-11-21 19:38:59 +01:00
Ruud van Asseldonk
f17d344c42 Implement parser for indexing syntax 2023-11-21 19:15:36 +01:00
Ruud van Asseldonk
7ceb17a338 Make Clippy happy
I don't think this is really an improvement, but I don't have a very
strong opinion on it and silencing Clippy is as ugly, so let's just do
it.

Also, where I previously had to silence Clippy (for the same lint), I
did end up addressing that, and the silences are no longer needed.
2023-11-04 20:28:11 +01:00
Ruud van Asseldonk
3b0a0c8b0b Pipe call arg spans through for better errors
When the file could not be loaded, I would like to highlight the
argument of the call that is the path. But to be able to do that, we
need to thread the spans through everywhere.
2023-11-04 20:18:12 +01:00
Ruud van Asseldonk
da22ccad57 Change the tagline
Why reasonable?

 * Humans can easily reason about what a given expression will evaluate
   to (unlike yaml).
 * It's a retroactive change (or addition) to the name "RCL": Reasonable
   Configuration Language.
 * Calling RCL sane is subjective anyway, and it implies that other
   options may be insane. Although if I change it to "reasonable",
   maybe I am calling others unreasonable.
2023-10-21 10:53:34 +02:00
Ruud van Asseldonk
878eaa4aa4 Use comma as the separator in record notation
I am still ambivalent about this. On the one hand I have a strong sense
that "key = value;" is a statement that needs a terminator. On the other
hand, it makes things more uniform to have only a single separator.

I think I just need to get used to the comma, and then I will not mind
so much. But let's see.
2023-10-21 10:43:44 +02:00