The map function in particular is a footgun that hit me twice in two
weeks, in a case similar to this one:
{"apple", "orange", "banana"}.map(x => x.len()).sum()
This looks like it sums the lengths of the strings. Except it doesn't,
it sums the *unique* lengths of the strings, so we get 11 instead of 17.
That's surprising, and I don't want that kind of surprise in RCL.
At first I thought to remove Set.map and Set.flat_map entirely. You
would just have to convert to a list first and map there, and if you
wanted the set, convert back with List.to_set_dedup, which makes it
explicit. That's why I added these conversion functions. Then I
implemented deletion ... and as I was writing the PR description and
changelog entry for how to handle the break, it dawned on me, if we
have the _dedup anyway in to_set_dedup, we might as well use it here.
We don't have to delete these methods, we can just call them _dedup.
{"apple", "orange", "banana"}.map_dedup(x => x.len()).sum()
Now the snippet is no longer so sneaky! It's clear that there is
deduplication going on! So I'm fine keeping them in this form. At least
in the Nix transitive closure test, the map_dedup behavior is what I
want, which validates demand for at least map_dedup. But even then,
a changelog entry for a breaking change that says "renamed to X" is a
much better experience than "have been deleted here are some much longer
alternatives".
Also, this change is relatively safe to make: it will break existing
callers, but they will get an error message and there is a clear path
to fixing this. It doesn't silently change the output of something: it's
being called on sets, so there can't even be fields named 'map' or
'flat_map' because sets don't have fields. So I feel comfortable making
a small breaking change here.
Naming these was a long bikeshed. Some considerations:
* Symmetry with Set.to_list and String.to_uppercase, keep the to_.
* Naming after an action is clearer than naming after a property,
because for a property it's not clear whether that is a precondition
rather than a postcondition. xs.dedup() makes it clear it removes
duplicates, xs.unique(), does it assert it's unique? Does it test it?
* In light of the above observation, List.sort is a good name, better
than List.sorted.
* The obvious one people will try is probably to_set. If it silently
drops duplicates, that's an unsafe default. So let's make this the
one that validates, and we can have to_set_dedup that drops
duplicates.
* Erroring on duplicates is always safe. We can even point the users to
the version that drops duplicates.
* If you read code and see xs.to_set() without any contrasting
.to_set_dedup(), it's not so obvious that .to_set() does any
validation, and if that is your intent, then you have to clarify
that in a comment.
* xs.to_set_assert_unique in that sense is clearer, but it's too
verbose in practice. xs.to_set_nodup is a bit obscure, especially if
you don't see the contrasting .to_set_dedup().
* A method List.dedup that returns a list, and having to_set being the
one that errors, also works, then you can do .dedup().to_set(), which
is very explicit. It still does not clarify at the call site that
to_set is validating though. It is also inefficient, but that's
probably not a problem for RCL, and you can always use a
comprehension if it matters.
So conclusion for now, to_set is the right prefix for symmetry and
discoverability. There is not going to be a "default" to set, you have
to explicitly choose whether you want to discard duplicates or not. The
to_set_dedup is clear for that. When reading code, to_set_unique is
maybe less obvious, but it gives more hint than to_set, and if you think
for a moment, set values are already unique, so why the _unique, and
then you find the contrast with _dedup. So far this seems like one of
the better directions.
I plan to remove Set.map and Set.flat_map, and to make that slightly
more palatable, let's add Set.to_list, so you can at least replace all
the former calls with .to_list().map(...).
It turns out, we can:
* Allow more cases, including the common case of e.g. 'x > -1', without
creating ambiguities.
* Still make ambiguities such as 'a and not b or c' an error.
* Improve the error message for that case, over my previous one!
And this is a tiny change to the grammar and parser, yay! It feels like
one of those times where I discover the *right* way to do it. Discover,
in the platonic sense.
This does look more pleasant, yay! I do notice that I find '...' tedious
to write though, and maybe a bit too verbose. Maybe I want to change
that to '..' after all, even for dicts. But we can do that later, let's
see how this goes.
Initially I used a comma after assertions, like Python. It was always
a bit weird (even in Python), it's a statement but it has arguments.
This made formatting akward too. I justified it for myself by saying
"it's just a function call, but without the parens". That's how we
currently format it.
Then yesterday I was skimming through the Jsonnet docs, and they use
a colon rather than comma, and that looks much more pleasant! I should
steal the good ideas; if RCL can fix list comprehension order w.r.t
Python, then it can also fix this one.
For compatibility, accept the comma as well (for now), but make the
formatter emit the colon. Similar to the colon after 'else', it does
not have to be a breaking change.
I should have put it in there in the first place. I would make a fixup
and fix the original commit, but it's so deep in the history now that I
better not rewrite it.
I want to publish a new version of the Zed plugin, and it points to the
Tree-sitter grammar by commit. This is what got the whole "rcl patch"
thing started!
So now, upon export of the Zed extension, we'll pin it to the HEAD of
the adjacent tree-sitter-rcl repository.
When some data in a big configuration needs to come from other sources,
you can already dump it as json and then import it into RCL. But that
does spread your configuration over multiple files, and it makes Git
diffs harder to interpret, when context is missing.
I encountered a similar case at work a while back, where we had some
human-written config, but we wanted a script to edit it, while
preserving formatting and comments etc. Of course what you can always do
in that case is fall back to 'sed', and even though it might match false
positives, or create syntactically invalid files in theory, in practice
it works very well, especially if you add some marker comment.
But there *is* a proper way to do this. Parse into a concrete syntax
tree, and then do some surgery on the tree to splice in the new value.
And hey, what do you know, in RCL we happen to have a parser that builds
concrete syntax trees. And a pretty-printer for them. I thought at
first it would be difficult, because how do you navigate expressions?
But we can start with dicts that use record syntax, and including let
bindings is not too difficult either. Then we just walk the AST trying
to match identifier by identifier.
Bracket matching works nicely now. Indents mostly work, but there are
some bad cases. After a 'let' or opening bracket, Zed correctly indents,
but then when you type 'if ...:' in there, it suddenly dedents that. I
don't understand why, so I don't know how to fix it. But for now, this
is already better than the previous version of the plugin, which had
neither.
The documentation for these is extremely sparse. I thought at firast
that this replaces the "brackets" section of the config.toml, especially
because that section is not documented, but in the upstream repo it does
still exist for the first-party plugins, so let's keep it here too. I
can experiment later and see what happens if we delete it.
First let's try and see if these new files do anything.
Also improve the documentation around these themes, where to find the
names that can be used, and the structure of the docs about the grammar
directory in general.
The module name and path hack throws off Mypy, in different ways
depending on how you run it (outside Nix shell, inside Nix develop
shell, or as part of the flake check ...). Let's just forward-declare
it and sidestep problems.
Maybe I should make this script the source of truth and generate the
Pygments grammar from it. We could do that later, for now this works.
This corrects one bug in the plugin there that went unnoticed: the
'enumerate' builtin was not highlighted previously. So that's a win
for generating things!
I regularly add new methods, and it's becoming tedious to have to
remember to update all the places that reference these, so let's
generate them and automate the process. For now, I'm choosing the
Pygments grammar as the source of truth, and the first target to
generate is the fuzz dictionary.
I'm leaving the Zed extension pointing to the older commit of the
Tree-sitter grammar, I'll update that after this version bump. It's
a bit awkward to do it this way around, but there are circular
dependencies that can't be avoided. Maybe with an attack on SHA1 it
can be done in theory, but let's not go there.
At first I also wanted to support rounding to a negative number of
decimals (so rounding to a positive power of 10), but scope creep,
complications ... I don't need it, and we can always add that later.
RCL aims to be obvious to understand. Num might be cryptic for new users,
and although we also have "Int" rather than "Integer", that one is very
established, "Num" may be a bit too obscure. (We also have "String"
rather than "Str", consistency ...). It's a type that I expect has
little use for end-users, but it shows up in the negation error message,
so let's make it unambiguous and call it "Number".
I realized today that I want this. In particular, the API of my music
player Musium returns albums with a numeric playcount and discovery
score, and I want to sort on that. Finally that is possible now that I
am adding support for floats. But I need a way to sort on one field of
a dict! Arguably this is more important than the bare sort itself.
While I do this for lists, we can do the same for sets.
It started to get annoying to have to define it myself every time, so
let's just add it properly now. This also resolves the longstanding
issue in the RCL pretty-printer that we have no good way to print the
empty set -- now we do!
This is a bit annoying now, I have to dump them one by one and update
the repository url. But the format forces that. I could automate it
further, but for now it's okay to do by hand.
This has not been tested yet. Also, I may need to move it into a
separate repository, but we can do that in the same way as with the
Tree-sitter repo that I just extracted.
When I initially converted from the "if-then-else" keyword syntax to the
colon-based one, during development I put the colon in, then later I
removed it again, but I was still unsure. Now, after having used the
syntax for some time, my feeling is that there should be a colon after
all. Nim got it right. So put it back.
Because it's easy to stay backwards compatible here, make the colon
optional. We can make it mandatory at some point in the future, but even
making the autoformatter put it there is probably a strong enough push.