The map function in particular is a footgun that hit me twice in two
weeks, in a case similar to this one:
{"apple", "orange", "banana"}.map(x => x.len()).sum()
This looks like it sums the lengths of the strings. Except it doesn't,
it sums the *unique* lengths of the strings, so we get 11 instead of 17.
That's surprising, and I don't want that kind of surprise in RCL.
At first I thought to remove Set.map and Set.flat_map entirely. You
would just have to convert to a list first and map there, and if you
wanted the set, convert back with List.to_set_dedup, which makes it
explicit. That's why I added these conversion functions. Then I
implemented deletion ... and as I was writing the PR description and
changelog entry for how to handle the break, it dawned on me, if we
have the _dedup anyway in to_set_dedup, we might as well use it here.
We don't have to delete these methods, we can just call them _dedup.
{"apple", "orange", "banana"}.map_dedup(x => x.len()).sum()
Now the snippet is no longer so sneaky! It's clear that there is
deduplication going on! So I'm fine keeping them in this form. At least
in the Nix transitive closure test, the map_dedup behavior is what I
want, which validates demand for at least map_dedup. But even then,
a changelog entry for a breaking change that says "renamed to X" is a
much better experience than "have been deleted here are some much longer
alternatives".
Also, this change is relatively safe to make: it will break existing
callers, but they will get an error message and there is a clear path
to fixing this. It doesn't silently change the output of something: it's
being called on sets, so there can't even be fields named 'map' or
'flat_map' because sets don't have fields. So I feel comfortable making
a small breaking change here.
I may have forgotten about Set.sort when I added Set.to_list ... It
feels bit weird to have two identical methods. On the other hand, they
convey very different intent, and using .sort() for places where you
e.g. only want to convert to be able to use a non-deduplicating map, but
where you don't care about order, .to_list() feels like it matches that
intent better than .sort(). So I'll keep both for now, but we can at
least acknowledge it, in case people wonder.
It is sorted now, it will probably remain like that for a while, but I'm
still playing with the idea of making dicts preserve insertion order,
and if I do that, then I would probably also make sets preserve order,
and the sorting property will go away.
If a curious property like this is not documented, people start
wondering, is it *always* sorted? And the answer is yes -- at least for
now. So probably better to document that. I have no illusions that a
note like this will prevent people from depending on it, but that's
fine, if you update you do that consciously and deliberately, I hope.
Naming these was a long bikeshed. Some considerations:
* Symmetry with Set.to_list and String.to_uppercase, keep the to_.
* Naming after an action is clearer than naming after a property,
because for a property it's not clear whether that is a precondition
rather than a postcondition. xs.dedup() makes it clear it removes
duplicates, xs.unique(), does it assert it's unique? Does it test it?
* In light of the above observation, List.sort is a good name, better
than List.sorted.
* The obvious one people will try is probably to_set. If it silently
drops duplicates, that's an unsafe default. So let's make this the
one that validates, and we can have to_set_dedup that drops
duplicates.
* Erroring on duplicates is always safe. We can even point the users to
the version that drops duplicates.
* If you read code and see xs.to_set() without any contrasting
.to_set_dedup(), it's not so obvious that .to_set() does any
validation, and if that is your intent, then you have to clarify
that in a comment.
* xs.to_set_assert_unique in that sense is clearer, but it's too
verbose in practice. xs.to_set_nodup is a bit obscure, especially if
you don't see the contrasting .to_set_dedup().
* A method List.dedup that returns a list, and having to_set being the
one that errors, also works, then you can do .dedup().to_set(), which
is very explicit. It still does not clarify at the call site that
to_set is validating though. It is also inefficient, but that's
probably not a problem for RCL, and you can always use a
comprehension if it matters.
So conclusion for now, to_set is the right prefix for symmetry and
discoverability. There is not going to be a "default" to set, you have
to explicitly choose whether you want to discard duplicates or not. The
to_set_dedup is clear for that. When reading code, to_set_unique is
maybe less obvious, but it gives more hint than to_set, and if you think
for a moment, set values are already unique, so why the _unique, and
then you find the contrast with _dedup. So far this seems like one of
the better directions.
I plan to remove Set.map and Set.flat_map, and to make that slightly
more palatable, let's add Set.to_list, so you can at least replace all
the former calls with .to_list().map(...).
This is another papercut I ran into while working on Advent of Code
exercises. If I recall correctly, previously comments were not allowed
there because there was no good place in the AST to put them, but that
has since been fixed, with parse_expr handling prefixes, so simply
removing the part that skips over noncode does the trick now!
The formatter used to add a space at the end. That was an oversight when
I added the unpack feature. Fortunately it's an easy fix.
I added the golden test first to confirm it would have caught the bug.
Also, it turns out there was a different fmt golden tests that already
encoded the bad space. I missed it in review.
It turns out, we can:
* Allow more cases, including the common case of e.g. 'x > -1', without
creating ambiguities.
* Still make ambiguities such as 'a and not b or c' an error.
* Improve the error message for that case, over my previous one!
And this is a tiny change to the grammar and parser, yay! It feels like
one of those times where I discover the *right* way to do it. Discover,
in the platonic sense.
It *is* pre-1.0 and it *does* occasionally have backwards-incompatible
changes, but they are well documented with clear upgrade paths, and not
happening at a breakneck pace. For example, with the introduction of
unpack, the |-operator is deprecated, but it still works. Or take the
assertion syntax: it changed, but the old syntax is still accepted, and
the autoformatter automatically upgrades documents. So I think that the
previous phrase was a bit too conservative. Other "mature" projects
cause me breakage headaches far more often than RCL. The readme should
reflect that.
I want to make the readme more aimed at people interested in the tool
and project in general, not necessarily at developers hacking on RCL
itself. Let's move that to a separate manual section, since there is
already a development section in there.
Unrelated to this change, but for the assertion syntax change, we should
mention that the formatter auto-updates documents.
Also, the list renders quite dense, an inline code span is not really
the place to demonstrate unpack. I'll leave that to a link. Maybe I
should do an announcement post as well with highlights, in addition to
the more mundane changelog. But RCL is doesn't have that many changes
going into a release anyway. For now I think it's okay.
For now it hard-codes the dependency list. I don't expect the list to
change often, so I think this is okay.
Possibly at some point we can use a tool like "cargo about" to generate
the list of depedencies aand licenses, but for a single dependency, I
think this is overkill.
I started out with "cargo deny", which is easy enough and checks for
more, but I couldn't make it work under Nix because it downloads stuff.
There is a separate "fetch" command, but we'd have to integrate it as a
fixed-output derivation, and some of the downloads are moving targets,
so it's not so clear how to achieve that.
So I thought I'd write a simple script based on "cargo license" instead.
Then I learned that this tool too requires Internet connectivity. Oh
well. No automated dependency check for now then, but at least there is
some check.
Helix needs the pre-compiled grammar, but if you have just a checkout of
the RCL repository, you don't have that. I never discovered this because
in my checkout of course I build the grammar. Fortunately Sergey
reported it so we can address this, thanks Sergey! Also fortunately, we
already have a repository with a compiled grammar, because Zed needs
this too, so we can point there.
There are more and more steps every time, without some checklist, one
day I'm going to forget or mess up one of these steps.
It could be a bit more automated, but I don't want to add a critical
dependency on e.g. GitHub Actions, and having some written playbook is
better than having to keep it all in my head.
The recent Python-related work bumped it. While we can only pin one
thing in the rust-toolchain.toml file, we can at least mention a
compatible nightly in the docs.
This is a squash and rebase of a lot of messy work over the past few
evenings. In the end the diff doesn't look so bad, but the path to get
there ...
== THE GOAL ==
I want to publish the Python module to Pypi, and include a wheel. I want
to automate building it with Nix, so that it's a *single* step, and it
can run on every commit.
Building a wheel that is widely compatible, is somewhat challenging. One
way to achieve it is to build in a container, using an image of some
ancient Linux distro. I don't want Docker in my build process, I want a
single command that builds anything and that always works. Fortunately,
when using Zig as the linker, we can build a manylinux compatible wheel,
and Maturin supports that. So we want to call Maturin from inside Nix to
build the wheel, that's the goal.
== CONSTRAINTS ==
The Nixpkgs snapshot I was using shipped Maturin 1.1. Maturin versions
before 1.9.2 contain a bug where the set the author metadata incorrectly.
So I need 1.9.2. We can't *just* override it though, because Maturin
1.9.2 depends on a library that only compiles with Rust 1.73.0. So we
need to update Rust too.
Now we have two choices: update the Nixpkgs snapshot to one with Rust
1.73 or later, or take Rust from the binary overlay. I explored both,
backtracked, backtracked again, and reverted, but in the end, using the
binary was infeasible. Why? Because the coverage build needs LLVM tools
that we take from Nixpkgs, and the version needs to be compatible. So we
bump the Rust toolchain, see below for the version rationale. It may be
possible to get LLVM tools from the binaries overlay, but the component
that is shipped by rustup does not include the coverage binaries, and
the outlook started looking more and more bleak, taking them from
Nixpkgs just works, and though it adds some constraints, it sidesteps
many problems.
Another place where we take tools from Nixpkgs, is the WASM build. This
requires a nightly compiler. That compiler needs to be recent enough to
be compatible, but because we include rust-src dependencies in our
lockfile to be able to build that component in Nix, the dependencies
can't be *that* recent that they require Rust Edition 2021, because that
is not supported by Rust 1.75. So finding the right nightly was a
challenge too. The nightly that coincides with 1.75 is already too new,
but one 6 weeks prior worked.
Wthe new Nixpkgs snapshot, aside from the WASM tools, we also get a new
version of Tree-sitter. Which of course broke things, but they were
relatively easy to repair. There are also additional warnings now, but
those can be fixed later.
== RUST VERSION ==
Pick Rust 1.75, because that is the version that is currently shipped
in Ubuntu 22.04 Jammy, which is the oldest supported Ubuntu LTS at
this time. I would like to consider Debian oldstable (Bookworm at this
time) as well, but it ships rustc 1.63, which is already older than the
version RCL was using anyway. Debian stable (Trixie) ships 1.85 at this
time. So I think 1.75 is a safe bet for wide compatibility, without
being ancient.
== ZIG CACHE DIR ==
I don't want Docker in my build process. Fortunately, Maturin features
this --zig mode to build a manylinux wheel. Calling it from Nix turned
out to be ... possible, but not trivial. I spent about two hours
debugging an AccessDenied error, which turned out to be due to [1]. When
you read that one, in hindsight it makes sense, but the lengths I had to
go through to figure out *what* was printing AccessDenied and why ...
[1]: github ziglang zig issue 6810
== NIXPKGS & CARGO AUDITABLE ==
In the process of sorting this out, I first got things working with
the older Nixpkgs snapshot, and the Maturin and Zig in there. But when
I needed a newer version of Maturin, things broke. Maturin's manylinux
wheel build with --zig worked in the older version of Nixpkgs, but then
in a newer release it started using a linker flag (--undefined) that Zig
did not understand. After several more hours of debugging, I learned
that that flag is injected by cargo-auditable [2], which despite not
being used by Maturin or cargo-zigbuild, is being used because Nixpkgs
silently replaces cargo with cargo-auditable by default. After learning
that, it was easy enough to turn off. Also just one line, but the effort
to discover that one line ...
[2]: github rust-cross cargo-zigbuild issue 162
== WASM ==
At some point in the process, I was using a later nightly to build the
WASM binary, and I needed to add the --enable-bulk-memory to wasm-opt.
== MATURIN 1.9.3 ==
With Maturin updated, the author metadata is fixed, but it introduced
a new regression: recent versions stop tagging the manylinux wheel
properly. It would get tagged as linux_x86_64, rather than the right
manylinux tags. Fortunately, the Python 'auditwheel' package can fix
this. So I added that one to the build, at that point still from a more
recent Nixpkgs snapshot. Then I pinned Nixpkgs to one with Rust 1.75,
which downgraded auditwheel, and that broke the command line: the newer
version automatically detects compatibility, but the older one needs an
explicit target with --plat. That could be fixed, but then the older
version would still build wheels that contained an unnecessary (and
empty) lib dir, so just pin it to a newer version too.
== VERSION BUMPS SUMMARY ==
* Rust 1.70.0 -> 1.75.0.
* Nixpkgs goes along to one with Rust 1.75.0.
* Rust nightly to 2023-11-09, and Cargo.lock to update the dependencies
of rust-std along with it.
* Wasm-bindgen 0.2.84 -> 0.2.89 due to the Nixpkgs bump.
* Tree-sitter update due to the Nixpkgs bump.
== RESULT ==
With all of this in place, we can now nix build .#pyrcl-wheel, and we
get a portable manylinux wheel that can be used directly with uv, and
can be uploaded to Pypi!
This makes the tests introduced in the parent commit pass. I say "tweak"
rather than "fix" because I think there is something to say for both
behaviors, and it's arguable whether you want a number with exponent but
without decimal point to be like a float or like an int, but at least
for consistency with 1e-1, 1e0, 1e1, I think it makes more sense to drop
the e0, than to turn the entire thing into a number with decimal point.