Commit graph

175 commits

Author SHA1 Message Date
Ruud van Asseldonk
19a7270b2c Add Collection type to builtins list 2025-03-15 21:14:46 +01:00
Ruud van Asseldonk
9054cb9bf2 Define a Collection supertype for List and Set
I want to add an unpack operator, and to be able to type the type
expectation, I need a collection supertype for List and Set. Maybe
once I have this, I can actually type the union operator in a more
elegant way as well and I no longer need unpack. But unpack seems
nice anyway, and a Collection supertype seems nice as well.
2025-03-15 21:13:43 +01:00
Ruud van Asseldonk
5bcc7e74c7 Generate builtins list for the smith fuzzer
This moves one more place of duplication of builtins into
generate_keywords.py as a single source of truth, resolving
a to do in the smith fuzzer.

This does once more shuffle all of these around in the fuzzer, which
makes the existing fuzz corpus mostly meaningless. Fortunately, this
should be the last time that this happens: with the new approach we
can modify the builtins with minimal changes to the meaning of the
fuzz corpus, which is something that I wanted for a long time.
2025-03-15 19:43:13 +01:00
Ruud van Asseldonk
34e347a387 Generate fuzz dictionary from Pygments grammar
I regularly add new methods, and it's becoming tedious to have to
remember to update all the places that reference these, so let's
generate them and automate the process. For now, I'm choosing the
Pygments grammar as the source of truth, and the first target to
generate is the fuzz dictionary.
2025-03-03 22:14:27 +01:00
Ruud van Asseldonk
59be133128 Bump version to 0.8.0
I'm leaving the Zed extension pointing to the older commit of the
Tree-sitter grammar, I'll update that after this version bump. It's
a bit awkward to do it this way around, but there are circular
dependencies that can't be avoided. Maybe with an attack on SHA1 it
can be done in theory, but let's not go there.
2025-03-02 21:15:33 +01:00
Ruud van Asseldonk
f0e4cd13b5 Add Number.round method
At first I also wanted to support rounding to a negative number of
decimals (so rounding to a positive power of 10), but scope creep,
complications ... I don't need it, and we can always add that later.
2025-03-02 18:32:21 +01:00
Ruud van Asseldonk
eaeb54c424 Simplify and document fuzz_decimal fuzzer 2025-02-25 20:49:40 +01:00
Ruud van Asseldonk
27f91ac012 Remove various final references to Int 2025-02-24 21:08:49 +01:00
Ruud van Asseldonk
c7b62c9ee5 Add reminder to remove Int from fuzzer 2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
e9748822c2 Implement Decimal::checked_add
The implementation is a bit naive, but it's good enough for now.
2025-02-24 20:42:59 +01:00
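As an illustration of what a naive checked addition for commit e9748822c2 might look like, here is a minimal sketch. It is not the code from the commit: the `Decimal` struct, its field layout (value = mantissa * 10^exponent), and the alignment strategy are all assumptions made for this example.

```rust
/// Hypothetical decimal type for illustration only:
/// the represented value is `mantissa * 10^exponent`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Decimal {
    pub mantissa: i64,
    pub exponent: i16,
}

/// Naive checked addition: rescale the operand with the larger exponent
/// down to the smaller exponent, then add the mantissas.
/// Returns `None` when any intermediate step overflows an `i64`.
pub fn checked_add(a: Decimal, b: Decimal) -> Option<Decimal> {
    // Order the operands so that `hi` has the larger (or equal) exponent.
    let (hi, lo) = if a.exponent >= b.exponent { (a, b) } else { (b, a) };

    // The difference of two i16 values is non-negative here and fits in u32.
    let shift = (hi.exponent as i32 - lo.exponent as i32) as u32;

    // 10^shift overflows quickly, which is what makes this approach naive:
    // operands with very different exponents are rejected outright.
    let factor = 10_i64.checked_pow(shift)?;
    let scaled = hi.mantissa.checked_mul(factor)?;
    let mantissa = scaled.checked_add(lo.mantissa)?;

    Some(Decimal { mantissa, exponent: lo.exponent })
}
```

With this layout, adding 1.5 (mantissa 15, exponent -1) and 0.25 (mantissa 25, exponent -2) yields mantissa 175 with exponent -2. A less naive version could round instead of failing when the exponents are far apart.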
Ruud van Asseldonk
caa6605bf9 Allow parsing negative decimals
This makes the testing code and fuzzers a bit cleaner.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
fda5cf01c6 Extend decimal fuzzer for roundtrips, fix issue
It found one issue right away, related to using an i16::MIN exponent,
which overflows the way we parsed. But then I realized there are a few
other bugs in the number parser ... I added a marker for one and fixed
handling of the implicit exponent offsets.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
9030ffd528 Store decimal point and exponent separately
This enables us to treat numbers with an exponent losslessly. We don't
conflate the decimal point with the exponent in case they get in the
way of each other.

It also greatly simplifies the formatting. We can mechanically format
the representation now, without having to use heuristics for when to
switch to scientific notation. The catch is of course that the
heuristics will need to move elsewhere. We'll have to normalize the
numbers after arithmetic operations.
2025-02-24 20:42:59 +01:00
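The idea in commit 9030ffd528 of keeping the decimal point and the exponent apart, so that formatting becomes mechanical, could be sketched roughly as follows. The struct, its field names and widths, and the `format` method are assumptions for illustration and are not taken from the RCL source.

```rust
/// Hypothetical representation for illustration only. The value is
/// `mantissa * 10^(exponent - decimals)`, but `decimals` and `exponent`
/// are stored separately so that "1.50e3" and "1500" stay distinct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Decimal {
    pub mantissa: i64,
    /// Number of digits after the decimal point.
    pub decimals: u8,
    /// Explicit exponent as written by the user; 0 means "no exponent".
    pub exponent: i16,
}

impl Decimal {
    /// Format mechanically from the stored fields, with no heuristic
    /// for when to switch to scientific notation.
    pub fn format(&self) -> String {
        let mut out = String::new();
        if self.mantissa < 0 {
            out.push('-');
        }
        let digits = self.mantissa.unsigned_abs().to_string();
        let decimals = self.decimals as usize;
        if decimals == 0 {
            out.push_str(&digits);
        } else {
            // Left-pad with zeros so at least one digit precedes the point.
            let padded = format!("{:0>width$}", digits, width = decimals + 1);
            let (int_part, frac_part) = padded.split_at(padded.len() - decimals);
            out.push_str(int_part);
            out.push('.');
            out.push_str(frac_part);
        }
        if self.exponent != 0 {
            out.push('e');
            out.push_str(&self.exponent.to_string());
        }
        out
    }
}
```

Under these assumptions, mantissa 150 with 2 decimals and exponent 3 formats as `1.50e3`, preserving the trailing zero and the exponent exactly as given. Normalizing after arithmetic, as the commit message notes, would have to happen elsewhere.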
Ruud van Asseldonk
3dafed4571 Rename Num type to Number, add highlighting for it
RCL aims to be obvious to understand. Num might be cryptic for new users,
and although we also have "Int" rather than "Integer", that one is very
established, "Num" may be a bit too obscure. (We also have "String"
rather than "Str", consistency ...). It's a type that I expect has
little use for end-users, but it shows up in the negation error message,
so let's make it unambiguous and call it "Number".
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
18ba7e9176 Extend Decimal fuzzer to cover more cases 2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
94feb8cb9d Add a fuzzer to test various Decimal properties
This is only the start, but let's verify Decimal::cmp against f64::cmp.
It instantly finds an input where they disagree:

	Compare {
	    a: NormalF64(
	        -0.16406250000007813,
	    ),
	    b: NormalF64(
	        0.0,
	    ),
	}
2025-02-24 20:42:59 +01:00
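The differential check described in commit 94feb8cb9d (comparing a custom ordering against the ordering of `f64`) might be sketched like this. Both function names are hypothetical, and `ignores_sign_cmp` is a deliberately broken comparison invented for the example; it is not the bug that the fuzzer actually found.

```rust
use std::cmp::Ordering;

/// Return the first input pair on which `candidate` disagrees with the
/// ordering of the f64 values themselves. Pairs involving NaN have no
/// defined order and are skipped.
pub fn find_disagreement<F>(inputs: &[(f64, f64)], candidate: F) -> Option<(f64, f64)>
where
    F: Fn(f64, f64) -> Ordering,
{
    inputs.iter().copied().find(|&(a, b)| match a.partial_cmp(&b) {
        Some(expected) => candidate(a, b) != expected,
        None => false,
    })
}

/// A deliberately wrong comparison that ignores the sign, used only to
/// demonstrate that the check above reports a counterexample.
pub fn ignores_sign_cmp(a: f64, b: f64) -> Ordering {
    a.abs().partial_cmp(&b.abs()).unwrap_or(Ordering::Equal)
}
```

In a real fuzz target the input pairs would come from the fuzzer rather than a fixed slice, and the candidate would be the comparison under test.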
Ruud van Asseldonk
2862a21512 Add surrogate pair exception to json superset fuzzer
Surrogate pairs are not supported by RCL on purpose, so when that can be
parsed by Serde but is rejected by RCL, we shouldn't fail the fuzzer on
it.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
5480b97450 Add float parsing exception to TOML fuzzer
RCL can handle larger exponents on floats, we have to admit that then.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
de496af1be Add decimal parsing exception to fuzzer
This adds back the exception that was removed by allowing float parsing
imprecision, though in a more limited form initially because it only
affected exponents.

But after running the fuzzer for a bit longer, it also affects large
integers, so we are back to the start, overflow is just an intentional
incompatibility.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
647e457d67 Allow parsing floats that overflow i64
This removes one case of incompatibility with Serde. If you write a
float literal that is too precise to be represented exactly, then we now
silently round it rather than treating it as an overflow error. I think
this is acceptable because if you are in the case where you care about
numbers to 19 significant digits then probably RCL is not the best tool
for what you are doing, but the case where we encounter some arbitrary
json that we want to query with "rcl jq" and it happens to have some
humongous float in it, that is probably more likely. Python handles
float literals in this way too so I think it's okay.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
4e3efcdedb Add fuzzer exception for large exponent numbers
The choice I went with is to have a 16-bit exponent, which gives RCL's
float/decimal type more range than a regular f64. Now the fuzzer can
generate an input with a large exponent, and RCL will happily echo it,
and it's technically syntactically valid json, but Serde rejects it with
"number out of range" (in the same way that RCL rejects some numbers as
overflow). So add an exception for this mismatch.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
d2df2dd48a Relax Serde json fuzzer slightly
RCL rejects 9223372036854775807.576460752303423487 with an overflow
error, but I think that is fine, I don't want to lose precision on
inputs.
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
65a0984c2f Add fuzzer to ensure RCL is a json superset
The one thing that prevents that right now is floats, and the fuzzer
discovered it within a few seconds:

  ╭──────╴ Opcode (hex)
  │  ╭───╴ Argument (hex)
  │  │  ╭╴ Operation, argument (decimal)
  26 03 ExprPushInput, 3
        take_str, 3   → "4e2"
  e6 01 ModeJsonSuperset, 1
  EvalJsonSuperset -->
  4e2
2025-02-24 20:42:59 +01:00
Ruud van Asseldonk
6c0734148f Add List.sort_by and Set.sort_by methods
I realized today that I want this. In particular, the API of my music
player Musium returns albums with a numeric playcount and discovery
score, and I want to sort on that. Finally that is possible now that I
am adding support for floats. But I need a way to sort on one field of
a dict! Arguably this is more important than the bare sort itself.

While I do this for lists, we can do the same for sets.
2025-02-24 19:44:04 +01:00
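The use case in commit 6c0734148f, sorting records on one of their fields, corresponds to sorting by a key function. A minimal Rust sketch of that idea follows; `sorted_by_key` is a hypothetical helper for this example and says nothing about how the RCL methods are implemented.

```rust
/// Return a sorted copy of `items`, ordered by the key that `get_key`
/// extracts from each element. The sort is stable, so elements with
/// equal keys keep their original relative order.
pub fn sorted_by_key<T, K, F>(items: &[T], get_key: F) -> Vec<T>
where
    T: Clone,
    K: Ord,
    F: Fn(&T) -> K,
{
    let mut out = items.to_vec();
    out.sort_by_key(|item| get_key(item));
    out
}
```

For example, given `(title, playcount)` pairs, `sorted_by_key(&albums, |album| album.1)` orders them by playcount while leaving the input untouched.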
Ruud van Asseldonk
9dc5092279 Bump version to 0.7.0 2024-12-31 13:34:37 +01:00
Ruud van Asseldonk
2aabff47ee Dogfood-generate pyrcl Cargo.toml as well 2024-12-08 15:42:28 +01:00
Ruud van Asseldonk
aa55c4b076 Move fuzz/Cargo.toml to dogfood generated file 2024-12-08 15:42:28 +01:00
Ruud van Asseldonk
4104ba3a2f Fix typo in doc comment 2024-12-08 12:36:27 +01:00
Ruud van Asseldonk
663d954c33 Fuzz new List and Set methods
And confirm that the fuzzer works by temporarily putting panics in the
new code paths (not part of this commit).
2024-12-07 20:57:26 +01:00
Ruud van Asseldonk
2601582abe Add std.empty_set constant
It started to get annoying to have to define it myself every time, so
let's just add it properly now. This also resolves the longstanding
issue in the RCL pretty-printer that we have no good way to print the
empty set -- now we do!
2024-12-07 20:26:40 +01:00
Ruud van Asseldonk
dff50984d9 Document and highlight new List.sort method 2024-12-01 13:15:28 +01:00
Ruud van Asseldonk
1bac59668e Bump version to 0.6.0
See docs/changelog.md for a summary of the changes in this release.
2024-12-01 11:52:49 +01:00
Ruud van Asseldonk
c51b77a59e Add 'rcl re' and 'rcl rq' shorthands for -fraw
This is somewhat common, especially when used as jq replacement, so
let's add a shorthand for them.
2024-08-23 22:14:41 +02:00
Ruud van Asseldonk
05ceea94c0 Use new if-else syntax in Smith fuzzer
It means the fuzzer gets to explore less, actually, but we still have
the source-based fuzzer that will find the case where the colon is
missing, and which could hunt for non-idempotencies in the formatter and
such.
2024-07-31 21:53:58 +02:00
Ruud van Asseldonk
28d920e4ac Bump version to 0.5.0 2024-07-28 21:43:29 +02:00
Ruud van Asseldonk
12ffe62cf4 Address a few minor issues in the new build code
Caught in self-review, and I don't feel like turning them into fixups
for all of the commits that introduced these.
2024-07-27 23:03:05 +02:00
Ruud van Asseldonk
0831fda447 Add --dry-run option to "rcl build"
As expected, the golden tests fail to run under Nix because the test
directory is not writable. And it's better to not write in my opinion,
let's not hack that and have a dry run output mode.

For now the output format is not structured, this is good enough for the
tests. It could be nice to do structured output in RCL format, but we
can do that later if needed.
2024-07-27 23:03:05 +02:00
Ruud van Asseldonk
4aaab52879 Add CLI option to specify --banner in output
For the 'build' subcommand, I am adding a banner setting, so for
consistency, let's have it for all evaluation commands.
2024-07-13 12:57:34 +02:00
Ruud van Asseldonk
2967267622 Bump version to 0.4.0 2024-07-13 10:37:59 +02:00
Ruud van Asseldonk
1f149b60f3 Smith: Generate large integers to stress overflow
The fuzzer was unable to find the overflow case in the new List.sum
method even after several minutes. To help it a bit to find interesting
cases, let's add large integers that are close to overflowing to the
input through a shortcut. Previously they could get there, but only when
reading an 8-byte integer from the input, so first you need a mutation
to bump the number to 8, and then a mutation to have an integer in those
8 bytes that is close to overflowing, and those together are very
unlikely.
2024-07-13 00:12:18 +02:00
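The shortcut described in commit 1f149b60f3 can be illustrated with a small sketch in which a single input byte selects an integer close to the `i64` limits, so one mutation is enough to reach the overflow path. The byte encoding and both function names are assumptions for this example, not the scheme used by the Smith fuzzer.

```rust
/// Hypothetical mapping from one fuzz input byte to an integer near
/// the i64 boundaries. The low bit picks the boundary, the remaining
/// seven bits give the distance from it (0 to 127).
pub fn near_overflow_i64(byte: u8) -> i64 {
    let offset = (byte >> 1) as i64;
    if byte & 1 == 0 {
        i64::MAX - offset
    } else {
        i64::MIN + offset
    }
}

/// Sum with overflow detection, standing in for the code under test.
/// Returns `None` as soon as a partial sum overflows.
pub fn checked_sum(values: &[i64]) -> Option<i64> {
    values.iter().try_fold(0_i64, |acc, &x| acc.checked_add(x))
}
```

With such a mapping, a two-element list like `[near_overflow_i64(0), 1]` already triggers the overflow case, without needing two independent mutations of an 8-byte field.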
Ruud van Asseldonk
a95959146f Add List.sum and Set.sum methods 2024-07-12 23:10:11 +02:00
Ruud van Asseldonk
38e20744d2 Define List.flat_map and Set.flat_map 2024-07-12 21:54:47 +02:00
Ruud van Asseldonk
56e9615b3a Implement List.filter and Map.filter 2024-07-11 21:14:14 +02:00
Ruud van Asseldonk
3b9d76d317 Define List.map and Set.map builtins
When using RCL as a jq replacement, often I have some pipeline and I
want to edit the last part of the query on the command line. I don't
want to have to move my cursor all the way back to wrap the entire
expression in braces. So even though comprehensions can already do map
and filter and flatmap, I still want to add those.

This is step one, adding map.
2024-07-11 19:03:05 +02:00
Ruud van Asseldonk
250a59fc9e Add an html output mode
So I can highlight stuff on my blog as long as there is no highlighting
in Pandoc/Skylighting. With the change to MarkupString, this was really
easy to do!
2024-06-23 20:51:08 +02:00
Ruud van Asseldonk
0993b3199c Bump version to 0.3.0 2024-06-23 11:08:49 +02:00
Ruud van Asseldonk
84523fd33c Save body span info in the loader
This fixes a longstanding issue where errors that we have to blame on
the document's result as a whole got blamed on the document's full
span, which is often a comment and not the offending value. Now we blame
them on the inner body expression, which is more natural.
2024-06-18 20:22:33 +02:00
Ruud van Asseldonk
7f6e1ac71f Add fuzzer to confirm that formatters match
The autoformatter in "rcl format" pretty-prints the CST. But we can
also evaluate the document, and then it gets pretty-printed by the
value pretty printer. These two should match, but currently they do
not. Add a fuzzer to discover all the cases where they don't.
2024-06-15 19:05:26 +02:00
Ruud van Asseldonk
a373e5d27a Implement a --check mode for "rcl format"
I am making the formatter more suitable for real use, and as a result
I want to add a flake/CI check that ensures that all the examples are
formatted correctly. But then I need this --check mode.
2024-06-15 18:43:46 +02:00
Ruud van Asseldonk
80c46e9005 Smith: Merge into_program into constructor
It was not used elsewhere, and mutating the is_minimal in two places
felt ugly.
2024-04-28 19:21:58 +02:00