It means the fuzzer gets to explore less, actually, but we still have
the source-based fuzzer that will find the case where the colon is
missing, and which could hunt for non-idempotencies in the formatter and
such.
As expected, the golden tests fail to run under Nix because the test
directory is not writable. In my opinion it's better not to write there
anyway, so instead of hacking around that, let's add a dry-run output
mode.
For now the output format is not structured; this is good enough for the
tests. It would be nice to do structured output in RCL format, but we
can add that later if needed.
The fuzzer was unable to find the overflow case in the new List.sum
method even after several minutes. To help it find interesting cases,
let's add large integers that are close to overflowing to the input
through a shortcut. Previously they could get there, but only when
reading an 8-byte integer from the input: first you need a mutation to
bump the width to 8 bytes, and then a mutation that puts an integer
close to overflowing in those 8 bytes, and those together are very
unlikely.
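The shortcut could look something like the sketch below: map a single
input byte to a value just below the i64 boundary, so one mutation
suffices to produce a near-overflowing literal. The function name and
encoding are made up for illustration; the real shortcut may differ.

```rust
// Hypothetical sketch of the shortcut: one input byte maps to a value
// close to i64::MAX, so a single mutation can yield a near-overflowing
// integer literal without producing 8 well-chosen bytes first.
fn near_overflow_literal(byte: u8) -> i64 {
    i64::MAX - byte as i64
}

fn main() {
    // Byte 0 hits the boundary exactly; small bytes land just below it.
    assert_eq!(near_overflow_literal(0), i64::MAX);
    assert_eq!(near_overflow_literal(5), i64::MAX - 5);
}
```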
When using RCL as a jq replacement, I often have some pipeline and want
to edit the last part of the query on the command line. I don't want to
have to move my cursor all the way back to wrap the entire expression
in braces. So even though comprehensions can already express map,
filter, and flatmap, I still want to add them as methods.
This is step one, adding map.
So I can highlight code on my blog as long as there is no highlighting
for it in Pandoc/Skylighting. With the change to MarkupString, this was
really easy to do!
This fixes a longstanding issue where errors that should be blamed on
the document's result as a whole got blamed on the document's full
span, which often starts at a comment rather than at the offending
value. Now we blame them on the inner body expression, which is more
natural.
The autoformatter in "rcl format" pretty-prints the CST. But we can
also evaluate the document, and then it gets pretty-printed by the
value pretty printer. These two should match, but currently they do
not. Add a fuzzer to discover all the cases where they don't.
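The property being fuzzed can be illustrated with a toy model. All the
names below are stand-ins (real RCL compares the CST formatter against
the value pretty printer); the point is only the shape of the check:
two independent printers that should agree on every input.

```rust
// Toy model of the differential check: formatting the parsed document
// and pretty-printing the evaluated value should produce the same
// text. Both sides here are simplified stand-ins over integer lists.
fn format_cst(items: &[i64]) -> String {
    let parts: Vec<String> = items.iter().map(|i| i.to_string()).collect();
    format!("[{}]", parts.join(", "))
}

fn print_value(items: &[i64]) -> String {
    // An independently written printer; the fuzz target asserts that
    // it agrees with the formatter on every input it generates.
    let mut out = String::from("[");
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        out.push_str(&item.to_string());
    }
    out.push(']');
    out
}

fn main() {
    let doc = [1, 2, 3];
    // The fuzzer's job is to find inputs where this assertion fails.
    assert_eq!(format_cst(&doc), print_value(&doc));
}
```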
I am making the formatter more suitable for real use, and as a result
I want to add a flake/CI check that ensures that all the examples are
formatted correctly. But then I need this --check mode.
I'm not sure if this is an improvement, because it also creates more
room for syntax errors when we expect something other than an
identifier. But maybe it simplifies Smith more than it harms, and it
speeds up the fuzzer? I have one running for a few minutes now, and it
looks like it's catching up very fast to one that uses the classic
approach, but of course that is always the case at the start, so I'm
not sure. Maybe I should let it run overnight and see which one gets
the furthest, coverage-wise?
Update: I let both of them run overnight. The fuzzer without the
identifier stack started later (I don't remember by how much, but maybe
half an hour, maybe an hour). Yet it managed to achieve more coverage
(7805 vs. 7727; yesterday the one with the identifier stack was around
7500 and the one without at ~7200). So it looks like getting rid of the
stack helps more by speeding up the fuzzer than it harms by generating
nonsense. Let's keep it.
On the one hand, this makes fewer inputs valid Smith programs, which
will make it harder for the fuzzer to find them. But on the other hand,
if the cases where the default helped were unlikely to lead to
interesting programs anyway, then maybe it's good to only unlock them
when the stack is properly filled, so that interesting interactions are
easier to discover through mutations?
It's all very hard to say, and very hard to measure without wasting a
huge amount of compute. But at least it makes the code simpler, so let's
go for that.
After this change, the fuzzer took a few minutes (~6 or so?) to return
to the previous level of coverage.
Previously I handled the stack underflows by hand in most places, but we
can use the ? operator for that. It does change the behavior slightly,
so now my corpus has to converge again, but I think it's worth the
simplification.
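A minimal sketch of the simplification, with the expression stack
reduced to plain integers for illustration: pops return Option, and
call sites propagate underflow with `?` instead of checking by hand.

```rust
// Sketch: each pop returns Option, so one `?` per operand replaces an
// explicit underflow check at every call site.
struct Stack(Vec<i64>);

impl Stack {
    fn pop(&mut self) -> Option<i64> {
        self.0.pop()
    }

    // Builds a binary operation from the top two stack entries;
    // underflow propagates out via `?`.
    fn add(&mut self) -> Option<()> {
        let a = self.pop()?;
        let b = self.pop()?;
        self.0.push(a.wrapping_add(b));
        Some(())
    }
}

fn main() {
    let mut ok = Stack(vec![1, 2]);
    assert_eq!(ok.add(), Some(()));
    assert_eq!(ok.0, vec![3]);

    // Underflow now early-outs instead of being handled in place.
    let mut empty = Stack(vec![]);
    assert_eq!(empty.add(), None);
}
```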
One thing I'm less sure about: when there is an underflow or another
invalid condition, we now early out and don't use the program at all.
That makes it harder for the fuzzer to discover valid Smith programs,
and mutations will be less effective. But it will also waste less time
on different Smith programs that evaluate to the same RCL program.
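The shape of the early out, sketched with stand-in names (the real
conditions are stack underflow and similar invalid states while
decoding the Smith program):

```rust
// Sketch: reject an input before running RCL at all when decoding the
// Smith program already failed. The decoder here is a stand-in.
fn decode(input: &[u8]) -> Option<String> {
    // Pretend an empty input is invalid; a real decoder would return
    // None on underflow or other invalid conditions.
    if input.is_empty() {
        None
    } else {
        Some(format!("program of {} bytes", input.len()))
    }
}

fn run_one(input: &[u8]) -> bool {
    match decode(input) {
        // Early out: no valid program, so don't evaluate anything.
        None => false,
        Some(_program) => {
            // Here the real fuzz target would evaluate the RCL program.
            true
        }
    }
}

fn main() {
    assert!(!run_one(&[]));
    assert!(run_one(&[1, 2, 3]));
}
```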
The exec/s counter is now definitely a lot higher than before, almost a
factor of 2 from quick eyeballing. Which does make sense if most
iterations get rejected before we even run RCL, so by itself that
doesn't say much.
I'm not sure if this is a good idea. On the one hand it keeps the corpus
small and full of interesting examples. But on the other hand, sometimes
we need multiple mutations to discover a new program. So I'm a bit
worried that this will get stuck in a local minimum. But maybe it will
also waste less CPU time on exploring pointless variations of a program?
I implemented coverage, and one thing that stood out is that indexing
operations were not covered, so add them! It's discovering new code
paths quickly now.
I saw many corpus entries that are Smith programs whose output is
empty, yet the programs themselves differ. Exploring that space is
wasteful. I wonder if an early exit, before we even run RCL, helps to
focus compute power on exploring the interesting cases.
I don't have any hard measurements, but from eyeballing the logs, it
seems to discover NEW and REDUCE cases at a slightly higher rate than
without this, even though the exec/s is not that much higher. Or is it?
It seems to go up over the runtime, both in the past and in the current
fuzzer. Hmmmm.
Instead of swap, probably it makes more sense to rotate the instructions
by 1. Or by a random amount. Maybe this is too hectic, and I should
limit it to just +/- 1? But let's see. My corpus is already pretty much
converged so it's hard to tell if this is helpful right now, but I like
the idea so I'm adding it.
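A sketch of the rotation mutator (the function name is illustrative;
the real mutator picks the amount randomly):

```rust
// Sketch: instead of swapping two instructions, rotate the whole
// instruction sequence by some amount (1, or a random value).
fn rotate_instructions(instrs: &mut [u8], amount: usize) {
    if !instrs.is_empty() {
        // The modulo makes any amount safe, including >= the length.
        instrs.rotate_left(amount % instrs.len());
    }
}

fn main() {
    let mut prog = [10, 20, 30, 40];
    rotate_instructions(&mut prog, 1);
    assert_eq!(prog, [20, 30, 40, 10]);
    // Rotating by the full length is a no-op.
    rotate_instructions(&mut prog, 4);
    assert_eq!(prog, [20, 30, 40, 10]);
}
```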
This is where it would be useful to be able to see the mutation name,
not just "Custom" in libFuzzer's output. It _has_ to be possible. But
how?
The stack language is already built to consume the things it can
consume, but identifiers are something we don't want to consume: we can
index into the stack when we use one, so the identifier we need doesn't
always have to be at the top.
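The lookup can be sketched as follows (names and types are illustrative,
not the actual Smith implementation):

```rust
// Sketch: identifiers are not popped when used. An instruction carries
// an index counted from the top of the identifier stack, so the needed
// name doesn't have to be on top.
fn ident_at<'a>(stack: &[&'a str], index_from_top: usize) -> Option<&'a str> {
    let n = stack.len();
    if index_from_top < n {
        Some(stack[n - 1 - index_from_top])
    } else {
        // Out of range: the instruction is invalid, but the stack is
        // left untouched either way.
        None
    }
}

fn main() {
    let idents = ["x", "y", "z"]; // "z" was pushed last
    assert_eq!(ident_at(&idents, 0), Some("z"));
    assert_eq!(ident_at(&idents, 2), Some("x"));
    assert_eq!(ident_at(&idents, 3), None);
}
```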
So these pops only pollute the space of interesting programs with ones
that pointlessly add things to the stack and remove them again.
Remove the pop instructions entirely then. It doesn't even impact fuzzer
coverage much, which suggests they were not used so much in the corpus.
Which also makes sense, libFuzzer's REDUCE would have found smaller
programs that achieve the same thing without the pop.
I noticed the fuzzer was always using "std" and other builtins as
variable names, never using its own even though there is an instruction
for it. I suspect it's because of the REDUCE step in libFuzzer: using a
builtin saves one byte of fuzz input.
So add an equally short encoding for short identifiers that is also
lexicographically smaller than using a builtin, so it should now be
preferred.
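The idea in sketch form, with made-up opcodes: both encodings cost the
same number of bytes, but the short-identifier one sorts first, which
is what libFuzzer's REDUCE (preferring smaller inputs) keys on.

```rust
// Hypothetical opcodes: the short-ident opcode is lexicographically
// smaller than the builtin opcode, at equal encoded length.
const OP_SHORT_IDENT: u8 = 0x01;
const OP_BUILTIN: u8 = 0x02;

fn encode_short_ident(slot: u8) -> [u8; 2] {
    [OP_SHORT_IDENT, slot]
}

fn encode_builtin(slot: u8) -> [u8; 2] {
    [OP_BUILTIN, slot]
}

fn main() {
    // Equal length, but the short-ident encoding sorts first, so
    // REDUCE should now prefer the fuzzer's own variables over "std".
    assert_eq!(encode_short_ident(0).len(), encode_builtin(0).len());
    assert!(encode_short_ident(0) < encode_builtin(0));
}
```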
This was one part that was still missing, which I suspect is why it was
failing to discover one uncovered line in the TypeDiff reporting that I
was hunting for. After a few minutes of running, it still hasn't found
it, but it is making steady progress and discovering lots of new
branches.
With keyword unops, we need a space after them. I saw some inputs
generating "nottrue". That's not very helpful, of course. Not all unops
need the space, but it's never harmful.
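Since the space is never harmful, the printer can emit it
unconditionally; a minimal sketch (function name illustrative):

```rust
// Sketch: always emit a space after a unary operator when printing the
// generated program. Required for keyword operators like `not`, so we
// never print "nottrue"; harmless for the symbolic ones.
fn print_unop(op: &str, operand: &str) -> String {
    format!("{} {}", op, operand)
}

fn main() {
    assert_eq!(print_unop("not", "true"), "not true");
}
```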
I replaced it with the `random` module, see also the previous commit.
This has very little impact on build times, if any.
Benchmark 1: cargo build --all (current commit)
  Time (mean ± σ):     22.219 s ± 0.249 s  [User: 212.705 s, System: 19.562 s]
  Range (min … max):   21.915 s … 22.512 s  5 runs

Benchmark 2: cargo build --all (2 commits ago)
  Time (mean ± σ):     22.365 s ± 0.527 s  [User: 215.451 s, System: 19.730 s]
  Range (min … max):   21.448 s … 22.777 s  5 runs
Benchmark was run in powersave mode to avoid thermal throttling.
It's not that there is something wrong with tinyrand, but it turns out
WyRand is such a tiny function that I can implement it myself and avoid
a dependency. Keep tinyrand for one more commit so I can write a test
that uses it as a reference implementation. It passes.
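A WyRand implementation along these lines really is tiny; the sketch
below uses the constants from the public wyhash reference, and is not
necessarily byte-for-byte the committed version:

```rust
// Minimal WyRand sketch: one u64 of state, one 128-bit multiply per
// output. Constants are from the wyhash reference implementation.
struct WyRand {
    state: u64,
}

impl WyRand {
    fn new(seed: u64) -> WyRand {
        WyRand { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0xa076_1d64_78bd_642f);
        let t = (self.state as u128) * ((self.state ^ 0xe703_7ed1_a0b4_28db) as u128);
        ((t >> 64) ^ t) as u64
    }
}

fn main() {
    // The generator is deterministic: equal seeds give equal streams,
    // which is what the reference test against tinyrand relies on.
    let mut a = WyRand::new(42);
    let mut b = WyRand::new(42);
    for _ in 0..8 {
        assert_eq!(a.next_u64(), b.next_u64());
    }
}
```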
So now I have the two primary ones, "fuzz_source" and "fuzz_smith". Now
I just need to make something that feeds the outputs of the Smith
fuzzer into the fuzz_source corpus.
I want to reuse the same checks from the Smith fuzzer. Maybe I should
rename them "source" and "smith" to clarify.
The code remains mostly unchanged, only moved, but I did also remove a
few of the modes. It's no longer clear to me that a separate
typechecking mode is an advantage when we already have the evaluation
mode. So we can delete some of the code.
It seems to help slightly, but it's hard to tell objectively, because
by now my corpus is getting saturated anyway. It does now print the
name of the mutator in the mutation string; I wonder how it does that,
since there is no API for it. Does it internally append something to a
global? I want to mutate that global too; it would be useful to see
which strategies are effective so I can bias towards them.
Nanorand contains an off-by-one in its WyRand implementation, which
made the mutator crash on hitting a code path that should be
unreachable. [1] There is an unmerged patch for it on GitHub, but this
is the second time I encounter an issue with this library: it also
suffered from a biased shuffle when I was looking for a way to shuffle
playlists in Musium [2]. To be on the safe side, instead of trying to
make Cargo use the patched fork, just swap out the library for one that
works.
[1]: https://github.com slash Absolucy slash nanorand-rs slash issues slash 45
[2]: https://github.com slash Absolucy slash nanorand-rs slash issues slash 37
(URLs defused to prevent GitHub from auto-linking them, it's not my goal
to post passive-aggressive messages in somebody else's issue tracker.)
From a quick glance at the inputs, this seems to reduce the number of
silly programs a bit. In particular, it avoids reading so many
arguments for a for loop that is invalid anyway.