Add a script to seed the fuzz corpus from goldens

The fuzzer is having a hard time discovering interesting inputs to key_by. So far it has discovered that it can do [true].key_by({}.key_by), which works because {}.key_by(true) does not fail: the input is empty, so it returns an empty dict, and then the output is {{}: true}. But the fuzzer has not yet discovered that key_by is supposed to receive a lambda, so it's not producing any interesting groupings.

We do have a corpus of goldens that covers these cases. They are not as small as the fuzz inputs, but they might be a good starting point, so let's add a script to seed the fuzz corpus from them. So far I'm still not impressed with the results, but maybe I am just impatient and need to let the fuzzer run a bit longer.
2025-10-10 00:42:13 +00:00 · 2023-11-30 23:38:26 +01:00 · 2023-11-30 23:38:26 +01:00 · d068e20d77
commit d068e20d77
parent f2d4c26dee
1 changed file with 64 additions and 0 deletions
--- a/tools/seed_fuzz_corpus.py
+++ b/tools/seed_fuzz_corpus.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+
+# RCL -- A reasonable configuration language.
+# Copyright 2023 Ruud van Asseldonk
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# A copy of the License has been included in the root of the repository.
+
+"""
+Seed the fuzz corpus from the golden tests.
+
+SYNOPSIS
+
+  tools/seed_fuzz_corpus.py
+"""
+
+import os
+
+from hashlib import sha1
+from statistics import quantiles
+from typing import List
+
+
+def seed_one(fname: str, corpus_dir: str) -> int:
+    """
+    Extract the test case and write it into the fuzz directory. Return the
+    length of the input in bytes.
+    """
+    input_lines: List[str] = []
+
+    with open(fname, "r", encoding="utf-8") as f:
+        for line in f:
+            if line == "# output:\n":
+                break
+
+            input_lines.append(line)
+
+    input_bytes = "".join(input_lines).strip().encode("utf-8")
+
+    # Libfuzzer by default names the fuzz inputs after their sha1sum, so we do
+    # that as well.
+    shasum = sha1(input_bytes).hexdigest()
+    with open(os.path.join(corpus_dir, shasum), "wb") as f:
+        f.write(input_bytes)
+        print(f"{shasum} {len(input_bytes):4} {fname}")
+
+    return len(input_bytes)
+
+
+def main() -> None:
+    corpus_dir = "fuzz/corpus/main"
+
+    lens: List[int] = []
+    for root, _dirs, files in os.walk("golden"):
+        for fname in files:
+            if fname.endswith(".test"):
+                lens.append(seed_one(os.path.join(root, fname), corpus_dir))
+
+    print("Length q0.25, q0.5, q0.75:", quantiles(lens, n=4))
+
+
+if __name__ == "__main__":
+    main()
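
The extraction step in seed_one can be tried in isolation, without touching the golden or fuzz directories. The following is a minimal sketch, assuming a golden file is laid out the way the script implies: the input, then a line reading "# output:", then the expected output. The helper names split_golden and corpus_name are hypothetical and not part of this commit, and the sample text is invented for illustration.

```python
from hashlib import sha1


def split_golden(text: str) -> bytes:
    """Return the input part of a golden test: everything before '# output:'."""
    input_lines = []
    for line in text.splitlines(keepends=True):
        # Same stop condition as seed_one: an exact '# output:' line.
        if line == "# output:\n":
            break
        input_lines.append(line)
    # Strip surrounding whitespace before encoding, as seed_one does.
    return "".join(input_lines).strip().encode("utf-8")


def corpus_name(input_bytes: bytes) -> str:
    """Name a corpus entry after the SHA-1 hex digest of its contents."""
    return sha1(input_bytes).hexdigest()


# Invented sample, not taken from the real golden directory.
sample = "[1, 2, 3]\n\n# output:\n[1, 2, 3]\n"
data = split_golden(sample)
print(corpus_name(data), len(data), data)
```

Because the file name is derived from the content, running the seeding step twice on the same golden produces the same corpus entry rather than a duplicate.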