## Description
This PR solves part of #4154, but I don't want to close the issue with
this PR, because it does not solve the affinity issue.
We can only iterate over an index to satisfy an ORDER BY if the collation
of the column in the ORDER BY clause matches the index collation.
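The rule above can be sketched as a check the planner runs before it lets an index stand in for a sort. The types and `index_satisfies_order_by` are hypothetical, not the optimizer's actual API. The sketch also assumes the index can be scanned backwards, so a uniformly reversed direction still qualifies; the PR itself only talks about collation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Collation {
    Binary,
    NoCase,
    Rtrim,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortOrder {
    Asc,
    Desc,
}

/// One ORDER BY term or one index column (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub collation: Collation,
    pub order: SortOrder,
}

/// Returns true if scanning `index` (forwards or backwards) yields rows in
/// the order `order_by` asks for. Column AND collation must match for every
/// term; matching the column name alone is not enough.
pub fn index_satisfies_order_by(order_by: &[SortKey], index: &[SortKey]) -> bool {
    if order_by.is_empty() || order_by.len() > index.len() {
        return false;
    }
    // The first term decides whether a reverse scan is needed.
    let reversed = order_by[0].order != index[0].order;
    order_by.iter().zip(index).all(|(term, col)| {
        // Same direction on a forward scan, opposite direction on a reverse scan.
        let order_ok = (term.order == col.order) != reversed;
        term.column == col.column && term.collation == col.collation && order_ok
    })
}
```

For the example query in the AI usage section, `i1` indexes `c1 COLLATE RTRIM DESC` while the query orders by `c1 COLLATE BINARY DESC`, so this check rejects the index and the rows have to be sorted explicitly.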
## Motivation and context
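Fix an optimizer bug where an index was used to satisfy an ORDER BY even
though its collation differed from the collation in the ORDER BY clause,
which could return incorrect results.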
## Description of AI Usage
Used AI to write tests and fuzzers, and to help me understand the
optimizer code.
Test prompt:
<details>
can you write tests in tcl that test that the correct collation sequence
is properly maintained.
```
CREATE TABLE "t1" ("c1" TEXT COLLATE RTRIM);
INSERT INTO "t1" VALUES (' ');
CREATE INDEX "i1" ON "t1" ("c1" COLLATE RTRIM DESC);
INSERT INTO "t1" VALUES (1025.1655084065987);
SELECT "c1", typeof(c1) FROM "t1" ORDER BY "c1" COLLATE BINARY DESC, rowid ASC;
```
this is an example of a query that returned incorrect results because of
this
</details>
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4248
When we're not running under Antithesis, allow the user to specify a
seed for random number generation, which determines the SQL operations
we perform. Although runs are still not deterministic, this makes some
issues easier to reproduce.
Also, add a `scripts/run-until-fail.sh` script, which you can use to
discover interesting seeds. For example, you can run
```sh
./scripts/run-until-fail.sh cargo run -p turso_stress -- -t1
```
to find a bug and then just copy-paste the reported seed to attempt to
reproduce it.
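A minimal sketch of the non-Antithesis seed path described above: an explicit seed is used as-is, otherwise one is derived from the clock, and it is printed either way so it can be copy-pasted into the next run. `choose_seed`, `plan_ops`, the `seed: …` output format, and the SplitMix64 generator are illustrative stand-ins, not what `turso_stress` necessarily uses; the Antithesis path is not shown.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Small, well-known PRNG (SplitMix64), standing in for the tool's real RNG.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: prefer the seed given on the command line, otherwise
/// derive one from the clock. Always print it so a failing run can be retried.
pub fn choose_seed(user_seed: Option<u64>) -> u64 {
    let seed = user_seed.unwrap_or_else(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0)
    });
    println!("seed: {seed}");
    seed
}

const OPS: [&str; 3] = ["INSERT", "UPDATE", "DELETE"];

/// The seed fixes WHICH operations are chosen. Thread scheduling is not
/// fixed by it, which is why a run is still not fully deterministic.
pub fn plan_ops(seed: u64, n: usize) -> Vec<&'static str> {
    let mut rng = SplitMix64::new(seed);
    (0..n)
        .map(|_| OPS[(rng.next_u64() % OPS.len() as u64) as usize])
        .collect()
}
```

Reusing a reported seed reproduces the same operation choices, which is what makes copy-pasting it from a failed run useful even without full determinism.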
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4227
- added a procedural macro that creates Rust tests and, with just a flag,
  also creates a second test that runs the same code with MVCC enabled
- migrated almost all tests to the new macro and added the `mvcc` flag
  to the tests that were not failing
- added a `TempDatabase` builder so that the proc macro can generate the
  correct database options
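The PR uses a procedural macro; the sketch below approximates the same idea with a declarative `macro_rules!` macro so it fits in one file. `db_test!`, `DbOpts`, `with_mvcc`, the generated `default`/`mvcc` test names, and this shape of the `TempDatabase` builder are all hypothetical. Passing the `mvcc` flag makes the macro emit a second `#[test]` whose database is built with MVCC enabled.

```rust
/// Options the builder hands to the test database (hypothetical, one flag only).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct DbOpts {
    pub enable_mvcc: bool,
}

/// Stand-in for the test helper; a real one would own a temp file and a connection.
pub struct TempDatabase {
    pub opts: DbOpts,
}

impl TempDatabase {
    pub fn builder() -> TempDatabaseBuilder {
        TempDatabaseBuilder { opts: DbOpts::default() }
    }
}

/// Builder so generated code can set options without one constructor per combination.
pub struct TempDatabaseBuilder {
    opts: DbOpts,
}

impl TempDatabaseBuilder {
    pub fn with_mvcc(mut self, enable: bool) -> Self {
        self.opts.enable_mvcc = enable;
        self
    }

    pub fn build(self) -> TempDatabase {
        TempDatabase { opts: self.opts }
    }
}

macro_rules! db_test {
    // Internal rule: build the database for one variant, run the body, return the options used.
    (@run $body:expr) => {
        pub fn run(mvcc: bool) -> DbOpts {
            let db = TempDatabase::builder().with_mvcc(mvcc).build();
            ($body)(&db);
            db.opts
        }
    };
    // Without the flag: only the regular test is generated.
    ($name:ident; $body:expr) => {
        pub mod $name {
            use super::*;
            db_test!(@run $body);
            #[test]
            fn default() { run(false); }
        }
    };
    // With the `mvcc` flag: the same body also runs with MVCC enabled.
    ($name:ident, mvcc; $body:expr) => {
        pub mod $name {
            use super::*;
            db_test!(@run $body);
            #[test]
            fn default() { run(false); }
            #[test]
            fn mvcc() { run(true); }
        }
    };
}

db_test!(opens_database, mvcc; |db: &TempDatabase| {
    println!("running with mvcc = {}", db.opts.enable_mvcc);
});
db_test!(non_mvcc_only; |_db: &TempDatabase| {});
```

Routing all options through the builder means the macro only has to add a `.with_mvcc(true)` call for the extra variant instead of knowing every way a test database can be configured.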
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #3991
1. BTreeCursors were initialized with a negative table ID; I'm not sure
how this worked before.
2. The `modification_during_scan` test was also, as far as I can tell,
working incorrectly.
Depends on #3775, to remove noise from this PR.
## Motivation
In my continued effort to make the simulator more accessible and simpler
to work with, I have over time simplified and optimized parts of the
codebase, such as query generation and decision making, so that more
people from the community can contribute to and enhance the simulator.
This PR is one more step in that direction.
Before this PR, our `InteractionPlan` stored `Vec<Interactions>`.
`Interactions` is a higher-level collection that generates a list of
`Interaction` values (yes, I know the naming can be confusing; maybe we
can change it later as well, especially because `Interactions` are
mostly just a `Property`). However, this architecture posed a problem
once MVCC entered the picture: MVCC requires DDL statements to be
executed serially. To avoid adding even more complexity to plan
generation, in previous PRs I opted to check, before emitting an
`Interaction` for execution, whether the interaction is a DDL statement;
if it is, I emit a `Commit` for each connection still in a transaction.
This worked reasonably well, but because the interaction plan stores only
the higher-level `Interactions` and not the interactions that actually
get executed, I had to use workarounds that modify the `Interactions`
inside the plan to persist the `Commit`s I generated on demand.
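The on-demand commit strategy described above can be sketched as follows: before a DDL statement runs, every connection with an open transaction is committed first, so the DDL never overlaps another transaction under MVCC. The `Interaction` enum, `is_ddl`, and `emit` are simplified, hypothetical stand-ins for the simulator's types, and the keyword-prefix DDL check is an assumption made for brevity.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Interaction {
    Begin { conn: usize },
    Query { conn: usize, sql: String },
    Commit { conn: usize },
}

/// Crude DDL detection by leading keyword (an assumption for this sketch).
fn is_ddl(sql: &str) -> bool {
    let head = sql.trim_start().to_ascii_uppercase();
    ["CREATE", "DROP", "ALTER"].iter().any(|kw| head.starts_with(*kw))
}

/// Returns the interactions to actually execute for `next`, keeping
/// `open_txns` (connections inside a transaction) up to date.
pub fn emit(next: Interaction, open_txns: &mut BTreeSet<usize>) -> Vec<Interaction> {
    let mut out = Vec::new();
    match &next {
        Interaction::Query { sql, .. } if is_ddl(sql) => {
            // Commit every open transaction before the DDL statement runs.
            for conn in std::mem::take(open_txns) {
                out.push(Interaction::Commit { conn });
            }
        }
        Interaction::Begin { conn } => {
            open_txns.insert(*conn);
        }
        Interaction::Commit { conn } => {
            open_txns.remove(conn);
        }
        Interaction::Query { .. } => {}
    }
    out.push(next);
    out
}
```

The returned `Commit`s are exactly what has to be persisted for shrinking to replay the run; with a plan of higher-level `Interactions` there is no natural slot to store them, which is the workaround problem the PR describes.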
## Problem
However, I was stupid and overlooked the fact that for certain
properties that allow queries to be generated in the middle (referred to
as extensional queries in the code), we cannot specify the connection
that should execute that query. This meant that if a DDL statement
occurred there, the simulator could emit the query but could not save it
properly in the plan to reproduce it during shrinking. To fix this and
make interaction generation and emission less brittle, I refactored the
`InteractionPlan` so that it stores `Vec<Interaction>` instead.
## Implications
- `Interaction` is currently not serializable with `Serde` because it
stores a function in `Assertion`. This means that we cannot serialize
the plan into a `plan.json`, which to me is honestly fine, as the only
things that used `plan.json` were the `--load` and `--watch` options,
which almost nobody really used.
  - For load, instead of generating the whole plan, the simulator just
  read the plan from disk. The workaround for now is to load the
  `cli_opts` that were last used for that particular seed and run the
  simulation with those exact options.
  - For watch, there is currently no workaround, but @alpaylan told me
  about plans to make assertions serializable by embedding a custom
  language into the `plan.sql` file, meaning we will probably not need a
  JSON file at all to store the interaction plan. This embedded language
  will also make it much easier to bring back a proper watch mode.
- The current shrinking algorithms all have some notion of properties
and of removing properties, but `Interaction` does not have this
concept. So I added some metadata and an origin ID to each
`Interaction`, so that we can binary-search the list of interactions to
find all of the interactions that are part of the same `Property`. To
support this, I added an `InteractionBuilder` and some utilities to
iterate over and remove properties in the `InteractionPlan`.
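If interactions are appended in emission order and origin IDs never decrease (an assumption of this sketch), the flat `Vec<Interaction>` stays sorted by origin ID. Each property's interactions then form one contiguous run that `slice::partition_point` can locate with two binary searches. The field names, `InteractionBuilder` methods, `property_range`, and `remove_property` are illustrative, not necessarily the simulator's.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interaction {
    /// ID of the property this interaction was generated from.
    pub origin_id: u64,
    /// Connection that executes it.
    pub conn: usize,
    pub sql: String,
}

/// Hypothetical builder that stamps the same origin ID on everything it produces.
pub struct InteractionBuilder {
    origin_id: u64,
    conn: usize,
}

impl InteractionBuilder {
    pub fn new(origin_id: u64) -> Self {
        Self { origin_id, conn: 0 }
    }

    pub fn conn(mut self, conn: usize) -> Self {
        self.conn = conn;
        self
    }

    pub fn query(&self, sql: &str) -> Interaction {
        Interaction { origin_id: self.origin_id, conn: self.conn, sql: sql.to_string() }
    }
}

pub struct InteractionPlan {
    /// Invariant: sorted by `origin_id` (non-decreasing).
    pub interactions: Vec<Interaction>,
}

impl InteractionPlan {
    /// Index range of all interactions from property `id` (empty if none).
    pub fn property_range(&self, id: u64) -> Range<usize> {
        let start = self.interactions.partition_point(|i| i.origin_id < id);
        let end = self.interactions.partition_point(|i| i.origin_id <= id);
        start..end
    }

    /// All interactions that came from one property.
    pub fn property(&self, id: u64) -> &[Interaction] {
        &self.interactions[self.property_range(id)]
    }

    /// Shrinking step: remove a whole property and return what was removed.
    pub fn remove_property(&mut self, id: u64) -> Vec<Interaction> {
        let range = self.property_range(id);
        self.interactions.drain(range).collect()
    }
}
```

Because each `Interaction` also records its connection, shrinking can drop or keep a property as a unit without losing track of which connection ran each statement.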
## Conclusion
Overall, this PR simplifies the emission of interactions and ensures
that the `InteractionPlan` always stores the interactions that actually
get executed. It also further decouples query generation logic from
query emission logic.
Closes #3774
Modify `generation/property.rs` to use the Builder
- add additional metadata to `Interaction` to give more context for
shrinking and for iterating over interactions that originated from the
same property.
- add iterator-like utilities to `InteractionPlan` to make it easier to
iterate over interactions that came from the same property: