Commit graph

527 commits

Author SHA1 Message Date
Jussi Saurio
c282c24d94
Merge 'clean up core tester to use conn.execute and conn.exec_rows for parsing correctly the expected values from select queries' from Pedro Muniz
## Description
See the PR title. `exec_rows` also validates outputs automatically,
which is good practice for testing.
## Motivation and context
Better typing, and tests no longer have to constantly match on `turso_core::Value`.
## AI Disclosure
AI did most of the migration.

Closes #4192
2025-12-18 09:22:45 +02:00
Jussi Saurio
00d266665b
Merge 'fix coroutine panic: replace ended_coroutine Bitfield with vec' from Jussi Saurio
## Description
Closes #4146
## Motivation and context
panics are bad
## AI Disclosure
none used

Reviewed-by: Pedro Muniz (@pedrocarlo)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #4177
2025-12-18 09:20:05 +02:00
pedrocarlo
3df4f46d80 minimal repro regression test for starting in MVCC
and later switching to WAL and back to MVCC and then updating and deleting a row that only existed in the BTree
2025-12-17 10:55:25 -03:00
pedrocarlo
8f1dcbf625 add regression test for a delete being lost on switch to wal mode 2025-12-17 10:55:25 -03:00
pedrocarlo
bd4f4d9aa5 add fuzz tests for journal_mode 2025-12-17 10:55:25 -03:00
pedrocarlo
d73a283136 fix index_scan_compound_key_fuzz, which always opened the same database in both limbo and sqlite. With the version changes we cannot do that; we need a separate database for each 2025-12-17 10:55:25 -03:00
pedrocarlo
046e6a884d use exec rows for header version test 2025-12-17 10:55:25 -03:00
pedrocarlo
33afc3015c adjust test + remove mv store after transitioning to wal mode 2025-12-17 10:55:25 -03:00
pedrocarlo
e54d3328c0 after checkpoint get header from pager to properly persist change in header 2025-12-17 10:55:25 -03:00
pedrocarlo
257dc5ad09 do not initiate a write transaction for journal mode + checkpoint before changing mode 2025-12-17 10:55:24 -03:00
pedrocarlo
277f9928b7 test changing from WAL to MVCC 2025-12-17 10:55:24 -03:00
pedrocarlo
b948065f22 skip rusqlite integrity check if db is mvcc 2025-12-17 10:55:24 -03:00
pedrocarlo
0cbe904cef ensure DB header is flushed to DB file if header changes during DB open 2025-12-17 10:55:24 -03:00
Pere Diaz Bou
77841042d0
Merge 'Consider Order by expressions collation when deciding candidate index for iteration' from Pedro Muniz
## Description
This does solve #4154, but I don't want to close it with this PR, because it
does not solve the Affinity issue.
We can only use an index for iteration if the column collation in the
ORDER BY clause matches the index collation.
## Motivation and context
Fix a bug in the optimizer
## Description of AI Usage
Used AI to write tests, fuzzers, and help me understand the optimizer
code.
Test prompt:
<details>
can you write tests in tcl that test that the correct collation sequence
is properly maintained.
```
CREATE TABLE "t1" ("c1" TEXT COLLATE RTRIM);
INSERT INTO "t1" VALUES (' ');
CREATE INDEX "i1" ON "t1" ("c1" COLLATE RTRIM DESC);
INSERT INTO "t1" VALUES (1025.1655084065987);
SELECT "c1", typeof(c1) FROM "t1" ORDER BY "c1" COLLATE BINARY DESC, rowid ASC; 
```
this is an example of a query that returned incorrect results because of
this
</details>
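The collation rule above can be checked against stock SQLite as a reference. The following is a minimal sketch using Python's built-in `sqlite3` module; it swaps in `NOCASE` instead of `RTRIM` only because the difference in ordering is easier to see, and the table and index names are illustrative rather than taken from the test suite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 TEXT COLLATE NOCASE);
    CREATE INDEX i1 ON t1 (c1 COLLATE NOCASE);
    INSERT INTO t1 VALUES ('a'), ('B');
""")

# The ORDER BY collation matches the index collation, so walking i1 in
# index order would produce a correct result.
nocase_rows = [r[0] for r in conn.execute(
    "SELECT c1 FROM t1 ORDER BY c1 COLLATE NOCASE")]

# BINARY compares raw bytes ('B' = 0x42 sorts before 'a' = 0x61), so the
# NOCASE index order is not a valid substitute for this ORDER BY.
binary_rows = [r[0] for r in conn.execute(
    "SELECT c1 FROM t1 ORDER BY c1 COLLATE BINARY")]

print(nocase_rows)  # ['a', 'B']
print(binary_rows)  # ['B', 'a']
```

The two queries return the same rows in different orders, which is why an index may only be chosen for iteration when its collation matches the ORDER BY term.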

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #4248
2025-12-17 14:26:25 +01:00
pedrocarlo
398c82fdf1 clippy 2025-12-16 23:11:31 -03:00
pedrocarlo
0be4a885f1 adjust fuzz tests to account for collation and sort order for asserting correctness 2025-12-16 23:09:00 -03:00
pedrocarlo
4c157e8c7a add AI fuzz tests 2025-12-16 15:27:07 -03:00
Pekka Enberg
98308415b4 core: Don't rollback transaction when schema updated
When the SchemaUpdated error occurs during statement execution, don't
roll back the transaction, but instead re-prepare the statement.

Spotted by Whopper.
2025-12-15 13:49:21 +02:00
PThorpe92
12601af1e1
increase latency check for flaky test in test_read_path.rs 2025-12-12 13:49:56 -05:00
pedrocarlo
60ab032e3a clean up core tester to use conn.execute instead of limbo_exec_rows and use conn.exec_rows for parsing correctly the expected values from select queries 2025-12-12 12:36:48 -03:00
Jussi Saurio
216f4d71ee cargo fmt 2025-12-11 23:37:19 +02:00
Jussi Saurio
6280ab4a7b Regression test for 4146 2025-12-11 23:37:19 +02:00
Jussi Saurio
faa1197e58 Add greedy join ordering for large queries (>12 tables)
Problem:

The existing DP-based join optimizer has O(2^n) complexity, which
causes large joins to basically not get past the planning phase.

Fix:

Add a greedy algorithm that runs in O(n²) time for >12 tables.

Details:

- Add compute_greedy_join_order() with hub score heuristic for
  selecting the starting table. Tables referenced by many other
  tables' constraints are preferred, enabling index lookups on
  subsequent joins. This is especially good for star schema
  queries.
- Add GREEDY_JOIN_THRESHOLD constant (12) for switchover point
- Add fuzz tests covering star schemas, chains, cliques up to 62
  tables, and LEFT JOIN ordering invariants (RHS of a left join
  cannot be reordered).
- Not all the tests necessarily assert that a query results in a
  good plan (apart from star schemas), but all tests do assert
  that we are _able_ to construct a plan (unlike before, where
  even 32-way joins would grind to a halt).
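
As a rough illustration of the idea described above, here is a minimal Python sketch of a greedy join ordering with a hub-score starting heuristic. It is not the code from this commit: the function name `greedy_join_order`, the `constraints` and `row_counts` inputs, and the tie-breaking rules are all assumptions made for the example, and it ignores LEFT JOIN ordering restrictions entirely.

```python
def greedy_join_order(tables, constraints, row_counts):
    """Pick a join order greedily (hypothetical sketch).

    tables:      list of table names
    constraints: list of (a, b) pairs, one per join constraint between a and b
    row_counts:  dict mapping table name to an estimated row count
    """
    neighbors = {t: set() for t in tables}
    for a, b in constraints:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Hub score: number of other tables whose constraints reference this table.
    # Ties are broken in favor of the smaller table.
    start = max(tables, key=lambda t: (len(neighbors[t]), -row_counts[t]))
    order = [start]
    remaining = [t for t in tables if t != start]

    while remaining:
        joined = set(order)

        def cost(t):
            # Prefer tables constrained against something already joined
            # (an index lookup may be possible), then smaller tables.
            connected = bool(neighbors[t] & joined)
            return (0 if connected else 1, row_counts[t])

        best = min(remaining, key=cost)
        order.append(best)
        remaining.remove(best)

    return order


# Star schema: one fact table referenced by three dimension tables.
order = greedy_join_order(
    tables=["d1", "fact", "d2", "d3"],
    constraints=[("fact", "d1"), ("fact", "d2"), ("fact", "d3")],
    row_counts={"fact": 1_000_000, "d1": 100, "d2": 10, "d3": 1000},
)
print(order)  # ['fact', 'd2', 'd1', 'd3']
```

Each round scans the remaining tables once, so the number of candidate evaluations grows quadratically with the table count instead of exponentially.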

AI usage:

- Pretty much all of this was a conversation between me and Opus 4.5.
  I asked it to search the internet for practical solutions to the
  problem and it suggested a simple greedy search as a low-complexity
  solution and I thought it was a good idea for now.
2025-12-11 09:31:38 +02:00
pedrocarlo
2a449f8f6b revert change in index_scan_compound_key_fuzz 2025-12-10 23:38:41 -03:00
pedrocarlo
bc16588273 index and rowid fuzz should open a separate sqlite database for comparison 2025-12-10 15:38:02 -03:00
pedrocarlo
ffbbd4c270 add exec rows trait for more ergonomic testing in core_tester 2025-12-10 15:21:03 -03:00
pedrocarlo
c207eddd3f remove unused TempDatabase argument requirement for limbo_exec_rows 2025-12-10 15:21:03 -03:00
Jussi Saurio
64dba96c60
Merge 'initialize global header on bootstrap' from Pedro Muniz
On bootstrap, just store the header but do not flush it to disk. Only try to
flush it when we start an MVCC transaction. Also applied a fix in
`OpenDup`, where we should not wrap an ephemeral table with an MvCursor.

Reviewed-by: Mikaël Francoeur (@LeMikaelF)
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #4151
2025-12-10 19:04:23 +02:00
Jussi Saurio
0d35366f5d
Merge 'Fix CTE scope propagation for compound SELECTs' from Martin Mauch
CTEs now work correctly when combined with UNION, UNION ALL, INTERSECT,
and EXCEPT.
**Before:**
```sql
WITH t AS (SELECT 1 as x) SELECT * FROM t UNION ALL SELECT 2 as x
-- Error: Parse error: no such table: t
```
**After:**
```sql
WITH t AS (SELECT 1 as x) SELECT * FROM t UNION ALL SELECT 2 as x
-- Works correctly, returns rows (1) and (2)
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #4123
2025-12-10 19:04:03 +02:00
pedrocarlo
d09b48c0e6 test page1 init 2025-12-10 12:53:25 -03:00
Nikita Sivukhin
8cc40949a5 fix clippy 2025-12-10 15:08:44 +04:00
Nikita Sivukhin
e70428e976 add explicit insert action to the fuzz test and disable it for now 2025-12-10 14:53:32 +04:00
Nikita Sivukhin
70b1e5716d add fuzz test which maintains a sqlite3 and a turso db and periodically switches between them in order to validate compatibility 2025-12-10 14:53:32 +04:00
Nikita Sivukhin
9acf541e28 add compatibility test for multiple-columns unique constraint 2025-12-10 01:46:25 +04:00
Martin Mauch
3dc8fed204 Fix CTE scope propagation for compound SELECTs 2025-12-09 13:09:55 +01:00
Jussi Saurio
2aefb4ee8c
Merge 'fix/btree: disable move_to_rightmost optimization with triggers' from Jussi Saurio
## Closes
- Closes #4017
- Addresses #4043; this now fails with `Page cache is full` with 100k
pages, which is a separate non-corruption issue. Modifying max page
cache size to be 10 million pages makes it not finish at all. We should
modify the issue after this is merged to reflect what the new problem
is. The queries in the issue (#4043) create a WAL that is at least 1.7
GB in size
## Background
We have an optimization in the btree where if:
- We want to reach the rightmost leaf page, and
- We know the rightmost page and are already on it
Then we can skip a seek.
## Problem
The problem is this optimization should NEVER be used in cases where we
cannot be sure that the btree wasn't modified from under us e.g. by a
trigger subprogram.
## Fix
Hence, disable it when we are executing a parent program that has
triggers which will fire.
## AI Disclosure
No AI was used for this PR.

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #4135
2025-12-09 10:02:11 +02:00
Jussi Saurio
2e8b771f6f
Merge 'Fix descending index scan returning rows when seek key is NULL' from Jussi Saurio
Closes #4066
Closes #4129
## Problem
Take e.g.
CREATE TABLE t(x); CREATE INDEX txdesc ON t(x desc);
INSERT INTO t values (1),(2),(3);
SELECT * FROM t WHERE x > NULL;
--
Our plan, like SQLite's, was to start iterating the descending index from
the beginning (Rewind) and stop once we hit a row where x is <= NULL,
using the `IdxGe` instruction (GE in descending indexes means LE).
However, `IdxGe` and other similar instructions use a sort comparison
where NULL is less than numbers/strings etc, so this would incorrectly
not jump.
## Fix
We need to emit an explicit NULL check after rewinding.
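For reference, the expected behavior can be reproduced against stock SQLite. This is a minimal sketch using Python's built-in `sqlite3` module with the schema from the problem statement; it shows the two different NULL semantics involved (SQL comparison versus sort order), not the bytecode change itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(x);
    CREATE INDEX txdesc ON t(x DESC);
    INSERT INTO t VALUES (1),(2),(3);
""")

# SQL comparison semantics: any comparison with NULL yields NULL,
# so the WHERE clause never matches.
rows = conn.execute("SELECT * FROM t WHERE x > NULL").fetchall()
comparison = conn.execute("SELECT 1 > NULL").fetchone()[0]

# Sort semantics: NULL is ordered before numbers, which is the comparison
# that index seek instructions rely on.
sorted_values = conn.execute(
    "SELECT x FROM (SELECT 1 AS x UNION ALL SELECT NULL) ORDER BY x"
).fetchall()

print(rows)           # []
print(comparison)     # None
print(sorted_values)  # [(None,), (1,)]
```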
## Tests
Added TCL tests + improved `index_scan_compound_key_fuzz` to have NULL
seek keys sometimes.
## AI disclosure
I started debugging this with Claude Code thinking this is a much deeper
corruption issue, but Opus 4.5 noticed immediately that we are returning
rows from a `x > NULL` comparison which should never happen. Hence, the
fix was then fairly simple.

Closes #4132
2025-12-09 09:38:18 +02:00
Jussi Saurio
201a7e6387 Regression test for 4017 2025-12-09 09:19:37 +02:00
Nikita Sivukhin
997a07cac9 add test with concurrent commit/rollback and insert stmt 2025-12-08 16:34:07 +04:00
Jussi Saurio
027ebe33fe Fix descending index scan returning rows when seek key is NULL
Take e.g.

CREATE TABLE t(x); CREATE INDEX txdesc ON t(x desc);
INSERT INTO t values (1),(2),(3);

SELECT * FROM t WHERE x > NULL;

--

Our plan, like SQLite's, was to start iterating the descending index
from the beginning (Rewind) and stop once we hit a row where x is
<= NULL, using the `IdxGe` instruction (GE in descending indexes
means LE).

However, `IdxGe` and other similar instructions use a sort comparison
where NULL is less than numbers/strings etc, so this would incorrectly
not jump.

Fix: we need to emit an explicit NULL check after rewinding.
2025-12-08 13:19:58 +02:00
Jussi Saurio
826ca4d44d chore: remove experimental_indexes feature flags 2025-12-08 13:00:37 +02:00
Preston Thorpe
c09c30746e
Merge 'guard subjournal access within single connection' from Nikita Sivukhin
Right now turso can panic on various asserts if two or more write
statements are executed concurrently over a single connection:
```
thread 'query_processing::test_write_path::api_misuse' panicked at core/storage/pager.rs:776:9:
subjournal offset should be 0
```
This PR adds an explicit guard for subjournal access. The guard returns
`Busy` for the operation internally, which makes the statement wait
until subjournal ownership is released and can be re-acquired.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #4110
2025-12-05 13:14:07 -05:00
Preston Thorpe
e7c7f232b4
Merge 'testing/fuzz: Add new fuzzer for joins' from Preston Thorpe
needed for #4063 to merge, currently passing on main but just want to
lower the already huge diff

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #4103
2025-12-05 13:13:44 -05:00
Nikita Sivukhin
659ef7c079 fix clippy 2025-12-05 21:39:35 +04:00
Nikita Sivukhin
487854e6d6 guard subjournal access in order to prevent concurrent operations over it within same connection 2025-12-05 21:25:13 +04:00
PThorpe92
c9a6827011
Extract out join fuzzer to an additional test on indexed columns 2025-12-05 12:03:23 -05:00
Jussi Saurio
a90087bcf6 Enable compound_select_fuzz for mvcc because it works as a regression test for #4108 2025-12-05 17:19:05 +02:00
Nikita Sivukhin
d5f58de801 fix clippy 2025-12-05 15:29:17 +04:00
Nikita Sivukhin
e839eb499b make fuzzer generate SELECT COUNT(*) or SELECT * statements
- this is important for the IN operation translation bug, because in the COUNT(*) case there is a constant assignment instruction right after the last instruction translated from the IN condition
2025-12-05 14:48:59 +04:00
Jussi Saurio
eb782ce2d4 fix/mvcc: seek() must seek from both mv store and btree
for example, upon opening an existing database, all the rows are in
the btree, so if we seek only from MV store, we won't find anything.
ergo: we must look from both the mv store and the btree. if we are
iterating forwards, the smallest of the two results is where we land,
and vice versa for backwards iteration.

initially this implementation used blocking IO but was refactored to
use state machines after the rest of the Cursor methods in the MVCC cursor
module were refactored to do that too.
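
The merge rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: both stores are modeled as plain sorted key lists, the helper names (`seek_forward`, `seek_backward`, `seek_both`) are hypothetical, and row versions, tombstones and the state-machine IO are ignored.

```python
from bisect import bisect_left, bisect_right


def seek_forward(sorted_keys, target):
    """Smallest key >= target, or None if there is no such key."""
    i = bisect_left(sorted_keys, target)
    return sorted_keys[i] if i < len(sorted_keys) else None


def seek_backward(sorted_keys, target):
    """Largest key <= target, or None if there is no such key."""
    i = bisect_right(sorted_keys, target)
    return sorted_keys[i - 1] if i > 0 else None


def seek_both(mv_keys, btree_keys, target, forward=True):
    """Seek in both stores and land on whichever result comes first
    in the direction of iteration."""
    seek = seek_forward if forward else seek_backward
    candidates = [
        key
        for key in (seek(mv_keys, target), seek(btree_keys, target))
        if key is not None
    ]
    if not candidates:
        return None
    return min(candidates) if forward else max(candidates)


# Freshly opened database: every row is still in the btree.
print(seek_both([], [1, 2, 3, 10], 2))               # 2
# Rows in both stores: the nearer result wins in each direction.
print(seek_both([5, 7], [1, 10], 4))                 # 5
print(seek_both([5, 7], [1, 10], 8, forward=False))  # 7
```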

---

this PR was initially almost entirely written using Claude Code + Opus 4.5,
but heavily manually cleaned up as the AI made the state machine refactor
far too complicated.
2025-12-05 11:53:16 +02:00