This PR makes a few improvements to partial sync performance.
1. The `segment_size` configuration is exposed and set to 128kb by
default. This setting makes the sync engine load pages in batches (even
when only one page is requested). It reduces the number of requests and
also makes requests friendlier to the server: internally, the server
groups pages into segments, and independent requests for pages within
the same segment introduce unnecessary overhead (it is simpler for the
server to return the whole batch of pages stored in one segment).
2. The `speculative_load` configuration is exposed, which makes the sync
engine speculatively load the child pages of an accessed page. This also
reduces the number of requests and makes tables lazy-load faster,
because all internal nodes are loaded quickly with fewer requests.
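As a rough illustration of the batching idea (the constants and function here are hypothetical, not the actual sync engine API), widening a single-page request to its enclosing segment looks like this:

```python
# Hypothetical sketch of segment batching; the page size is an assumed
# value for illustration, not an actual sync engine constant.
SEGMENT_SIZE = 128 * 1024  # 128kb default from this PR
PAGE_SIZE = 4 * 1024       # assumed page size

def segment_pages(page_no: int) -> range:
    """Return the range of pages fetched when `page_no` is requested:
    the whole segment that contains it, in one request."""
    per_segment = SEGMENT_SIZE // PAGE_SIZE  # 32 pages per segment
    start = (page_no // per_segment) * per_segment
    return range(start, start + per_segment)

# Requesting page 70 loads pages 64..95 in a single round trip,
# instead of up to 32 independent requests to the same server-side segment.
```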
Closes #4144
## Closes
- Closes #4017
- Addresses #4043; that workload now fails with `Page cache is full` at
100k pages, which is a separate, non-corruption issue. Raising the max
page cache size to 10 million pages makes it not finish at all. We
should update the issue after this is merged to reflect what the new
problem is. The queries in the issue (#4043) create a WAL that is at
least 1.7 GB in size.
## Background
We have an optimization in the btree where if:
- We want to reach the rightmost leaf page, and
- We know the rightmost page and are already on it
Then we can skip a seek.
## Problem
The problem is that this optimization should NEVER be used in cases
where we cannot be sure that the btree wasn't modified from under us,
e.g. by a trigger subprogram.
## Fix
Hence, disable it when we are executing a parent program that has
triggers which will fire.
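A minimal sketch of the guard, with hypothetical names standing in for the real cursor and program state:

```python
# Hypothetical sketch, not turso's actual types: the "skip the seek"
# fast path is only valid when no trigger subprogram can have modified
# the btree underneath us.
def can_skip_seek(on_rightmost_leaf: bool, fires_triggers: bool) -> bool:
    # The optimization: already on the known rightmost leaf -> no seek.
    # The fix: never take it while the parent program fires triggers.
    return on_rightmost_leaf and not fires_triggers
```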
## AI Disclosure
No AI was used for this PR.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #4135
The idea is to have a custom `all-mvcc.test` so we can add `.test` files
that we expect to work under MVCC. In cases where whole files are not
granular enough, we have `is_turso_mvcc` to check whether we want to run
a test. For example, we skip partial index tests like this:
```tcl
if {![is_turso_mvcc]} {
    do_execsql_test_on_specific_db {:memory:} autoinc-conflict-on-nothing {
        CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT);
        CREATE UNIQUE INDEX idx_k_partial ON t(k) WHERE id > 1;
        INSERT INTO t (k) VALUES ('a');
        INSERT INTO t (k) VALUES ('a');
        INSERT INTO t (k) VALUES ('a') ON CONFLICT DO NOTHING;
        INSERT INTO t (k) VALUES ('b');
        SELECT * FROM t ORDER BY id;
    } {1|a 2|a 4|b}
}
```
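The expected result can be cross-checked against SQLite itself with a quick sqlite3 sketch; note that the failed `ON CONFLICT DO NOTHING` insert still consumes an AUTOINCREMENT id, so the last row gets id 4:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT);
    CREATE UNIQUE INDEX idx_k_partial ON t(k) WHERE id > 1;
    INSERT INTO t (k) VALUES ('a');
    INSERT INTO t (k) VALUES ('a');
    INSERT INTO t (k) VALUES ('a') ON CONFLICT DO NOTHING;
    INSERT INTO t (k) VALUES ('b');
""")
rows = con.execute("SELECT * FROM t ORDER BY id").fetchall()
print(rows)  # matches the TCL expectation {1|a 2|a 4|b}
```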
`test-mvcc-compat` is not run under CI for now, since we still need to
fix every test anyway, so there is no point in making every PR fail yet.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4139
This fixes `autoinc-fail-on-max-rowid` for now, but we still do not
support random rowids, since we need to add locking to the regular
`OP NewRowId`.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4138
Closes #4066
Closes #4129
## Problem
Take e.g.
```sql
CREATE TABLE t(x);
CREATE INDEX txdesc ON t(x desc);
INSERT INTO t values (1),(2),(3);
SELECT * FROM t WHERE x > NULL;
```
Our plan, like SQLite's, was to start iterating the descending index
from the beginning (`Rewind`) and stop once we hit a row where x is <=
NULL, using the `IdxGe` instruction (GE on a descending index means LE).
However, `IdxGe` and other similar instructions use a sort comparison in
which NULL is less than numbers, strings, etc., so the stop condition
would incorrectly never fire and rows would be returned.
## Fix
Fix: we need to emit an explicit NULL check after rewinding.
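The correct behaviour is easy to confirm against SQLite directly: any comparison with NULL is not true, so the query must return no rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t(x);
    CREATE INDEX txdesc ON t(x DESC);
    INSERT INTO t VALUES (1),(2),(3);
""")
rows = con.execute("SELECT * FROM t WHERE x > NULL").fetchall()
print(rows)  # [] -- `x > NULL` is never true, so no rows come back
```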
## Tests
Added TCL tests + improved `index_scan_compound_key_fuzz` to have NULL
seek keys sometimes.
## AI disclosure
I started debugging this with Claude Code thinking this is a much deeper
corruption issue, but Opus 4.5 noticed immediately that we are returning
rows from a `x > NULL` comparison which should never happen. Hence, the
fix was then fairly simple.
Closes #4132
This PR prevents concurrent execution of transaction control statements
(COMMIT/ROLLBACK) and write operations. Without this prevention, such
concurrency leads to an assertion failure in the guard machinery. This
is a logical follow-up to PR
[#4110](https://github.com/tursodatabase/turso/pull/4110).
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4134
Indexes have been on by default for a while now, so let's remove the
code noise created by the feature flag in various places. The entire PR
was generated by Claude Code + Opus 4.5.
Closes #4131
When the sorter's internal read buffer was full and a prefetch read was
issued, the read would request 0 bytes. This naturally returned 0 bytes
read, which was incorrectly interpreted as EOF, causing the sorter to
think the chunk was exhausted when most of its data remained unread.
This resulted in "Corrupt database: Incomplete record" errors.
Fix:
- Avoid issuing pointless 0-byte reads
- Distinguish between "no room in the buffer" and "nothing left to
read", so that only the latter causes EOF
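A sketch of the corrected prefetch logic (hypothetical names; the real sorter uses async IO, which is elided here):

```python
import io

def refill(buf: bytearray, capacity: int, src) -> str:
    """Hypothetical sketch of the fix: never issue a 0-byte read, and
    treat only a genuinely empty read as EOF."""
    room = capacity - len(buf)
    if room == 0:
        return "buffer-full"   # no room: NOT the end of the chunk
    chunk = src.read(room)     # always requests > 0 bytes
    if not chunk:
        return "eof"           # the chunk really is exhausted
    buf.extend(chunk)
    return "ok"

buf = bytearray(b"abc")
assert refill(buf, 3, io.BytesIO(b"rest")) == "buffer-full"  # old code said EOF here
```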
When reading from sorted chunk files during external merge sort, if a chunk needed
async IO to read more data, the chunk index was lost after yielding. This caused
rows to be permanently dropped from the result set.
The bug was in `next_from_chunk_heap()`: pending completions were
drained before confirming IO was complete, so when resuming after a
yield, we had no record of which chunks needed to be retried.
Fix: Track the chunk index alongside the pending completions, and call
`push_to_chunk_heap()` when the IO completes to ensure the rows get put
on the heap properly.
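The shape of the fix can be sketched like this (hypothetical types; the real code is async):

```python
import heapq

def drain_completions(pending, heap):
    """Hypothetical sketch: each pending IO completion is stored WITH
    its chunk index, so a finished read can push that chunk's next row
    back onto the merge heap instead of being silently dropped."""
    still_pending = []
    for chunk_idx, completion in pending:
        if completion["done"]:
            # push_to_chunk_heap equivalent: the row rejoins merge order
            heapq.heappush(heap, (completion["row"], chunk_idx))
        else:
            still_pending.append((chunk_idx, completion))
    return still_pending

heap = []
pending = [(0, {"done": True, "row": "b"}), (1, {"done": False, "row": None})]
pending = drain_completions(pending, heap)
```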
In my adventure of doing #3536, I hit a bug where an opcode was using a
stale MvStore reference when transitioning from MvStore to Wal mode. I
did not want to special-case just one opcode to ignore the `mv_store`
argument (in my particular case it was `Insn::Checkpoint`), so I changed
all the opcodes to get `mv_store` from `program.connection.mv_store()`,
which is always up to date.
But if we don't want to make this change, I can always just do the
special case and move on.
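The hazard is easy to show in miniature (hypothetical names; `mv_store()` here stands in for `program.connection.mv_store()`):

```python
# Hypothetical miniature of the bug: capturing the store once goes
# stale across an MvStore -> Wal transition; reading it through the
# connection each time always sees the current mode.
class Connection:
    def __init__(self):
        self._store = "MvStore"
    def mv_store(self):
        return self._store
    def switch_to_wal(self):
        self._store = "Wal"

conn = Connection()
captured = conn.mv_store()   # stale reference handed to an opcode
conn.switch_to_wal()         # mode transition mid-program
assert captured == "MvStore"       # the opcode would use the wrong store
assert conn.mv_store() == "Wal"    # the always-up-to-date lookup
```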
**AI Disclosure:**
Asked Claude to do most of refactoring, but all the changes were
manually approved by me.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4126
After this fix, I ran the fuzz test for more than an hour with no
issues.
Closes https://github.com/tursodatabase/turso/issues/4075
## AI disclosure
Claude wrote the implementation and tests from just a copy/paste of the
Github issue.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4119
## AI disclosure
The draft of the description of this issue was written by Claude Code,
and it also came up with the fix with just this instruction:
>this fails, fix it
>
>`cargo run -p limbo_sim -- --seed 12569318052915776741 --disable-bugbase --memory-io --minimum-tests 100 --maximum-tests 1000 loop -n 10 --short-circuit`
>
>careful, the output is long. To see the error, you only need the last
few dozen lines.
I nudged it once, because it incorrectly wrote that the only cause of a
`SQLITE_BUSY` would be a concurrent checkpoint, and then I did some
minor cleanup.
I then started Claude Code in the SQLite codebase, to check if
`SQLITE_BUSY` really is ignored during auto-checkpoints, and it pointed
me to [this
line](https://github.com/sqlite/sqlite/blob/master/src/main.c#L2470),
where the returned result code is indeed ignored.
## Description
The issue was that when the auto-checkpoint failed with a Busy error,
the entire commit would fail. However, at the point when the checkpoint
is called during commit, the WAL frames have already been written
successfully, so the commit should proceed without error even if the
checkpoint cannot run. This matches SQLite's behaviour, where auto-
checkpointing uses the PASSIVE checkpoint mode, which ignores
`SQLITE_BUSY` errors:
> On the other hand, passive mode might leave the checkpoint unfinished
if there are concurrent readers or writers.
([source](https://sqlite.org/c3ref/wal_checkpoint_v2.html))
The fix (in `core/storage/pager.rs`): when `self.checkpoint()` returns
`LimboError::Busy` during the `CommitState::Checkpoint` phase of
`commit_dirty_pages_inner`, we now:
1. Log a debug message that auto-checkpoint was skipped
2. Set the result to PagerCommitResult::WalWritten (indicating WAL was
written but checkpoint didn't happen)
3. Continue to the Done state to complete the commit successfully
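In outline, the new commit path behaves like this (hypothetical names mirroring the description above, not the actual pager code):

```python
class BusyError(Exception):
    """Stand-in for LimboError::Busy."""

def finish_commit(run_checkpoint) -> str:
    """Hypothetical sketch: the WAL frames are already written by the
    time the auto-checkpoint runs, so Busy must not fail the commit."""
    try:
        run_checkpoint()
        return "Checkpointed"
    except BusyError:
        # Skip the checkpoint; the commit itself still succeeds
        # (the PagerCommitResult::WalWritten case in the description).
        return "WalWritten"

def busy_checkpoint():
    raise BusyError()

assert finish_commit(busy_checkpoint) == "WalWritten"
```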
Closes https://github.com/tursodatabase/turso/issues/4059
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4118
For an empty database, SQLite has the database header in memory on
startup. We will also need the database header on startup to check
whether the DB is in WAL or MVCC mode, so this is necessary for #3536.
The major change here is that we no longer have a `DbState` variable;
instead we have an `init_page_1` that holds an in-memory page 1, which
is flushed to disk on a `WriteTx`. In my opinion, this change also
reduces the complexity of our initialization process.
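Conceptually (a hypothetical sketch, not the real pager types), the new flow looks like:

```python
# Hypothetical sketch of the described startup flow: page 1 exists in
# memory from the start, and the first write transaction flushes it.
class Db:
    def __init__(self):
        self.init_page_1 = bytearray(100)  # in-memory header page
        self.header_on_disk = False
    def write_tx(self):
        # the first WriteTx flushes page 1 to disk
        self.header_on_disk = True

db = Db()
assert not db.header_on_disk  # empty db: header readable, in memory only
db.write_tx()
assert db.header_on_disk
```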
Closes #3441
Reviewed-by: Mikaël Francoeur (@LeMikaelF)
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes #4092
SQLite rejects `CREATE INDEX idx ON t(rowid)` with "no such column:
rowid" because rowid is a pseudo-column, not an actual column. Limbo was
incorrectly allowing this.
The fix removes the special exception for `ROWID_STRS` (`rowid`,
`_rowid_`, `oid`) in `validate_index_expression()`. Now these
identifiers are only allowed if they match an actual column name in the
table (i.e., when shadowed).
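The reference behaviour from SQLite, including the shadowing case, can be checked with a quick sqlite3 sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a)")
try:
    con.execute("CREATE INDEX idx ON t(rowid)")
    rejected = False
except sqlite3.OperationalError as e:
    rejected = True
    print(e)  # no such column: rowid

# When a real column shadows the name, the index is allowed:
con.execute("CREATE TABLE t2(rowid INTEGER, v TEXT)")
con.execute("CREATE INDEX idx2 ON t2(rowid)")
```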
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Closes https://github.com/tursodatabase/turso/issues/4011
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4121
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Right now turso can panic with various asserts if two or more write
statements are executed concurrently over a single connection:
```
thread 'query_processing::test_write_path::api_misuse' panicked at core/storage/pager.rs:776:9:
subjournal offset should be 0
```
This PR adds an explicit guard for subjournal access, which internally
returns `Busy` for the operation, making the statement wait until
subjournal ownership is released and can be re-acquired.
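A hypothetical sketch of the guard's semantics (not the actual pager implementation):

```python
class SubjournalGuard:
    """Hypothetical sketch: one statement owns the subjournal at a
    time; a concurrent writer sees Busy and waits, instead of tripping
    the `subjournal offset should be 0` assertion."""
    def __init__(self):
        self._owner = None
    def acquire(self, stmt_id) -> str:
        if self._owner not in (None, stmt_id):
            return "Busy"          # caller waits and retries
        self._owner = stmt_id
        return "Acquired"
    def release(self, stmt_id):
        if self._owner == stmt_id:
            self._owner = None

guard = SubjournalGuard()
assert guard.acquire("stmt-1") == "Acquired"
assert guard.acquire("stmt-2") == "Busy"     # second concurrent writer
guard.release("stmt-1")
assert guard.acquire("stmt-2") == "Acquired"
```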
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #4110