limbo

mirror of https://github.com/tursodatabase/limbo.git synced 2025-12-23 08:21:09 +00:00

Author	SHA1	Message	Date
Jussi Saurio	74296e52bb	Merge 'Automatically Propagate Encryption options' from Pedro Muniz On database open, we store the encryption options and pass them on to the Connection, Pager and Wal. We also get a slight gain in ergonomics, as we don't have to set the pragmas for `cipher` and `hexkey` on each new `Connection`. I needed this logic because I will need to initialize a default header for empty DBs, and encryption options not being propagated automatically was hindering this. AI disclosure: Claude helped me debug and find issues in my implementation. cc @avinassh Reviewed-by: Avinash Sajjanshetty (@avinassh) Closes #4100	2025-12-05 15:31:17 +02:00
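A minimal sketch of the propagation pattern this merge describes. The types `EncryptionOpts`, `Database`, `Connection` and `Pager` here are hypothetical simplifications, not Turso's actual definitions. The options are captured once at open time and cloned into each connection's pager by `connect`, so no per-connection `cipher`/`hexkey` pragma is needed.

```rust
// Hypothetical, simplified types for illustration; not Turso's real API.
#[derive(Clone, Debug, PartialEq)]
pub struct EncryptionOpts {
    pub cipher: String,
    pub hexkey: String,
}

pub struct Pager {
    pub encryption: Option<EncryptionOpts>,
}

pub struct Connection {
    pub pager: Pager,
}

pub struct Database {
    // Captured once when the database is opened.
    encryption: Option<EncryptionOpts>,
}

impl Database {
    pub fn open(encryption: Option<EncryptionOpts>) -> Self {
        Database { encryption }
    }

    /// Each new connection (and its pager) inherits the options stored on the
    /// database, so callers never re-issue the encryption pragmas.
    pub fn connect(&self) -> Connection {
        Connection {
            pager: Pager {
                encryption: self.encryption.clone(),
            },
        }
    }
}
```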
pedrocarlo	ee73bab743	get correct reserved bytes if Cipher is not None	2025-12-05 02:04:06 -03:00
pedrocarlo	a311c966a2	set encryption context for page and wal in `init_pager`	2025-12-05 02:04:06 -03:00
pedrocarlo	889322f6b5	do not call pragmas related to encryption on connect or open	2025-12-05 02:04:06 -03:00
pedrocarlo	0118a65169	pass encryption opts from the database to the connection on `connect`	2025-12-05 02:04:06 -03:00
pedrocarlo	85b212056d	separate init function for connect	2025-12-05 02:04:06 -03:00
pedrocarlo	1a43de35ce	add encryption key and cipher to Database struct	2025-12-05 02:04:06 -03:00
pedrocarlo	faca85de2f	pass pager to _connect and share initial conn for bootstrapping mvcc	2025-12-05 02:04:05 -03:00
Nikita Sivukhin	510a61b5eb	Merge branch 'main' into sync-sdk-kit	2025-12-03 21:16:15 +04:00
pedrocarlo	e26c663616	do not pass mv store if we are in a bootstrap connection	2025-12-03 10:10:02 -03:00
pedrocarlo	02c2a63d8e	use ArcSwap to store MvStore	2025-12-03 10:09:04 -03:00
pedrocarlo	11a40e7e64	do not store MvStore in Statements. Always get them from database	2025-12-03 10:09:04 -03:00
Jussi Saurio	b7d4aa06a5	Merge 'mvcc: implement logical log recovery for indexes + checkpointing of indexes' from Jussi Saurio ## Beef - Change logical log format to also allow index row records - this is for simplicity during recovery and may change later. - Checkpoint indexes and index writes to the DB file - Fix issues related to deletes - it's possible MV store has no row versions for a row that exists in the DB file, so we need to add a tombstone row version in that case, and we must fetch that row's data from the btree to be able to include the data in the row version - fix some miscellaneous logic bugs Closes #4067	2025-12-03 10:08:24 +02:00
Jussi Saurio	265bd74c99	reparse_schema: conditionally read from mv store or not	2025-12-02 11:38:05 +02:00
Nikita Sivukhin	65ec20a562	small renames	2025-12-01 22:53:39 +04:00
Nikita Sivukhin	73a94910d8	Merge branch 'main' into sdk-kit	2025-11-28 02:56:01 +04:00
Nikita Sivukhin	ef3db24a49	rename methods in core a little bit	2025-11-27 14:12:47 +04:00
Jussi Saurio	c433a782b7	mvcc: allow use of indexes yeah they are broken still, but i don't want to add temporary overrides	2025-11-26 09:05:23 +02:00
Jussi Saurio	610d8cc3ba	Merge 'introduce program execution state in order to run stmt to completion in case of finalize or reset' from Nikita Sivukhin This PR introduces program execution state so that a statement is aware of its state, i.e. whether it is terminal (Done, Failed, Interrupted) or not. The particular problem right now is that statements like `INSERT INTO t VALUES (1), (2), (3) RETURNING x` execute inserts one by one and interleave them with row generation. This means that if the statement's consumer reads just one row and then finalizes the statement, nothing is actually committed (because the transaction is aborted). To quickly mitigate this issue, a program state is introduced that helps decide what to do in finalize. Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #4038	2025-11-25 14:47:56 +02:00
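The finalize behaviour described in this PR can be modelled roughly as follows. This is a toy sketch with hypothetical names (`ProgramState`, `InsertReturning`). It assumes only what the PR states: a statement knows whether it is in a terminal state, and finalizing a non-terminal statement drives it to completion. As a result, rows already produced by `INSERT ... RETURNING` are committed rather than rolled back.

```rust
/// Hypothetical execution state; only `Running` is non-terminal.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProgramState {
    Running,
    Done,
    Failed,
    Interrupted,
}

impl ProgramState {
    pub fn is_terminal(self) -> bool {
        !matches!(self, ProgramState::Running)
    }
}

/// Toy model of `INSERT ... VALUES (1), (2), (3) RETURNING x`: each step
/// inserts one value and yields it as a row; the commit happens at the end.
pub struct InsertReturning {
    pending: Vec<i64>,
    pub inserted: Vec<i64>,
    pub committed: bool,
    pub state: ProgramState,
}

impl InsertReturning {
    pub fn new(values: Vec<i64>) -> Self {
        InsertReturning { pending: values, inserted: Vec::new(), committed: false, state: ProgramState::Running }
    }

    /// Executes one step; returns the RETURNING row, or None once finished.
    pub fn step(&mut self) -> Option<i64> {
        if self.state.is_terminal() {
            return None;
        }
        if self.pending.is_empty() {
            self.committed = true;
            self.state = ProgramState::Done;
            return None;
        }
        let value = self.pending.remove(0);
        self.inserted.push(value);
        Some(value)
    }

    /// A non-terminal program is run to completion instead of being aborted.
    pub fn finalize(&mut self) {
        while !self.state.is_terminal() {
            self.step();
        }
    }
}
```

In this model, reading one row and then calling `finalize` still inserts all three values and commits.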
Nikita Sivukhin	e39e60ef18	introduce program execution state in order to run stmt to completion in case of finalize or reset	2025-11-25 11:14:20 +04:00
Pekka Enberg	94cd61fb69	Merge 'bindings/java: add batching support to JDBC4PreparedStatement' from # Changes Support batching multiple DML queries in a single PreparedStatement. ### Java - the setters of JDBC4PreparedStatement no longer bind to the underlying native statement directly, but only store the parameter values locally - On execution the correct set of parameters is bound to the native statement ### Rust - Added a helper method to retrieve the parameter count of a statement # Reference #615 Reviewed-by: Kim Seon Woo (@seonWKim) Closes #3971	2025-11-23 09:45:08 +02:00
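A sketch of the batching idea, written in Rust rather than Java to keep one language for examples. `NativeStatement` and `BatchingPreparedStatement` are hypothetical stand-ins. The setters only record values locally. `execute_batch` binds each recorded parameter set to the native statement, runs it, and then resets the statement for reuse, following the reset-instead-of-recreate approach of commit 7e89772326. The parameter count comes from the native statement, mirroring the Rust helper the PR mentions.

```rust
/// Hypothetical stand-in for the native statement handle.
pub struct NativeStatement {
    bound: Vec<Option<i64>>,
    /// Parameter sets the statement was executed with (for inspection).
    pub executed: Vec<Vec<Option<i64>>>,
    pub resets: usize,
}

impl NativeStatement {
    pub fn new(parameter_count: usize) -> Self {
        NativeStatement { bound: vec![None; parameter_count], executed: Vec::new(), resets: 0 }
    }
    pub fn parameter_count(&self) -> usize {
        self.bound.len()
    }
    /// Binds a 1-based parameter; `None` stands for SQL NULL.
    pub fn bind(&mut self, index: usize, value: Option<i64>) {
        self.bound[index - 1] = value;
    }
    pub fn run_to_completion(&mut self) {
        self.executed.push(self.bound.clone());
    }
    /// Rewinds the statement for reuse instead of re-preparing it.
    pub fn reset(&mut self) {
        self.resets += 1;
    }
}

/// Setters record values locally; nothing reaches the native statement
/// until the batch is executed.
pub struct BatchingPreparedStatement {
    pub native: NativeStatement,
    current: Vec<Option<i64>>,
    batch: Vec<Vec<Option<i64>>>,
}

impl BatchingPreparedStatement {
    pub fn new(native: NativeStatement) -> Self {
        let n = native.parameter_count();
        BatchingPreparedStatement { native, current: vec![None; n], batch: Vec::new() }
    }
    pub fn set_long(&mut self, index: usize, value: i64) {
        self.current[index - 1] = Some(value);
    }
    pub fn add_batch(&mut self) {
        let n = self.current.len();
        let params = std::mem::replace(&mut self.current, vec![None; n]);
        self.batch.push(params);
    }
    /// Binds every parameter of each stored set, runs it, then resets.
    pub fn execute_batch(&mut self) -> usize {
        let batch = std::mem::take(&mut self.batch);
        for params in &batch {
            for (i, value) in params.iter().enumerate() {
                self.native.bind(i + 1, *value);
            }
            self.native.run_to_completion();
            self.native.reset();
        }
        batch.len()
    }
}
```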
Pekka Enberg	d808db6af9	core: Switch to parking_lot::Mutex It's faster and we eliminate a bunch of unwrap() calls.	2025-11-20 10:42:02 +02:00
Duckulus	7e89772326	reset statement instead of recreating it when executing preparedstatement batch	2025-11-19 23:47:15 +01:00
Jussi Saurio	e60e37da7d	triggers: add execution plumbing to translation and vdbe layers	2025-11-18 15:19:01 +02:00
Nikita Sivukhin	be12ca01aa	add is_hole / punch_hole optional methods to IO trait and remove is_hole method from Database trait	2025-11-12 12:04:42 +04:00
PThorpe92	94b6d254a9	Fix comment on vtab_txn_states	2025-11-09 11:08:52 -05:00
PThorpe92	f35ccfba17	Add support for renaming virtual tables	2025-11-09 11:07:42 -05:00
Nikita Sivukhin	da61fa32b4	use dyn DatabaseStorage instead of DatabaseFile	2025-11-06 17:42:03 +04:00
PThorpe92	481d86f567	Optimize and refactor schema::Column type	2025-11-02 20:46:02 -05:00
RS2007	60cbc6d8ea	migrating from_uri to database opts	2025-11-02 16:28:22 +05:30
Pekka Enberg	913b7ac600	core: Disable autovacuum by default People have discovered various bugs in autovacuum so let's disable it by default for now.	2025-11-02 12:09:21 +02:00
Nikita Sivukhin	4c98861590	adjust logs	2025-10-29 16:24:05 +04:00
Pekka Enberg	dae2930743	Merge 'core: Switch to FxHash to improve performance' from Pekka Enberg The default Rust hash map is slow for integer keys. Switch to FxHash instead to reduce executed instructions for, for example, throughput benchmark. Before: ``` penberg@turing:~/src/tursodatabase/turso/perf/throughput/turso$ perf stat ../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000 Turso,1,100,0,106875.21 Performance counter stats for '../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000': 2,908.02 msec task-clock # 0.310 CPUs utilized 30,508 context-switches # 10.491 K/sec 261 cpu-migrations # 89.752 /sec 813 page-faults # 279.572 /sec 20,655,313,128 instructions # 1.73 insn per cycle # 0.14 stalled cycles per insn 11,930,088,949 cycles # 4.102 GHz 2,845,040,381 stalled-cycles-frontend # 23.85% frontend cycles idle 3,814,652,892 branches # 1.312 G/sec 54,760,600 branch-misses # 1.44% of all branches 9.372979876 seconds time elapsed 2.276835000 seconds user 0.530135000 seconds sys ``` After: ``` penberg@turing:~/src/tursodatabase/turso/perf/throughput/turso$ perf stat ../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000 Turso,1,100,0,108663.84 Performance counter stats for '../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000': 2,838.65 msec task-clock # 0.308 CPUs utilized 30,629 context-switches # 10.790 K/sec 351 cpu-migrations # 123.650 /sec 818 page-faults # 288.165 /sec 19,887,102,451 instructions # 1.72 insn per cycle # 0.14 stalled cycles per insn 11,593,166,024 cycles # 4.084 GHz 2,830,298,617 stalled-cycles-frontend # 24.41% frontend cycles idle 3,764,334,333 branches # 1.326 G/sec 53,157,766 branch-misses # 1.41% of all branches 9.218225731 seconds time elapsed 2.231889000 seconds user 0.508785000 seconds sys ``` Closes #3837	2025-10-28 14:49:09 +02:00
Pekka Enberg	810ed8ad60	Merge 'Don't allow autovacuum to be flipped on non-empty databases' from Pavan Nambi Turso incorrectly creates the first table of an autovacuumed database on page 2. (Note: this is in collaboration with @LeMikaelF) SQLite does not allow enabling or disabling auto-vacuum after the first table has been created (https://sqlite.org/pragma.html#pragma_auto_vacuum). This is because the page layout of the database is different when auto-vacuum is enabled: the first b-tree page must be page 3 instead of 2, to make room for the first [Pointer Map page](https://sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages). Turso doesn't currently account for this, which can lead to data loss. The simplest way to reproduce this is to create an autovacuumed database with `pragma auto_vacuum=full`, so that autovacuum runs on each commit, and then create a table with some data. Turso will incorrectly create the new table on page 2. After this, every time a new page is created, either through a page split or because a new table is created, Turso will write a 5-byte pointer into page 2, starting from the top of the page, thereby overwriting existing data. For example, let's start with a clean database and look at the first bytes of page 2. It starts with `0d`, the discriminator for a leaf table page ([source](https://www.sqlite.org/fileformat.html#b_tree_pages)). The next interesting number is the number of cells contained in this page (`01`), stored as a 2-byte big-endian value at offsets 3-4. ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); $ dbtotxt /tmp/a.db | size 8192 pagesize 4096 filename a.db | page 1 offset 0 # ...snip... | page 2 offset 4096 | 0: 0d 00 00 00 01 0f f5 00 0f f5 00 00 00 00 00 00 ................ | 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue | end a.db ``` Pointer map pages are located every N pages, starting from page 2, and contain a list of 5-byte entries that record the parent page of a given page. So whenever Turso or SQLite needs to add a page, it will overwrite 5 bytes of page 2. This means that for data loss to occur, it is sufficient to add a single page to the database, for example by creating a table. The cell count byte at offset 4 will then be zeroed out: ``` $ cargo run -- /tmp/a.db turso> create table t(a); turso> insert into t values ('myvalue'); turso> pragma auto_vacuum=full; turso> create table tt(a); $ dbtotxt /tmp/a.db | size 12288 pagesize 4096 filename a.db | page 1 offset 0 # ...snip... | page 2 offset 4096 | 0: 01 00 00 00 00 0f f5 00 0f f5 00 00 00 00 00 00 ................ | 4080: 00 00 00 00 00 09 01 02 1b 6d 79 76 61 6c 75 65 .........myvalue ``` Creating more tables, or adding more B-tree pages, will keep overwriting the rest of the page, until the cells themselves are also overwritten. ## Reproducing the issue in the simulator We have been unable to reproduce this exact corruption mode in the simulator, but patching the simulator reveals many failure modes, none of which occur with the unpatched simulator. The following seeds fail when the patched simulator is run against `main`: - `11522841279124073062`, with "Assertion 'table inquisitive_graham_159 should contain all of its expected values' failed: table inquisitive_graham_159 does not contain the expected values, the simulator model has more rows than the database" - `7057400018220918989`, `16028085350691325843`, `7721542713659053944`, and `203017821863546118`, with "Failed to read ptrmap key=XXX" - `12533694709304969540`, `18357088553315413457`, `3108945730906932377`, with "Integrity Check Failed: Cell N in page 2 is out of range." - `4757352625344646473`, with "dirty pages should be empty for read txn" - `7083498604824302257`, with "header_size: 6272, header_len_bytes: 2, payload.len(): 13" - `17881876827470741581`, with "ParseError("no such table: focused_historians_416")" - `2092231500503735693`, with "range end index 4789 out of range for slice of length 4096" - `7555257419378470845`, with "malformed database schema (imaginative_ontivero\u{1})" - `12905270229511147245`, with "index out of bounds: the len is 4096 but the index is 4096" ## Fixing the issue - When the DB is opened, we read the `auto_vacuum` state instead of assuming `auto_vacuum=none`. - Don't allow auto_vacuum to be flipped on non-empty databases, since allowing it could cause ptrmap pages to overwrite existing data. - Modify the integrity check to avoid reporting that page 2 is orphaned in auto-vacuumed databases. Fixes #3752 Closes #3830	2025-10-28 14:48:35 +02:00
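The pointer-map addressing behind this bug can be sketched from the SQLite file-format rules. Each ptrmap page holds `usable_size / 5` five-byte entries, each a 1-byte type followed by a 4-byte big-endian parent page number. The first ptrmap page is page 2, and the next one follows after the pages it covers. This is a simplified sketch, not Turso's code, and it ignores SQLite's special handling of the lock-byte page. With 4096-byte pages, the entry for page 3 lands at offset 0 of page 2. A root-page entry with parent 0 encodes as `01 00 00 00 00`, which is consistent with the first five bytes of page 2 in the second dump above.

```rust
/// Ptrmap entry type for a b-tree root page (per the SQLite file format).
pub const PTRMAP_ROOTPAGE: u8 = 1;

/// Pages covered by one ptrmap page, plus the ptrmap page itself.
fn pages_per_ptrmap_group(usable_size: u32) -> u32 {
    usable_size / 5 + 1
}

/// Returns the ptrmap page holding the entry for `pgno` (None for page 1).
/// Simplification: the lock-byte page adjustment is not modelled.
pub fn ptrmap_page_for(pgno: u32, usable_size: u32) -> Option<u32> {
    if pgno < 2 {
        return None;
    }
    let group = pages_per_ptrmap_group(usable_size);
    Some((pgno - 2) / group * group + 2)
}

pub fn is_ptrmap_page(pgno: u32, usable_size: u32) -> bool {
    ptrmap_page_for(pgno, usable_size) == Some(pgno)
}

/// Byte offset of `pgno`'s 5-byte entry within its ptrmap page.
pub fn ptrmap_entry_offset(pgno: u32, usable_size: u32) -> Option<usize> {
    let map_page = ptrmap_page_for(pgno, usable_size)?;
    if map_page == pgno {
        return None; // ptrmap pages have no entry of their own
    }
    Some(5 * (pgno - map_page - 1) as usize)
}

/// Encodes an entry: 1-byte type, then the 4-byte big-endian parent page.
pub fn encode_ptrmap_entry(kind: u8, parent: u32) -> [u8; 5] {
    let p = parent.to_be_bytes();
    [kind, p[0], p[1], p[2], p[3]]
}
```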
Jussi Saurio	d993ac8157	Merge 'index_method: implement basic trait and simple toy index' from Nikita Sivukhin This PR adds the `index_method` trait and an implementation of a toy sparse vector index. To keep the PR lightweight, index methods are not yet deeply integrated into the query planner; only the components needed for integration tests that use the `index_method` API directly are added. Primary changes introduced in this PR are: 1. `SymbolTable` extended with an `index_methods` field, and builtin extensions populated with 2 native indices: `backing_btree` and `toy_vector_sparse_ivf` 2. `Index` struct extended with an `index_method` field which holds the `IndexMethodAttachment` constructed for the table with the given parameters from the `IndexMethod` "factory" trait. The toy index implementation stores inverted index pairs `(dimension, rowid)` in an auxiliary BTree index. This index uses the special `backing_btree` index_method, which is marked as `backing_btree: true` and treated specially by the db core: it is a real BTree index that is not managed by the tursodb core and must be managed by the index_method that created it (which is responsible for populating, creating and destroying this btree). Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com> Closes #3846	2025-10-28 07:01:36 +02:00
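To illustrate the inverted-index layout, here is a toy in-memory sketch. A `BTreeSet<(u32, i64)>` stands in for the auxiliary `backing_btree` holding `(dimension, rowid)` pairs. Candidate rows are found by range-scanning the query's non-zero dimensions and are then ranked by dot product. The names are hypothetical, and the ranking step is an assumption about how such an index might be queried, not a description of `toy_vector_sparse_ivf` itself.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy index; a sparse vector is a list of (dimension, weight).
pub struct ToySparseIvf {
    /// Ordered (dimension, rowid) pairs, mimicking the backing B-tree.
    postings: BTreeSet<(u32, i64)>,
    vectors: HashMap<i64, Vec<(u32, f32)>>,
}

impl ToySparseIvf {
    pub fn new() -> Self {
        ToySparseIvf { postings: BTreeSet::new(), vectors: HashMap::new() }
    }

    pub fn insert(&mut self, rowid: i64, vector: Vec<(u32, f32)>) {
        for &(dim, _) in &vector {
            self.postings.insert((dim, rowid));
        }
        self.vectors.insert(rowid, vector);
    }

    /// Rows sharing at least one non-zero dimension with the query,
    /// found by a range scan over each dimension's key prefix.
    pub fn candidates(&self, query: &[(u32, f32)]) -> BTreeSet<i64> {
        let mut out = BTreeSet::new();
        for &(dim, _) in query {
            for &(_, rowid) in self.postings.range((dim, i64::MIN)..=(dim, i64::MAX)) {
                out.insert(rowid);
            }
        }
        out
    }

    /// Ranks candidates by dot product with the query, highest first.
    pub fn top_k(&self, query: &[(u32, f32)], k: usize) -> Vec<(i64, f32)> {
        let q: HashMap<u32, f32> = query.iter().copied().collect();
        let mut scored: Vec<(i64, f32)> = self
            .candidates(query)
            .into_iter()
            .map(|rowid| {
                let score: f32 = self.vectors[&rowid]
                    .iter()
                    .map(|(dim, w)| w * q.get(dim).copied().unwrap_or(0.0))
                    .sum();
                (rowid, score)
            })
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```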
Jussi Saurio	9c87b20cb2	Merge 'Where clause subquery support' from Jussi Saurio Closes #1282 # Support for WHERE clause subqueries This PR implements support for subqueries that appear in the WHERE clause of SELECT statements. ## What are those lol 1. EXISTS subqueries: `WHERE EXISTS (SELECT ...)` 2. Row value subqueries: `WHERE x = (SELECT ...)` or `WHERE (x, y) = (SELECT ...)`. The latter are not yet supported - only the single-column ("scalar subquery") case is. 3. IN subqueries: `WHERE x IN (SELECT ...)` or `WHERE (x, y) IN (SELECT ...)` ## Correlated vs Uncorrelated Subqueries - Uncorrelated subqueries reference only their own tables and can be evaluated once. - Correlated subqueries reference columns from the outer query (e.g., `WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)`) and must be re-evaluated for each row of the outer query ## Implementation ### Planning During query planning, the WHERE clause is walked to find subquery expressions (`Expr::Exists`, `Expr::Subquery`, `Expr::InSelect`). Each subquery is: 1. Assigned a unique internal ID 2. Compiled into its own `SelectPlan` with outer query tables provided as available references 3. Replaced in the AST with an `Expr::SubqueryResult` node that references the subquery with its internal ID 4. Stored in a `Vec<NonFromClauseSubquery>` on the `SelectPlan` For IN subqueries, an ephemeral index is created to store the subquery results; for other kinds, the results are stored in register(s). ### Translation Before emitting bytecode, we need to determine when each subquery should be evaluated: - Uncorrelated: Evaluated once before opening any table cursors - Correlated: Evaluated at the appropriate nested loop depth after all referenced outer tables are in scope This is calculated by examining which outer query tables the subquery references and finding the right-most (innermost) loop that opens those tables - using similar mechanisms that we use for figuring out when to evaluate other `WhereTerm`s too. 
### Code Generation - EXISTS: Sets a register to 1 if any row is produced, 0 otherwise. Has new `QueryDestination::ExistsSubqueryResult` variant. - IN: Results stored in an ephemeral index and the index is probed. - RowValue: Results stored in a range of registers. Has new `QueryDestination::RowValueSubqueryResult` variant. ## Annoying details ### Which cursor to read from in a subquery? Sometimes a query will use a covering index, i.e. skip opening the table cursor at all if the index contains All The Needed Stuff. Correlated subqueries reading columns from outer tables is a bit problematic in this regard: with our current translation code, the subquery doesn't know whether the outer query opened a table cursor, index cursor, or both. So, for now, we try to find a table cursor first, then fall back to finding any index cursor for that table. Reviewed-by: Preston Thorpe <preston@turso.tech> Closes #3847	2025-10-28 06:36:55 +02:00
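The placement rule for WHERE-clause subqueries can be sketched as a small function. It takes the join order of the outer query's loops and the set of outer tables a subquery references. An uncorrelated subquery is evaluated before any loop opens; a correlated one is evaluated at the innermost (right-most) loop that opens a referenced table. `EvalPoint` and `subquery_eval_point` are hypothetical names, and the real planner works on internal table IDs rather than table names.

```rust
use std::collections::HashSet;

/// Hypothetical: where a WHERE-clause subquery's bytecode is emitted.
#[derive(Debug, PartialEq, Eq)]
pub enum EvalPoint {
    /// Uncorrelated: evaluate once, before any table cursor is opened.
    BeforeLoops,
    /// Correlated: evaluate inside the loop at this depth (0 = outermost).
    InsideLoop(usize),
}

/// `join_order` lists outer tables from outermost to innermost loop;
/// `outer_refs` holds the outer tables the subquery references.
pub fn subquery_eval_point(join_order: &[&str], outer_refs: &HashSet<&str>) -> EvalPoint {
    join_order
        .iter()
        .rposition(|t| outer_refs.contains(t))
        .map_or(EvalPoint::BeforeLoops, EvalPoint::InsideLoop)
}
```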
Nikita Sivukhin	bdbfac20fb	resolve index method parameters	2025-10-27 16:39:22 +04:00
Nikita Sivukhin	97dcc0869e	register index_methods as db builtin extensions	2025-10-27 16:31:31 +04:00
Jussi Saurio	de81af29e5	find_table_by_internal_id() returns whether table is an outer query reference Unfortunately, our current translation machinery is unable to know for sure whether a subquery reference to an outer table 't1' has opened a table cursor, an index cursor, or both. For this reason, return a flag from `TableReferences::find_table_by_internal_id()` that tells the caller whether the table is an outer query reference, and further commits will have some additional logic to decide which cursor a subquery will read from when referencing a table from the outer query.	2025-10-27 13:47:49 +02:00
Nikita Sivukhin	8a80e8b743	rename custom modules to index_method like in postgresql	2025-10-27 13:18:18 +04:00
Nikita Sivukhin	299533b7b6	hide custom modules syntax behind --experimental-custom-modules flag	2025-10-27 12:29:05 +04:00
Nikita Sivukhin	f178daa373	update comment	2025-10-27 11:47:25 +04:00
Nikita Sivukhin	906bbdd1c4	support deep nestedness	2025-10-27 11:37:42 +04:00
Pekka Enberg	dfab8c44bc	core: Switch to FxHash to improve performance The default Rust hash map is slow for integer keys. Switch to FxHash instead to reduce executed instructions for, for example, throughput benchmark. Note that dirty page tracking is changed to BTreeMap to ensure that the hash function changes don't impact the WAL frame order, which SQLite guarantees to be page number ordered. Before: ``` penberg@turing:~/src/tursodatabase/turso/perf/throughput/turso$ perf stat ../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000 Turso,1,100,0,106875.21 Performance counter stats for '../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000': 2,908.02 msec task-clock # 0.310 CPUs utilized 30,508 context-switches # 10.491 K/sec 261 cpu-migrations # 89.752 /sec 813 page-faults # 279.572 /sec 20,655,313,128 instructions # 1.73 insn per cycle # 0.14 stalled cycles per insn 11,930,088,949 cycles # 4.102 GHz 2,845,040,381 stalled-cycles-frontend # 23.85% frontend cycles idle 3,814,652,892 branches # 1.312 G/sec 54,760,600 branch-misses # 1.44% of all branches 9.372979876 seconds time elapsed 2.276835000 seconds user 0.530135000 seconds sys ``` After: ``` penberg@turing:~/src/tursodatabase/turso/perf/throughput/turso$ perf stat ../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000 Turso,1,100,0,108663.84 Performance counter stats for '../../../target/release/write-throughput --threads 1 --batch-size 100 --compute 0 -i 10000': 2,838.65 msec task-clock # 0.308 CPUs utilized 30,629 context-switches # 10.790 K/sec 351 cpu-migrations # 123.650 /sec 818 page-faults # 288.165 /sec 19,887,102,451 instructions # 1.72 insn per cycle # 0.14 stalled cycles per insn 11,593,166,024 cycles # 4.084 GHz 2,830,298,617 stalled-cycles-frontend # 24.41% frontend cycles idle 3,764,334,333 branches # 1.326 G/sec 53,157,766 branch-misses # 1.41% of all branches 9.218225731 seconds time 
elapsed 2.231889000 seconds user 0.508785000 seconds sys ```	2025-10-26 16:48:59 +02:00
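A sketch of the two data-structure choices in this commit. The first is a simplified FxHash-style hasher plugged into a `HashMap` via `BuildHasherDefault`. It is modelled on the classic rotate-xor-multiply step of earlier `rustc-hash` releases, but folds bytes one at a time for brevity. The second is a `BTreeMap` for dirty pages, so that iteration, and therefore WAL frame order, follows page number regardless of the hash function. `FxLikeHasher` and `wal_frame_order` are hypothetical names; real code would use the `rustc-hash` crate.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::{BuildHasherDefault, Hasher};

const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Simplified, non-DoS-resistant hasher that is cheap for integer keys.
#[derive(Default)]
pub struct FxLikeHasher {
    hash: u64,
}

impl FxLikeHasher {
    fn add(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxLikeHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.add(u64::from(b));
        }
    }
    fn write_u64(&mut self, i: u64) {
        self.add(i); // integer keys take a single mixing step
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}

pub type FxLikeHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxLikeHasher>>;

/// Dirty pages keyed by page number; BTreeMap iterates in key order, so
/// frames come out sorted by page number independently of any hasher.
pub fn wal_frame_order(dirty: &BTreeMap<u64, Vec<u8>>) -> Vec<u64> {
    dirty.keys().copied().collect()
}
```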
Pavan-Nambi	8d0ae362da	Merge branch 'main' of github.com:tursodatabase/turso into avcm	2025-10-24 18:58:30 +05:30
Pekka Enberg	c3fb867173	core: Switch RwLock<Arc<Pager>> to ArcSwap<Pager> We don't actually need the RwLock locking capabilities, just the ability to swap the instance.	2025-10-24 14:10:08 +03:00
PThorpe92	a8b257c664	Replace several RwLock<Enum> values with new AtomicEnums	2025-10-22 09:35:26 -04:00
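Replacing `RwLock<Enum>` with an atomic can be sketched by giving the enum a `#[repr(u8)]` discriminant and storing it in an `AtomicU8`, with `compare_exchange` for conditional transitions. `TxState` and `AtomicTxState` are hypothetical names; the commit's actual `AtomicEnum` types may differ in shape.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical state enum; `repr(u8)` lets it round-trip through an AtomicU8.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum TxState {
    Idle = 0,
    Read = 1,
    Write = 2,
}

impl TxState {
    fn from_u8(v: u8) -> Self {
        match v {
            0 => TxState::Idle,
            1 => TxState::Read,
            2 => TxState::Write,
            _ => unreachable!("invalid TxState discriminant {}", v),
        }
    }
}

/// Lock-free replacement for `RwLock<TxState>`.
pub struct AtomicTxState(AtomicU8);

impl AtomicTxState {
    pub fn new(s: TxState) -> Self {
        AtomicTxState(AtomicU8::new(s as u8))
    }
    pub fn get(&self) -> TxState {
        TxState::from_u8(self.0.load(Ordering::Acquire))
    }
    pub fn set(&self, s: TxState) {
        self.0.store(s as u8, Ordering::Release)
    }
    /// Moves to `to` only if the current state is `from`; on failure,
    /// returns the state that was actually observed.
    pub fn transition(&self, from: TxState, to: TxState) -> Result<(), TxState> {
        self.0
            .compare_exchange(from as u8, to as u8, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
            .map_err(TxState::from_u8)
    }
}
```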
Pavan-Nambi	1a058a1531	get autovacuum mode from db header on existing dbs; if autovacuum is on, look for ptrmap pages	2025-10-18 18:47:30 +05:30
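A sketch of reading the auto-vacuum mode from the 100-byte database header rather than assuming `none`, based on the SQLite file format. The 4-byte big-endian value at offset 52 is the largest root b-tree page number and is non-zero only in auto-vacuum or incremental-vacuum mode. The value at offset 64 is non-zero for incremental vacuum. `AutoVacuumMode` and `auto_vacuum_mode` are hypothetical names, not Turso's API.

```rust
/// Auto-vacuum mode as encoded in the SQLite database header.
#[derive(Debug, PartialEq, Eq)]
pub enum AutoVacuumMode {
    None,
    Full,
    Incremental,
}

/// Decodes the mode from the first 100 bytes of page 1.
pub fn auto_vacuum_mode(header: &[u8; 100]) -> AutoVacuumMode {
    // Reads a 4-byte big-endian integer at `off`.
    let be32 = |off: usize| {
        u32::from_be_bytes([header[off], header[off + 1], header[off + 2], header[off + 3]])
    };
    match (be32(52), be32(64)) {
        // Largest root page is zero: auto-vacuum is off.
        (0, _) => AutoVacuumMode::None,
        // Auto-vacuum on, incremental flag clear.
        (_, 0) => AutoVacuumMode::Full,
        _ => AutoVacuumMode::Incremental,
    }
}
```

If the mode is anything other than `None`, the pager must treat page 2 (and every later ptrmap page) as a pointer-map page rather than a b-tree page.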
Pekka Enberg	bf5de920f2	core: Unsafe Send and Sync pushdown This patch pushes unsafe Send and Sync to individual components instead of doing it at Database level. This makes it easier for us to incrementally fix thread-safety, but avoid developers adding more thread unsafe code.	2025-10-16 11:26:50 +03:00
pedrocarlo	23380a58d7	make next truly async and non blocking	2025-10-14 12:33:36 -03:00

751 commits