Also added a `cursor_loop` helper on `ProgramBuilder` to avoid making
this mistake in the future. The helper is zero-cost and should be
optimized down to the same code as the hand-written loop (hopefully).
Again found while fuzzing nested WHERE clause subqueries:
Aggregate registers need to be NULLed at the start, because the same
registers might be reused on another invocation of a subquery; if
they are not NULLed, the second invocation of the same subquery starts
with values left over from the first invocation.
Reviewed-by: Preston Thorpe (@PThorpe92)
Closes #1614
Currently we have this:
`program.alloc_cursor_id(Option<String>, CursorType)`
where the String is the table's name or alias ('users' or 'u' in
the query).
This is problematic because this can happen:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.
Instead, introduce `CursorKey`, which is a combination of:
1. `TableInternalId`, and
2. the index name (`Option<String>`), in the case of index cursors.
This should provide key uniqueness for cursors:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
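A minimal sketch of what this key might look like (the exact field names are illustrative, not necessarily the ones used in the codebase):

```rust
/// Illustrative sketch only; the real definitions live in the translator.
/// A stable identifier assigned to each table reference in a plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableInternalId(usize);

/// Uniquely identifies a cursor within a program: two references to the
/// same table (e.g. in a subquery) get different `TableInternalId`s and
/// therefore different cursors.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CursorKey {
    /// The table reference this cursor reads from.
    table_id: TableInternalId,
    /// Set only for index cursors; `None` for plain table cursors.
    index_name: Option<String>,
}
```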
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:
```rust
Expr::Column { table: 0, column: 3, .. }
```
refers to the 0'th table in the `table_references` list.
This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.
If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:
```sql
-- Assume tables: users(id, age), orders(user_id, amount)
-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
(SELECT user_id, SUM(amount) as total
FROM orders o
WHERE o.amount > 100
GROUP BY o.user_id) sub
WHERE u.id = sub.user_id
-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:
SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;
-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```
We could of course traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references continue
to be correct.
Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as in `Expr::Column` and `Expr::Rowid`, so that this kind of query
transformation can happen with less pain.
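Roughly, the idea looks like this (simplified types, not the exact definitions in the repo):

```rust
// Simplified sketch: column references carry a stable table id instead of a
// positional index into `table_references`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableInternalId(usize);

enum Expr {
    Column { table: TableInternalId, column: usize },
    Rowid { table: TableInternalId },
    // ...
}

struct TableReference {
    internal_id: TableInternalId,
    // name, columns, join info, ...
}

// Resolution goes through the id, so folding a subquery's table references
// into the parent's list does not invalidate existing `Expr::Column` nodes.
fn resolve(refs: &[TableReference], id: TableInternalId) -> Option<&TableReference> {
    refs.iter().find(|r| r.internal_id == id)
}
```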
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate and jumps over the
accumulation step if a duplicate is found (see the sketch after this list)
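Conceptually, the per-aggregate dedup works like the sketch below; the real implementation does this in bytecode with an ephemeral index and a conditional jump, so the `HashSet` here is only a stand-in to illustrate the control flow:

```rust
use std::collections::HashSet;

/// Illustration only: tracks distinct values for a single aggregate such as
/// `sum(distinct age)`. The actual implementation keeps an ephemeral index
/// per distinct aggregate and jumps over the accumulation instructions when
/// the value has already been seen.
struct DistinctSum {
    seen: HashSet<i64>, // stand-in for the ephemeral index
    sum: i64,
    count: u64,
}

impl DistinctSum {
    fn accumulate(&mut self, value: i64) {
        // Duplicate: skip the accumulation step, mirroring the jump that is
        // emitted around the aggregate step in the generated program.
        if !self.seen.insert(value) {
            return;
        }
        self.sum += value;
        self.count += 1;
    }

    fn avg(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            self.sum as f64 / self.count as f64
        }
    }
}
```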
Closes #1507
## Problem:
- We have cases where we are evaluating expressions in a hot loop that
could only be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions the translation of a given _expression_ emits, and breaks
very easily for that reason.
## Main ideas of this PR:
- Add `expr.is_constant()`, which determines whether an expression is a
compile-time constant. It tries to be conservative and does not deem
something compile-time constant unless it is certain (see the sketch
after this list).
- Whenever we think a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so-called
`constant span`: a range of instructions that are part of a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
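For illustration, a conservative constancy check could look like the sketch below (simplified AST, not the actual parser types used in the codebase):

```rust
// Simplified sketch of a conservative compile-time-constant check: anything
// that touches a column or a non-deterministic function is not constant.
enum Expr {
    Literal(String),
    Column { table: usize, column: usize },
    Unary(Box<Expr>),
    Binary(Box<Expr>, Box<Expr>),
    FunctionCall { name: String, args: Vec<Expr> },
}

fn is_deterministic(name: &str) -> bool {
    // Functions whose result can change between calls must never be hoisted.
    !matches!(name, "random" | "randomblob" | "changes" | "last_insert_rowid")
}

fn is_constant(expr: &Expr) -> bool {
    match expr {
        Expr::Literal(_) => true,
        // Column references depend on the current row: never constant.
        Expr::Column { .. } => false,
        Expr::Unary(e) => is_constant(e),
        Expr::Binary(l, r) => is_constant(l) && is_constant(r),
        Expr::FunctionCall { name, args } => {
            is_deterministic(name) && args.iter().all(is_constant)
        }
    }
}
```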
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 26 0 0 Start at 26
1 OpenRead 0 10 0 0 table=lineitem, root=10
2 OpenRead 1 9 0 0 table=orders, root=9
3 OpenRead 2 8 0 0 table=customer, root=8
4 Rewind 0 25 0 0 Rewind lineitem
5 Column 0 10 5 0 r[5]=lineitem.l_shipdate
6 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29'
7 Function 0 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed millions of times
8 Le 5 6 24 0 if r[5]<=r[6] goto 24
9 Column 0 0 9 0 r[9]=lineitem.l_orderkey
10 SeekRowid 1 9 24 0 if (r[9]!=orders.rowid) goto 24
11 Column 1 4 10 0 r[10]=orders.o_orderdate
12 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29'
13 Function 0 12 11 cast 0 r[11]=func(r[12..13])
14 Ge 10 11 24 0 if r[10]>=r[11] goto 24
15 Column 1 1 14 0 r[14]=orders.o_custkey
16 SeekRowid 2 14 24 0 if (r[14]!=customer.rowid) goto 24
17 Column 2 6 15 0 r[15]=customer.c_mktsegment
18 Ne 15 16 24 0 if r[15]!=r[16] goto 24
19 Column 0 0 1 0 r[1]=lineitem.l_orderkey
20 Integer 3 2 0 0 r[2]=3
21 Column 1 4 3 0 r[3]=orders.o_orderdate
22 Column 1 7 4 0 r[4]=orders.o_shippriority
23 ResultRow 1 4 0 0 output=r[1..4]
24 Next 0 5 0 0
25 Halt 0 0 0 0
26 Transaction 0 0 0 0 write=false
27 String8 0 8 0 DATETIME 0 r[8]='DATETIME'
28 String8 0 13 0 DATETIME 0 r[13]='DATETIME'
29 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE'
30 Goto 0 1 0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
addr opcode p1 p2 p3 p4 p5 comment
---- ----------------- ---- ---- ---- ------------- -- -------
0 Init 0 21 0 0 Start at 21
1 OpenRead 0 10 0 0 table=lineitem, root=10
2 OpenRead 1 9 0 0 table=orders, root=9
3 OpenRead 2 8 0 0 table=customer, root=8
4 Rewind 0 20 0 0 Rewind lineitem
5 Column 0 10 5 0 r[5]=lineitem.l_shipdate
6 Le 5 6 19 0 if r[5]<=r[6] goto 19
7 Column 0 0 9 0 r[9]=lineitem.l_orderkey
8 SeekRowid 1 9 19 0 if (r[9]!=orders.rowid) goto 19
9 Column 1 4 10 0 r[10]=orders.o_orderdate
10 Ge 10 11 19 0 if r[10]>=r[11] goto 19
11 Column 1 1 14 0 r[14]=orders.o_custkey
12 SeekRowid 2 14 19 0 if (r[14]!=customer.rowid) goto 19
13 Column 2 6 15 0 r[15]=customer.c_mktsegment
14 Ne 15 16 19 0 if r[15]!=r[16] goto 19
15 Column 0 0 1 0 r[1]=lineitem.l_orderkey
16 Column 1 4 3 0 r[3]=orders.o_orderdate
17 Column 1 7 4 0 r[4]=orders.o_shippriority
18 ResultRow 1 4 0 0 output=r[1..4]
19 Next 0 5 0 0
20 Halt 0 0 0 0
21 Transaction 0 0 0 0 write=false
22 String8 0 7 0 1995-03-29 0 r[7]='1995-03-29'
23 String8 0 8 0 DATETIME 0 r[8]='DATETIME'
24 Function 1 7 6 cast 0 r[6]=func(r[7..8]) <-- CAST() executed twice
25 String8 0 12 0 1995-03-29 0 r[12]='1995-03-29'
26 String8 0 13 0 DATETIME 0 r[13]='DATETIME'
27 Function 1 12 11 cast 0 r[11]=func(r[12..13])
28 String8 0 16 0 FURNITURE 0 r[16]='FURNITURE'
29 Integer 3 2 0 0 r[2]=3
30 Goto 0 1 0 0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
l_orderkey,
3 as revenue,
o_orderdate,
o_shippriority
from
lineitem,
orders,
customer
where
c_mktsegment = 'FURNITURE'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < cast('1995-03-29' as datetime)
and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
```
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1359
### The problem:
I often need to copy the output of an `Explain` statement to my
clipboard. Currently this is not possible because the explanation is
only ever written to stdout.
For all other Limbo output I am able to run `.output file` in the CLI,
enter my query, and then in another tmux pane simply `cat file | xclip -in
-selection clipboard`.
### The solution:
Expose a `statement.explain()` method that returns the query explanation
as a string. If the user uses something like `execute` instead of
`prepare`, it will default to `stdout` as expected, but this allows the
user to access the query plan on the prepared statement and do with it
what they please.
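A hypothetical usage sketch (the `Statement` stand-in type below is an assumption for illustration; only the idea that `explain()` returns the plan as a string comes from this PR):

```rust
// Hypothetical stand-in for the prepared-statement type; the only point is
// that `explain()` hands back a String instead of writing to stdout.
struct Statement {}

impl Statement {
    fn explain(&self) -> String {
        // In the real library this would render the bytecode program.
        String::new()
    }
}

// The caller, not the library, now decides where the plan goes: stdout,
// a file, or (via `.output file` + xclip) the clipboard.
fn write_plan_to_file(stmt: &Statement, path: &str) -> std::io::Result<()> {
    std::fs::write(path, stmt.explain())
}
```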
Closes #1166
This PR adds support for `DROP TABLE` and addresses issue
https://github.com/tursodatabase/limbo/issues/894
It depends on https://github.com/tursodatabase/limbo/pull/785 being
merged in because it requires the implementation of `free_page`.
EDIT: The PR above has been merged.
It adds the following:
* an implementation for the `DropTable` AST instruction via a method
called `translate_drop_table`
* two new instructions, `Destroy` and `DropTable`: the former modifies
physical b-tree pages and the latter modifies in-memory structures such
as the schema hash table (a sketch follows this list).
* `btree_destroy` on `BTreeCursor` to walk the table's tree of pages
and place them on the free list.
* state machine traversal for both `btree_destroy` and
`clear_overflow_pages` to ensure performant, correct code.
* unit & tcl tests
* modifies the `Null` instruction to follow SQLite semantics and accept
a second register, setting every register in that range to NULL. This
is required for `DROP TABLE`.
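As a rough sketch of the `Destroy`/`DropTable` split (operand names are assumptions, not the exact `Insn` definitions):

```rust
// Illustrative sketch of the two new instructions; operand names are
// assumptions. The key point is the division of labour: `Destroy` touches
// b-tree pages, `DropTable` only touches in-memory schema state.
enum Insn {
    /// Walks the table's b-tree and moves its pages onto the freelist.
    Destroy {
        root_page: usize,
    },
    /// Removes the table from in-memory structures (e.g. the schema hash
    /// table); no pages are touched.
    DropTable {
        db: usize,
        table_name: String,
    },
    // ... other instructions
}
```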
The screenshots below have a comparison of the bytecodes generated via
SQLite & Limbo.
Limbo has the same instruction set except for the subroutines, which
involve opening an ephemeral table, copying the triggers out of the
`sqlite_schema` table and then re-inserting them into the
`sqlite_schema` table.
This is because `OpenEphemeral` is still a WIP and is being tracked at
https://github.com/tursodatabase/limbo/pull/768


Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #897
Modified the `Null` instruction to more closely match SQLite semantics: it now allows passing in a second register, and all registers in the range r1..r2 are set to NULL.
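A minimal sketch of that range behavior (the register layout and the inclusive range are assumptions modelled on SQLite's `OP_Null`):

```rust
// Illustrative sketch of the widened Null semantics: with a second register
// given, every register from r[p1] through r[p2] is cleared.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
}

fn op_null(registers: &mut [Value], p1: usize, p2: usize) {
    // If p2 <= p1 only r[p1] is cleared; otherwise the whole range is.
    let end = p2.max(p1);
    for reg in &mut registers[p1..=end] {
        *reg = Value::Null;
    }
}
```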