Commit graph

99 commits

Author SHA1 Message Date
Pekka Enberg
90c1e3fc06 Switch Connection to use Arc instead of Rc
Connection needs to be Arc so that bindings can wrap it with `Mutex` for
multi-threading.
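`Rc` is not enough here because it uses a non-atomic reference count. That makes it neither `Send` nor `Sync`, so it cannot cross thread boundaries. `Arc` can. Below is a minimal sketch of the sharing pattern. It uses a hypothetical stand-in `Connection` (not limbo's real type) and a hypothetical `run_from_threads` helper:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for limbo's `Connection`; only the ownership
/// pattern matters here, not the fields.
struct Connection {
    executed: Vec<String>,
}

impl Connection {
    fn execute(&mut self, sql: &str) {
        // A real connection would prepare and step the statement.
        self.executed.push(sql.to_string());
    }
}

/// Spawns `n` threads that share one connection through `Arc<Mutex<_>>`
/// and returns how many statements were executed in total.
fn run_from_threads(n: usize) -> usize {
    let conn = Arc::new(Mutex::new(Connection { executed: Vec::new() }));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            // Each thread gets its own handle; the refcount is atomic.
            let conn = Arc::clone(&conn);
            thread::spawn(move || {
                conn.lock().unwrap().execute(&format!("SELECT {i}"));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let count = conn.lock().unwrap().executed.len();
    count
}
```

If `Arc` is replaced with `Rc` in this sketch, the `thread::spawn` call fails to compile. That is the constraint the commit works around.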
2025-06-16 10:43:19 +03:00
Levy A.
01a680b69e feat(fuzz)+fix: add schema fuzz testing and fix some bugs 2025-06-11 14:19:06 -03:00
Levy A.
41cb13aa74 fix: ignore non-constants 2025-06-11 14:18:41 -03:00
Levy A.
15e0cab8d8 refactor+fix: precompute default values from schema 2025-06-11 14:18:39 -03:00
Levy A.
6945c0c09e fix+refactor: incorrect label placement
Also added a `cursor_loop` helper on `ProgramBuilder` to avoid making
this mistake in the future. This is zero-cost, and will be optimized to
the same thing (hopefully).
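Below is a minimal sketch of how a helper like `cursor_loop` can take over label placement. Everything in it is a hypothetical simplification, not limbo's actual `ProgramBuilder` API: the `Label` type, the three-variant `Insn`, `bind_label_here`, and the closure-taking signature:

```rust
/// Hypothetical label handle: an index into `ProgramBuilder::label_offsets`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Label(usize);

/// A toy instruction set, far smaller than limbo's real `Insn`.
#[derive(Debug, Clone, PartialEq)]
enum Insn {
    /// Jump to `if_empty` when the cursor has no rows.
    Rewind { cursor: usize, if_empty: Label },
    /// Stand-in for whatever the loop body emits.
    Column { cursor: usize, column: usize },
    /// Jump back to `loop_start` while the cursor has more rows.
    Next { cursor: usize, loop_start: Label },
}

#[derive(Default)]
struct ProgramBuilder {
    insns: Vec<Insn>,
    /// `label_offsets[label.0]` is the instruction offset the label points at.
    label_offsets: Vec<Option<usize>>,
}

impl ProgramBuilder {
    fn allocate_label(&mut self) -> Label {
        self.label_offsets.push(None);
        Label(self.label_offsets.len() - 1)
    }

    /// Binds `label` to the offset of the next instruction to be emitted.
    fn bind_label_here(&mut self, label: Label) {
        self.label_offsets[label.0] = Some(self.insns.len());
    }

    fn emit(&mut self, insn: Insn) {
        self.insns.push(insn);
    }

    /// Emits `Rewind`, the caller's body, then `Next`, and binds both labels
    /// itself, so the loop-start label always lands on the first body insn.
    fn cursor_loop(&mut self, cursor: usize, body: impl FnOnce(&mut Self)) {
        let loop_start = self.allocate_label();
        let loop_end = self.allocate_label();
        self.emit(Insn::Rewind { cursor, if_empty: loop_end });
        self.bind_label_here(loop_start);
        body(&mut *self);
        self.emit(Insn::Next { cursor, loop_start });
        self.bind_label_here(loop_end);
    }
}
```

The helper only decides where labels are bound, and it does so while the bytecode is being generated. The emitted program is therefore identical to a correctly hand-written loop.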
2025-06-11 14:17:36 -03:00
Anton Harniakou
d802075ea9 Resolve merge conflict: Add column names to result set for pragma statement output 2025-06-09 10:40:04 +03:00
pedrocarlo
bc563266b3 add instrumentation to more functions for debugging + adjust how cursors are opened 2025-05-30 20:35:50 -03:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.
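The failure mode can be reproduced with a toy `SUM()` accumulator that reuses one register across two invocations of the same subquery. `Value` and `run_subquery` are hypothetical names for this sketch:

```rust
/// Minimal register value for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

/// Accumulates `rows` into `regs[acc]` like a SUM() aggregate.
/// When `reset` is true the register is NULLed first, which is the fix.
fn run_subquery(regs: &mut [Value], acc: usize, rows: &[i64], reset: bool) -> Value {
    if reset {
        // Start every invocation from NULL, not from the previous result.
        regs[acc] = Value::Null;
    }
    for &x in rows {
        regs[acc] = match regs[acc] {
            Value::Null => Value::Integer(x),
            Value::Integer(sum) => Value::Integer(sum + x),
        };
    }
    regs[acc]
}
```

Resetting to NULL rather than 0 also means `SUM()` over an empty input still returns NULL, as SQL requires.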

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
592ba41137 Add assertion forbidding duplicate cursor keys 2025-05-29 01:04:45 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

`program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (`Option<String>`, in case of index cursors).

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
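Below is a minimal sketch of the keying scheme. It assumes `TableInternalId` is a `Copy` newtype over an integer. `CursorRegistry` and the constructor names are hypothetical. The panic mirrors the duplicate-key assertion added in 592ba41137:

```rust
use std::collections::HashMap;

/// Assumed shape of `TableInternalId`: a unique id per table reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TableInternalId(usize);

/// Table reference id plus, for index cursors, the index name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CursorKey {
    table_id: TableInternalId,
    index: Option<String>,
}

impl CursorKey {
    fn for_table(table_id: TableInternalId) -> Self {
        Self { table_id, index: None }
    }
    fn for_index(table_id: TableInternalId, index: &str) -> Self {
        Self { table_id, index: Some(index.to_string()) }
    }
}

/// Hypothetical registry mapping keys to cursor ids.
#[derive(Default)]
struct CursorRegistry {
    next_cursor_id: usize,
    cursors: HashMap<CursorKey, usize>,
}

impl CursorRegistry {
    /// Allocates a fresh cursor id for `key`; panics if the key already exists.
    fn alloc_cursor_id(&mut self, key: CursorKey) -> usize {
        let id = self.next_cursor_id;
        let previous = self.cursors.insert(key, id);
        assert!(previous.is_none(), "duplicate cursor key");
        self.next_cursor_id += 1;
        id
    }
}
```

In `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`, the outer and inner `t` would carry different `TableInternalId`s. Their keys differ, so they get separate cursors.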
2025-05-29 00:59:24 +03:00
pedrocarlo
e3fd1e589e support using a INSERT SELECT that references the same table in both statements 2025-05-25 19:15:28 -03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id;

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could of course traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that these kinds of query
transformations can happen with less pain.
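Below is a simplified sketch of why stable ids survive folding. It assumes ids come from a counter shared by the whole query, so a subquery's tables never reuse a parent's id. `ColumnRef`, `resolve` and `fold_subquery` are hypothetical names, not limbo's:

```rust
/// Assumed stable, query-wide unique id for a table reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TableInternalId(usize);

#[derive(Debug, Clone)]
struct TableReference {
    internal_id: TableInternalId,
    name: String,
}

/// Column reference keyed by a stable id instead of a list position.
#[derive(Debug, Clone, Copy)]
struct ColumnRef {
    table: TableInternalId,
    column: usize,
}

/// Finds the referenced table by id, so the result does not depend on
/// where the table sits in `tables`.
fn resolve<'a>(tables: &'a [TableReference], col: &ColumnRef) -> Option<&'a str> {
    tables
        .iter()
        .find(|t| t.internal_id == col.table)
        .map(|t| t.name.as_str())
}

/// Folds subquery tables into the parent list, as an unnesting pass might.
fn fold_subquery(parent: &mut Vec<TableReference>, sub: Vec<TableReference>) {
    parent.extend(sub);
}
```

With positional indexes, `o.amount` would be "table 0" inside the subquery and would wrongly point at `users` after folding. With the id, it resolves to `orders` both before and after.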
2025-05-25 20:26:17 +03:00
pedrocarlo
53bf5d5ef5 adjust translate functions to take a program instead of Option<ProgramBuilder> + remove any Init emission in translate functions + use epilogue in all places necessary 2025-05-21 16:41:10 -03:00
pedrocarlo
1c12535d9f push prologue to top-level translate function 2025-05-21 15:50:43 -03:00
pedrocarlo
3090dd91fa push translate_ctx creation outside of prologue 2025-05-21 13:06:25 -03:00
pedrocarlo
f5d6d11d16 extract prologue and epilogue to program builder 2025-05-21 12:47:51 -03:00
pedrocarlo
517c7c81cd refactor to include optional program builder argument 2025-05-21 12:47:51 -03:00
Pekka Enberg
e102cd0be5 Merge 'Add support for DISTINCT aggregate functions' from Jussi Saurio
Reviewable commit by commit. CI failures are not related.
Adds support for e.g. `select first_name, sum(distinct age),
count(distinct age), avg(distinct age) from users group by 1`
Implementation details:
- Creates an ephemeral index per distinct aggregate, and jumps over the
accumulation step if a duplicate is found
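Here is the skip-on-duplicate idea in miniature. A `HashSet` stands in for each aggregate's ephemeral index; this is an assumption for illustration, since the commit uses an ephemeral index. `DistinctFilter` and `distinct_aggs` are hypothetical, and the function evaluates a single group:

```rust
use std::collections::HashSet;

/// Stand-in for the per-aggregate ephemeral index.
#[derive(Default)]
struct DistinctFilter {
    seen: HashSet<i64>,
}

impl DistinctFilter {
    /// True the first time `x` is seen; false means "skip accumulation".
    fn admit(&mut self, x: i64) -> bool {
        self.seen.insert(x)
    }
}

/// Evaluates `SUM(DISTINCT x)` and `COUNT(DISTINCT x)` over one group,
/// each aggregate owning its own filter.
fn distinct_aggs(rows: &[Option<i64>]) -> (Option<i64>, i64) {
    let mut sum_filter = DistinctFilter::default();
    let mut count_filter = DistinctFilter::default();
    let mut sum: Option<i64> = None;
    let mut count = 0i64;
    for &row in rows {
        // Aggregates ignore NULL inputs.
        let Some(x) = row else { continue };
        if sum_filter.admit(x) {
            sum = Some(sum.unwrap_or(0) + x);
        }
        if count_filter.admit(x) {
            count += 1;
        }
    }
    (sum, count)
}
```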

Closes #1507
2025-05-20 13:58:57 +03:00
pedrocarlo
5b15d6aa32 Get the table correctly from the connection instead of table_references + test to confirm unique constraint 2025-05-19 15:22:55 -03:00
pedrocarlo
a818b6924c Removed repeated binary expression translation. Adjusted the set_collation to capture additional context of whether it was set by a Collate expression or not. Added some tests to prove those modifications were necessary. 2025-05-19 15:22:14 -03:00
pedrocarlo
d0a63429a6 Naive implementation of collate for queries. Not implemented for column constraints 2025-05-19 15:22:14 -03:00
Jussi Saurio
8d66347729 vdbe: add Insn::Found 2025-05-17 15:33:55 +03:00
Jussi Saurio
fe65d6e991 Merge 'Performance: hoist entire expressions out of hot loops if they are constant' from Jussi Saurio
## Problem:
- We have cases where we are evaluating expressions in a hot loop that
could only be evaluated once. For example: `CAST('2025-01-01' as
DATETIME)` -- the value of this never changes, so we should only run it
once.
- We have no robust way of doing this right now for entire _expressions_
-- the only existing facility we have is
`program.mark_last_insn_constant()`, which has no concept of how many
instructions translating a given _expression_ spends, and breaks very
easily for this reason.
## Main ideas of this PR:
- Add `expr.is_constant()` determining whether the expression is
compile-time constant. Tries to be conservative and not deem something
compile-time constant if there is no certainty.
- Whenever we think a compile-time constant expression is about to be
translated into bytecode in `translate_expr()`, start a so-called
`constant span`, which means a range of instructions that are part of a
compile-time constant expression.
- At the end of translating the program, all `constant spans` are
hoisted outside of any table loops so they only get evaluated once.
- The target offsets of any jump instructions (e.g. `Goto`) are moved to
the correct place, taking into account all instructions whose offsets
were shifted due to moving the compile-time constant expressions around.
- An escape hatch wrapper `translate_expr_no_constant_opt()` is added
for cases where we should not hoist constants even if we otherwise
could. Right now the only example of this is cases where we are reusing
the same register(s) in multiple iterations of some kind of loop, e.g.
`VALUES(...)` or in the `coalesce()` function implementation.
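Below is a minimal sketch of the hoisting step. It moves every instruction inside a constant span to the end of the program, then rewrites jump targets through an old-to-new offset map. It assumes that jumps hold absolute in-range offsets, that spans do not overlap, and that no jump targets the inside of a span. `Insn` and `hoist_constant_spans` are hypothetical. In the real plan the hoisted block runs once, reached through the `Init`/`Goto` trampoline; this sketch does not model that:

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
enum Insn {
    /// Any non-jump instruction, named for readability.
    Op(&'static str),
    /// Jump to an absolute instruction offset.
    Jump(usize),
}

/// Moves the instructions inside `spans` to the end (preserving their order)
/// and rewrites jump targets so they still point at the same instruction.
fn hoist_constant_spans(insns: Vec<Insn>, spans: &[Range<usize>]) -> Vec<Insn> {
    let in_span = |i: usize| spans.iter().any(|s| s.contains(&i));
    let non_const = (0..insns.len()).filter(|&i| !in_span(i)).count();
    let mut kept = Vec::new();
    let mut hoisted = Vec::new();
    // new_offset[old] = where the instruction at `old` ends up.
    let mut new_offset = vec![0; insns.len()];
    for (old, insn) in insns.into_iter().enumerate() {
        if in_span(old) {
            new_offset[old] = non_const + hoisted.len();
            hoisted.push(insn);
        } else {
            new_offset[old] = kept.len();
            kept.push(insn);
        }
    }
    kept.extend(hoisted);
    // Second pass: fix every jump now that all offsets are known.
    for insn in &mut kept {
        if let Insn::Jump(target) = insn {
            *target = new_offset[*target];
        }
    }
    kept
}
```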
## Performance effects
Here is an example of a modified/simplified TPC-H query where the
`CAST()` calls were previously run millions of times in a hot loop, but
now they are optimized out of the loop.
**BYTECODE PLAN BEFORE:**
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     26    0                    0   Start at 26
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     25    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       String8          0     7     0     1995-03-29     0   r[7]='1995-03-29'
7       Function         0     7     6     cast           0   r[6]=func(r[7..8])  <-- CAST() executed millions of times
8       Le               5     6     24                   0   if r[5]<=r[6] goto 24
9       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
10      SeekRowid        1     9     24                   0   if (r[9]!=orders.rowid) goto 24
11      Column           1     4     10                   0   r[10]=orders.o_orderdate
12      String8          0     12    0     1995-03-29     0   r[12]='1995-03-29'
13      Function         0     12    11    cast           0   r[11]=func(r[12..13])
14      Ge               10    11    24                   0   if r[10]>=r[11] goto 24
15      Column           1     1     14                   0   r[14]=orders.o_custkey
16      SeekRowid        2     14    24                   0   if (r[14]!=customer.rowid) goto 24
17      Column           2     6     15                   0   r[15]=customer.c_mktsegment
18      Ne               15    16    24                   0   if r[15]!=r[16] goto 24
19      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
20      Integer          3     2     0                    0   r[2]=3
21      Column           1     4     3                    0   r[3]=orders.o_orderdate
22      Column           1     7     4                    0   r[4]=orders.o_shippriority
23      ResultRow        1     4     0                    0   output=r[1..4]
24    Next               0     5     0                    0
25    Halt               0     0     0                    0
26    Transaction        0     0     0                    0   write=false
27    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
28    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
29    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
30    Goto               0     1     0
```
**BYTECODE PLAN AFTER**:
```sql
limbo> explain select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     OpenRead           0     10    0                    0   table=lineitem, root=10
2     OpenRead           1     9     0                    0   table=orders, root=9
3     OpenRead           2     8     0                    0   table=customer, root=8
4     Rewind             0     20    0                    0   Rewind lineitem
5       Column           0     10    5                    0   r[5]=lineitem.l_shipdate
6       Le               5     6     19                   0   if r[5]<=r[6] goto 19
7       Column           0     0     9                    0   r[9]=lineitem.l_orderkey
8       SeekRowid        1     9     19                   0   if (r[9]!=orders.rowid) goto 19
9       Column           1     4     10                   0   r[10]=orders.o_orderdate
10      Ge               10    11    19                   0   if r[10]>=r[11] goto 19
11      Column           1     1     14                   0   r[14]=orders.o_custkey
12      SeekRowid        2     14    19                   0   if (r[14]!=customer.rowid) goto 19
13      Column           2     6     15                   0   r[15]=customer.c_mktsegment
14      Ne               15    16    19                   0   if r[15]!=r[16] goto 19
15      Column           0     0     1                    0   r[1]=lineitem.l_orderkey
16      Column           1     4     3                    0   r[3]=orders.o_orderdate
17      Column           1     7     4                    0   r[4]=orders.o_shippriority
18      ResultRow        1     4     0                    0   output=r[1..4]
19    Next               0     5     0                    0
20    Halt               0     0     0                    0
21    Transaction        0     0     0                    0   write=false
22    String8            0     7     0     1995-03-29     0   r[7]='1995-03-29'
23    String8            0     8     0     DATETIME       0   r[8]='DATETIME'
24    Function           1     7     6     cast           0   r[6]=func(r[7..8]) <-- CAST() executed twice
25    String8            0     12    0     1995-03-29     0   r[12]='1995-03-29'
26    String8            0     13    0     DATETIME       0   r[13]='DATETIME'
27    Function           1     12    11    cast           0   r[11]=func(r[12..13])
28    String8            0     16    0     FURNITURE      0   r[16]='FURNITURE'
29    Integer            3     2     0                    0   r[2]=3
30    Goto               0     1     0                    0
```
**EXECUTION RUNTIME BEFORE:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 3.633396667 s (this includes parsing/coloring of cli app)
```
**EXECUTION RUNTIME AFTER:**
```sql
limbo> select
        l_orderkey,
        3 as revenue,
        o_orderdate,
        o_shippriority
from
        lineitem,
        orders,
        customer
where
        c_mktsegment = 'FURNITURE'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < cast('1995-03-29' as datetime)
        and l_shipdate > cast('1995-03-29' as datetime);
┌────────────┬─────────┬─────────────┬────────────────┐
│ l_orderkey │ revenue │ o_orderdate │ o_shippriority │
├────────────┼─────────┼─────────────┼────────────────┤
└────────────┴─────────┴─────────────┴────────────────┘
Command stats:
----------------------------
total: 2.0923475 s (this includes parsing/coloring of cli app)
```

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1359
2025-04-25 16:55:41 +03:00
Jussi Saurio
c3441f9685 vdbe: move comments if instructions were moved around in emit_constant_insns() 2025-04-24 11:05:21 +03:00
Jussi Saurio
0f5c791784 vdbe: refactor label resolution to account for insn offsets changing 2025-04-24 11:05:21 +03:00
Jussi Saurio
b4b38bdb3c vdbe: resolve labels for InitCoroutine::start_offset 2025-04-24 11:05:21 +03:00
Jussi Saurio
47f3f3bda3 vdbe: replace constant_insns with constant_spans 2025-04-24 11:05:21 +03:00
pedrocarlo
b6036cc79d Primary key constraint working 2025-04-23 16:44:13 -03:00
Jussi Saurio
09ad6d8f01 vdbe: resolve labels for Insn::Once 2025-04-21 14:59:13 +03:00
Jussi Saurio
1fe1f0ebba ProgramBuilder: add resolve_cursor_id_safe() which doesn't unwrap 2025-04-15 15:13:39 +03:00
Jussi Saurio
d286a56e15 refactor: fold Async/Await insns into a single instruction 2025-04-14 09:40:20 +03:00
Jussi Saurio
3e42a62cd0 Add SeekLE/SeekLT operations to VDBE 2025-04-09 10:14:29 +03:00
PThorpe92
45a8e5e226 Add close_cursors helper method to program builder 2025-04-05 11:06:18 -04:00
Pere Diaz Bou
7e4b57f2e2 VDBE with direct function dispatch
This PR is unapologetically stolen from @vmg's implementation in Vitess
implemented here https://github.com/vitessio/vitess/pull/12369. If you
want a more in depth explanation of how this works you can read the
[blog post he carefully
wrote](https://planetscale.com/blog/faster-interpreters-in-go-catching-up-with-cpp).

In limbo we have a huge problem with [register
spilling](https://en.wikipedia.org/wiki/Register_allocation). This can
be easily observed in the prologue of `Program::step` before:
```llvm
start:
    %e.i.i304.i = alloca [0 x i8], align 8
    %formatter.i305.i = alloca [64 x i8], align 8
    %buf.i306.i = alloca [24 x i8], align 8
    %formatter.i259.i = alloca [64 x i8], align 8
    ; ... these allocas are repeated for hundreds of lines ...
    %formatter.i52.i = alloca [64 x i8], align 8
    %buf.i53.i = alloca [24 x i8], align 8
    %formatter.i.i = alloca [64 x i8], align 8
    %buf.i.i = alloca [24 x i8], align 8
    %_87.i = alloca [48 x i8], align 8
    %_82.i = alloca [24 x i8], align 8
    %_73.i = alloca [24 x i8], align 8
    %_66.i8446 = alloca [24 x i8], align 8
    %_57.i = alloca [24 x i8], align 8
    %_48.i = alloca [24 x i8], align 8
```

After these changes we completely remove the need for register spilling
(yes, that is the complete prologue):
```llvm
start:
    %self1 = alloca [80 x i8], align 8
    %pager = alloca [8 x i8], align 8
    %mv_store = alloca [8 x i8], align 8
    store ptr %0, ptr %mv_store, align 8
    store ptr %1, ptr %pager, align 8
    %2 = getelementptr inbounds i8, ptr %state, i64 580
    %3 = getelementptr inbounds i8, ptr %state, i64 576
    %4 = getelementptr inbounds i8, ptr %self, i64 16
    %5 = getelementptr inbounds i8, ptr %self, i64 8
    %6 = getelementptr inbounds i8, ptr %self1, i64 8
    br label %bb1, !dbg !286780
```
When it comes to branch prediction, we don't really fix much, because
Rust thankfully already compiles `match` expressions
to a jump table:

```llvm
%insn = getelementptr inbounds [0 x %"vdbe::insn::Insn"], ptr %self657, i64 0, i64 %index, !dbg !249527
%332 = load i8, ptr %insn, align 8, !dbg !249528, !range !38291, !noundef !14
switch i8 %332, label %default.unreachable26674 [
    i8 0, label %bb111
    i8 1, label %bb101
    i8 2, label %bb100
    i8 3, label %bb110
    ...
    i8 104, label %bb5
    i8 105, label %bb16
    i8 106, label %bb14
], !dbg !249530
```
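For contrast with the `match` above, here is a minimal sketch of direct function dispatch. Each instruction carries a function pointer to its own handler, so the interpreter loop is a single indirect call, and each handler gets its own small stack frame instead of sharing one giant `step` frame. The `Insn`, `State` and `op_*` names are hypothetical, and limbo's actual layout may differ:

```rust
/// Hypothetical interpreter state; limbo's real program state is far larger.
struct State {
    pc: usize,
    regs: Vec<i64>,
    output: Vec<i64>,
}

enum Step {
    Continue,
    Halt,
}

/// An instruction is its handler plus its operands.
struct Insn {
    exec: fn(&Insn, &mut State) -> Step,
    p1: usize,
    p2: usize,
    p3: usize,
}

/// r[p2] = p1
fn op_integer(insn: &Insn, s: &mut State) -> Step {
    s.regs[insn.p2] = insn.p1 as i64;
    s.pc += 1;
    Step::Continue
}

/// r[p3] = r[p1] + r[p2]
fn op_add(insn: &Insn, s: &mut State) -> Step {
    s.regs[insn.p3] = s.regs[insn.p1] + s.regs[insn.p2];
    s.pc += 1;
    Step::Continue
}

/// output r[p1]
fn op_result_row(insn: &Insn, s: &mut State) -> Step {
    s.output.push(s.regs[insn.p1]);
    s.pc += 1;
    Step::Continue
}

fn op_halt(_: &Insn, _: &mut State) -> Step {
    Step::Halt
}

/// The whole dispatch loop: call the current instruction's handler.
fn run(program: &[Insn], num_regs: usize) -> Vec<i64> {
    let mut state = State { pc: 0, regs: vec![0; num_regs], output: Vec::new() };
    loop {
        let insn = &program[state.pc];
        if let Step::Halt = (insn.exec)(insn, &mut state) {
            return state.output;
        }
    }
}
```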

Some results
----
```
function dispatch:
Execute `SELECT 1`/limbo_execute_select_1
                        time:   [29.498 ns 29.548 ns 29.601 ns]
                        change: [-3.6125% -3.3592% -3.0804%] (p = 0.00 < 0.05)

main:
Execute `SELECT 1`/limbo_execute_select_1
                        time:   [33.789 ns 33.832 ns 33.878 ns]
```
2025-04-02 14:55:37 +02:00
Pekka Enberg
387b68fc06 Merge 'Expose 'Explain' to prepared statement to allow for alternate Writer ' from Preston Thorpe
### The problem:
I often need to copy the output of an `Explain` statement to my
clipboard. This is currently not possible because it only writes to
stdout.
For all other limbo output, I can run `.output file` in the CLI, then
enter my query, and in another tmux pane simply run `cat file | xclip -in -selection clipboard`.
### The solution:
Expose a `statement.explain()` method that returns the query explanation
as a string. If the user uses something like `execute` instead of
prepare, it will default to `stdout` as expected, but this allows the
user to access the query plan on the prepared statement and do with it
what they please.
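Below is a minimal sketch of the shape of this API. `explain()` returns a `String`, and writing to stdout becomes just one of many possible sinks. `Statement`, its tuple-based bytecode and `explain_to` are hypothetical simplifications:

```rust
use std::fmt::Write as _;
use std::io::{self, Write};

/// Hypothetical prepared statement holding (opcode, p1, p2, p3) tuples.
struct Statement {
    insns: Vec<(&'static str, i64, i64, i64)>,
}

impl Statement {
    /// Renders the bytecode as text instead of printing it.
    fn explain(&self) -> String {
        let mut out = String::new();
        writeln!(out, "addr  opcode    p1  p2  p3").unwrap();
        for (addr, (opcode, p1, p2, p3)) in self.insns.iter().enumerate() {
            writeln!(out, "{addr:<4}  {opcode:<8}  {p1:<2}  {p2:<2}  {p3}").unwrap();
        }
        out
    }
}

/// Sends the explanation to any `io::Write`: stdout, a file, a buffer...
fn explain_to(stmt: &Statement, w: &mut impl Write) -> io::Result<()> {
    w.write_all(stmt.explain().as_bytes())
}
```

The CLI's existing behaviour then amounts to `explain_to(&stmt, &mut io::stdout())`.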

Closes #1166
2025-03-28 09:55:58 +02:00
PThorpe92
7b55f7a167 Move explain to statement to allow for alternate writer 2025-03-24 18:48:12 -04:00
Ihor Andrianov
d8e070a360 moved json_cache to state 2025-03-24 14:48:40 +02:00
Ihor Andrianov
1511c9b3bf add json cache to json functions and fix tests 2025-03-24 13:17:58 +02:00
Pere Diaz Bou
00ab3d1c0c Fix ordering and implement Deref 2025-03-17 10:22:42 +01:00
Pere Diaz Bou
20f5ade95e Experiment with a custom Lock for database header 2025-03-17 10:21:34 +01:00
Pekka Enberg
b0636e4494 Merge 'Adds Drop Table' from Zaid Humayun
This PR adds support for `DROP TABLE` and addresses issue
https://github.com/tursodatabase/limbo/issues/894
It depends on https://github.com/tursodatabase/limbo/pull/785 being
merged in because it requires the implementation of `free_page`.
EDIT: The PR above has been merged.
It adds the following:
* an implementation for the `DropTable` AST instruction via a method
called `translate_drop_table`
* a couple of new instructions - `Destroy` and `DropTable`. The former
is to modify physical b-tree pages and the latter is to modify in-memory
structures like the schema hash table.
* `btree_destroy` on `BTreeCursor` to walk the tree of pages for this
table and place them on the free list.
* state machine traversal for both `btree_destroy` and
`clear_overflow_pages` to ensure performant, correct code.
* unit & tcl tests
* modifies the `Null` instruction to follow SQLite semantics and accept
a second register. It will set all registers in this range to null. This
is required for `DROP TABLE`.
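The tree walk behind `btree_destroy` can be sketched as a post-order traversal that frees children before their parent. This sketch assumes a toy page map (`Pages`) with no overflow pages. It uses an explicit stack rather than recursion, which is closer in spirit to the resumable state machine described above. It is not limbo's actual implementation:

```rust
use std::collections::HashMap;

/// Hypothetical page map: page number -> child page numbers.
/// Leaf pages may be absent or map to an empty list.
type Pages = HashMap<u32, Vec<u32>>;

/// Pushes every page of the tree rooted at `root` onto `free_list`,
/// children before parents (post-order).
fn btree_destroy(pages: &Pages, root: u32, free_list: &mut Vec<u32>) {
    // (page, children_already_scheduled)
    let mut stack = vec![(root, false)];
    while let Some((page, expanded)) = stack.pop() {
        if expanded {
            // All children were freed earlier; now the page itself can go.
            free_list.push(page);
            continue;
        }
        stack.push((page, true));
        for &child in pages.get(&page).into_iter().flatten() {
            stack.push((child, false));
        }
    }
}
```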
The screenshots below have a comparison of the bytecodes generated via
SQLite & Limbo.
Limbo has the same instruction set except for the subroutines which
involve opening an ephemeral table, copying over the triggers from the
`sqlite_schema` table and then re-inserting them back into the
`sqlite_schema` table.
This is because `OpenEphemeral` is still a WIP and is being tracked at
https://github.com/tursodatabase/limbo/pull/768
![Screenshot 2025-02-09 at 7 05 03 PM](https://github.com/user-attachments/assets/1d597001-a60c-4a76-89fd-8b90881c77c9)
![Screenshot 2025-02-09 at 7 05 35 PM](https://github.com/user-attachments/assets/ecfd2a7a-2edc-49cd-a8d1-7b4db8657444)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #897
2025-03-06 18:27:41 +02:00
Pere Diaz Bou
e4a8ee5402 move load extensions to Connection
Extensions are loaded per connection and not per database as per SQLite
behaviour. This also helps with removing locks.
2025-03-05 14:07:48 +01:00
Pere Diaz Bou
8daf7666d1 Make database Sync + Send 2025-03-05 14:07:48 +01:00
Zaid Humayun
23a904f38d Merge branch 'main' of https://github.com/tursodatabase/limbo 2025-03-01 01:18:45 +05:30
Zaid Humayun
fbc8cd7e70 vdbe: modified the Null instruction
Modified the Null instruction to more closely match SQLite semantics. It now allows passing in a second register, and all registers from r1..r2 are set to null.
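Below is a minimal sketch of the range semantics, following SQLite's documented `Null` opcode. r[P2] is set to NULL, and when P3 > P2 every register through r[P3] is set to NULL as well. `Value` and `exec_null` are hypothetical:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

/// Sets r[p2] to NULL; if p3 > p2, also r[p2+1..=p3].
/// A p3 below p2 (typically 0) means "only r[p2]".
fn exec_null(regs: &mut [Value], p2: usize, p3: usize) {
    let end = p3.max(p2);
    for r in &mut regs[p2..=end] {
        *r = Value::Null;
    }
}
```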
2025-02-19 21:46:26 +05:30
PThorpe92
9c8083231c Implement create virtual table and VUpdate opcode 2025-02-17 20:44:44 -05:00
[B
5214cf9859 Added IdxLE and IdxLT opcodes 2025-02-14 20:22:30 +01:00
Pekka Enberg
34b0c7c09a core/vdbe: AutoCommit instruction 2025-02-14 10:26:31 +02:00