This PR adds support for table-valued functions for PRAGMAs (see the
[PRAGMA functions section](https://www.sqlite.org/pragma.html)).
Additionally, it introduces built-in table-valued functions. I
considered using extensions for this, but there are several reasons in
favor of a dedicated mechanism:
* It simplifies the use of internal functions, structs, etc. For
example, when implementing `json_each` and `json_tree`, direct access to
internals was necessary:
https://github.com/tursodatabase/limbo/pull/1088
* It avoids FFI overhead. [Benchmarks](https://github.com/piotrrzysko/li
mbo/blob/pragma_vtabs_bench/core/benches/pragma_benchmarks.rs) on my
hardware show that `pragma_table_info()` implemented as an extension is
2.5× slower than the built-in version.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1642
In preparation for `CREATE VIEW`, we need to have the original sql query
that was used to create the view. I'm using the scanner's offset to
slice into the original input, trimming the newlines, and passing it to
the translate function.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1621
This pull request implements the `libsql_wal_get_frame()` API. To do
that, we also introduce a `wait_for_completion()` API in I/O dispatcher.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1533
Closes#1528 .
- Modified `translate_select` so that the caller can define if the
statement is top-level statement or a subquery.
- Refactored `translate_insert` to offload the translation of multi-row
VALUES and SELECT statements to `translate_select`
- I did not try to change much of `populate_column_registers` as I did
not want to break `translate_virtual_table_insert`. Ideally, I would
want to unite this remaining logic folding `populate_column_registers`
into `populate_columns_multiple_rows` and the
`translate_virtual_table_insert` into `translate_insert`. But, I think
this may be best suited for a separate PR.
## TODO
- ~Tests~ - *Done*
- ~Need to emit a temp table when we are selecting and inserting into
the Same Table -
https://github.com/sqlite/sqlite/blob/master/src/insert.c#L1369~ -
*Done*
- Optimization when table have the exact same schema - open an Issue
about it
- Virtual Tables do not benefit yet from this feature - open an Issue
about it
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1566
- Instead of using a confusing CheckpointStatus for many different things,
introduce the following statuses:
* PagerCacheflushStatus - cacheflush can result in either:
- the WAL being written to disk and fsynced
- but also a checkpoint to the main BD file, and fsyncing the main DB file
Reflect this in the type.
* WalFsyncStatus - previously CheckpointStatus was also used for this, even
though fsyncing the WAL doesn't checkpoint.
* CheckpointStatus/CheckpointResult is now used only for actual checkpointing.
- Rename HaltState to CommitState (program.halt_state -> program.commit_state)
- Make WAL a non-optional property in Pager
* This gets rid of a lot of if let Some(...) boilerplate
* For ephemeral indexes, provide a DummyWAL implementation that does nothing.
- Rename program.halt() to program.commit_txn()
- Add some documentation comments to structs and functions
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:
```rust
Expr::Column { table: 0, column: 3, .. }
```
refers to the 0'th table in the `table_references` list.
This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.
If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:
```sql
-- Assume tables: users(id, age), orders(user_id, amount)
-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
(SELECT user_id, SUM(amount) as total
FROM orders o
WHERE o.amount > 100
GROUP BY o.user_id) sub
WHERE u.id = sub.user_id
-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:
SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;
-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```
We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.
Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
Re-Opening #1076 because it had bit-rotted to a point of no return.
However it has improved. Now with Weak references and no incrementing Rc
strong counts.
This also includes a better test extension that returns info about the
other tables in the schema.

(theme doesn't show rows column)
Closes#1366
Shared cache requires more locking mechasnisms. We still have multi
threading issues not related to shared cache so it is wise to first fix
those and then once they are fixed, we can incrementally add shared
cache back with locking in place.
This PR introduces some modifications to the Program Builder to allow us
to use nested parsing. By focusing the emission of Init and the last
Goto (prologue and epilogue), inside the ProgramBuilder, we can just not
emit them if we are parsing/translating in a nested context. For this
PR, I only migrated insert to use these functions as I need them to
support Insert statements that use `SELECT FROM` syntax. Nested parsing
overall enables code reuse for us and arguably is one of the only ways
to parse deeply nested queries without a lot of code duplication.
#1528Closes#1543
This PR builds on top of
https://github.com/tursodatabase/limbo/pull/1368 and adds few things
like allowing inserting pages with the same page key, fix fuzz tests by
adding transactions and some minor improvements to cacheflush.
Closes#1523
This PR adds a port of [SQLite's CSV virtual table
extension](https://www.sqlite.org/csv.html).
Planned follow-ups:
* Pass detailed error messages from `VTabModule::create`, not just
`ResultCode`s.
* Address the TODO in `VTabModuleImpl::create_schema`.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#1544
One problem we have with PageRef, is that this Page reference can be
unloaded, this means if we read the page again instead of loading the
page onto the same reference, we will have split brain of references.
To solve this we wrap PageRef in `BTreePage` so that if a page is seen
as unloaded, we will replace BTreePage::page with the newest version of
the page.
I tried to be the most similar to rusqlite as possible. The only thing
that's bothering me is `Vec<Vec<Value>>` which I think can be improved
but not so sure how, any inputs on this are welcomed.
Closes#1536
The following code reproduces the leak, with memory usage increasing
over time:
```
#[tokio::main]
async fn main() {
let db = Builder::new_local(":memory:").build().await.unwrap();
let conn = db.connect().unwrap();
conn.execute("SELECT load_extension('./target/debug/liblimbo_series');", ())
.await
.unwrap();
loop {
conn.execute("SELECT * FROM generate_series(1,10,2);", ())
.await
.unwrap();
}
}
```
After switching to the system allocator, the leak becomes detectable
with Valgrind:
```
32,000 bytes in 1,000 blocks are definitely lost in loss record 24 of 24
at 0x538580F: malloc (vg_replace_malloc.c:446)
by 0x62E15FA: alloc::alloc::alloc (alloc.rs:99)
by 0x62E172C: alloc::alloc::Global::alloc_impl (alloc.rs:192)
by 0x62E1530: allocate (alloc.rs:254)
by 0x62E1530: alloc::alloc::exchange_malloc (alloc.rs:349)
by 0x62E0271: new<limbo_series::GenerateSeriesCursor> (boxed.rs:257)
by 0x62E0271: open_GenerateSeriesVTab (lib.rs:19)
by 0x425D8FA: limbo_core::VirtualTable::open (lib.rs:732)
by 0x4285DDA: limbo_core::vdbe::execute::op_vopen (execute.rs:890)
by 0x42351E8: limbo_core::vdbe::Program::step (mod.rs:396)
by 0x425C638: limbo_core::Statement::step (lib.rs:610)
by 0x40DB238: limbo::Statement::execute::{{closure}} (lib.rs:181)
by 0x40D9EAF: limbo::Connection::execute::{{closure}} (lib.rs:109)
by 0x40D54A1: example::main::{{closure}} (example.rs:26)
```
Interestingly, when using mimalloc, neither Valgrind nor mimalloc’s
internal statistics report the leak.