Extensible SQL Lexer and Parser for Rust
Find a file
Nickolay Ponomarev 2dec65fdb4 Separate statement from expr parsing (4/5)
Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427

This stops the parser from accepting (and the AST from being able to
represent) SQL look-alike code that makes no sense, e.g.

    SELECT ... FROM (CREATE TABLE ...) foo
    SELECT ... FROM (1+CAST(...)) foo

Generally this makes the AST less "partially typed": meaning certain
parts are strongly typed (e.g. SELECT can only contain projections,
relations, etc.), while everything that didn't get its own type is
dumped into ASTNode, effectively untyped. After a few more fixes (yet
to be implemented), `ASTNode` could become an `SQLExpression`. The
Pratt-style expression parser (returning an SQLExpression) would be
invoked from the top-down parser in places where a generic expression
is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things
like select's `projection` and `relation` could be more appropriately
(narrowly) typed.


Since the diff is quite large due to necessarily large number of
mechanical changes, here's an overview:

1) Interface changes:

   - A new AST enum - `SQLStatement` - is split out of ASTNode:

     - The variants of the ASTNode enum, which _only_ make sense as a top
       level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are
       _moved_ to the new enum, with no other changes.
     - SQLSelect is _duplicated_: now available both as a variant in
       SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery).

   - The main entry point (Parser::parse_sql) now expects an SQL statement
     as input, and returns an `SQLStatement`.

2) Parser changes: instead of detecting the top-level constructs deep
down in the precedence parser (`parse_prefix`) we are able to do it
just right after setting up the parser in the `parse_sql` entry point

(SELECT, again, is kept in the expression parser to demonstrate how
subqueries could be implemented).

The rest of parser changes are mechanical ASTNode -> SQLStatement
replacements resulting from the AST change.

3) Testing changes: for every test - depending on whether the input was
a complete statement or an expresssion -  I used an appropriate helper
function:

   - `verified` (parses SQL, checks that it round-trips, and returns
     the AST) - was replaced by `verified_stmt` or `verified_expr`.

   - `parse_sql` (which returned AST without checking it round-tripped)
     was replaced by:

     - `parse_sql_expr` (same function, for expressions)

     - `one_statement_parses_to` (formerly `parses_to`), extended to
       deal with statements that are not expected to round-trip.
       The weird name is to reduce further churn when implementing
       multi-statement parsing.

     - `verified_stmt` (in 4 testcases that actually round-tripped)
2019-01-31 15:54:57 +03:00
docs Update docs on writing custom parsers 2018-09-08 07:29:34 -06:00
examples tokenizer delegates to dialect now 2018-09-08 14:49:25 -06:00
src Separate statement from expr parsing (4/5) 2019-01-31 15:54:57 +03:00
tests Separate statement from expr parsing (4/5) 2019-01-31 15:54:57 +03:00
.gitignore roughing out classic pratt parser 2018-02-08 07:49:24 -07:00
.travis.yml add travis build script 2018-09-03 11:03:04 -06:00
Cargo.toml (cargo-release) start next development iteration 0.2.2-alpha.0 2019-01-13 09:19:46 -07:00
LICENSE.TXT replace with code from datafusion 2018-09-03 09:56:39 -06:00
README.md Update README 2019-01-12 10:00:00 -07:00

Extensible SQL Lexer and Parser for Rust

License Version Build Status Coverage Status Gitter chat

The goal of this project is to build a SQL lexer and parser capable of parsing SQL that conforms with the ANSI SQL:2011 standard but also making it easy to support custom dialects so that this crate can be used as a foundation for vendor-specific parsers.

This parser is currently being used by the DataFusion query engine and LocustDB.

Example

The current code is capable of parsing some trivial SELECT and CREATE TABLE statements.

let sql = "SELECT a, b, 123, myfunc(b) \
           FROM table_1 \
           WHERE a > b AND b < 100 \
           ORDER BY a DESC, b";

let dialect = GenericSqlDialect{}; // or AnsiSqlDialect, or your own dialect ...

let ast = Parser::parse_sql(&dialect,sql.to_string()).unwrap();

println!("AST: {:?}", ast);

This outputs

AST: SQLSelect { projection: [SQLIdentifier("a"), SQLIdentifier("b"), SQLLiteralLong(123), SQLFunction { id: "myfunc", args: [SQLIdentifier("b")] }], relation: Some(SQLIdentifier("table_1")), selection: Some(SQLBinaryExpr { left: SQLBinaryExpr { left: SQLIdentifier("a"), op: Gt, right: SQLIdentifier("b") }, op: And, right: SQLBinaryExpr { left: SQLIdentifier("b"), op: Lt, right: SQLLiteralLong(100) } }), order_by: Some([SQLOrderBy { expr: SQLIdentifier("a"), asc: false }, SQLOrderBy { expr: SQLIdentifier("b"), asc: true }]), group_by: None, having: None, limit: None }

Design

This parser is implemented using the Pratt Parser design, which is a top-down operator-precedence parser.

I am a fan of this design pattern over parser generators for the following reasons:

  • Code is simple to write and can be concise and elegant (this is far from true for this current implementation unfortunately, but I hope to fix that using some macros)
  • Performance is generally better than code generated by parser generators
  • Debugging is much easier with hand-written code
  • It is far easier to extend and make dialect-specific extensions compared to using a parser generator

Supporting custom SQL dialects

This is a work in progress but I started some notes on writing a custom SQL parser.

Contributing

Contributors are welcome! Please see the current issues and feel free to file more!