![]() Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427 This stops the parser from accepting (and the AST from being able to represent) SQL look-alike code that makes no sense, e.g. SELECT ... FROM (CREATE TABLE ...) foo SELECT ... FROM (1+CAST(...)) foo Generally this makes the AST less "partially typed": meaning certain parts are strongly typed (e.g. SELECT can only contain projections, relations, etc.), while everything that didn't get its own type is dumped into ASTNode, effectively untyped. After a few more fixes (yet to be implemented), `ASTNode` could become an `SQLExpression`. The Pratt-style expression parser (returning an SQLExpression) would be invoked from the top-down parser in places where a generic expression is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things like select's `projection` and `relation` could be more appropriately (narrowly) typed. Since the diff is quite large due to necessarily large number of mechanical changes, here's an overview: 1) Interface changes: - A new AST enum - `SQLStatement` - is split out of ASTNode: - The variants of the ASTNode enum, which _only_ make sense as a top level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are _moved_ to the new enum, with no other changes. - SQLSelect is _duplicated_: now available both as a variant in SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery). - The main entry point (Parser::parse_sql) now expects an SQL statement as input, and returns an `SQLStatement`. 2) Parser changes: instead of detecting the top-level constructs deep down in the precedence parser (`parse_prefix`) we are able to do it just right after setting up the parser in the `parse_sql` entry point (SELECT, again, is kept in the expression parser to demonstrate how subqueries could be implemented). The rest of parser changes are mechanical ASTNode -> SQLStatement replacements resulting from the AST change. 3) Testing changes: for every test - depending on whether the input was a complete statement or an expresssion - I used an appropriate helper function: - `verified` (parses SQL, checks that it round-trips, and returns the AST) - was replaced by `verified_stmt` or `verified_expr`. - `parse_sql` (which returned AST without checking it round-tripped) was replaced by: - `parse_sql_expr` (same function, for expressions) - `one_statement_parses_to` (formerly `parses_to`), extended to deal with statements that are not expected to round-trip. The weird name is to reduce further churn when implementing multi-statement parsing. - `verified_stmt` (in 4 testcases that actually round-tripped) |
||
---|---|---|
docs | ||
examples | ||
src | ||
tests | ||
.gitignore | ||
.travis.yml | ||
Cargo.toml | ||
LICENSE.TXT | ||
README.md |
Extensible SQL Lexer and Parser for Rust
The goal of this project is to build a SQL lexer and parser capable of parsing SQL that conforms with the ANSI SQL:2011 standard but also making it easy to support custom dialects so that this crate can be used as a foundation for vendor-specific parsers.
This parser is currently being used by the DataFusion query engine and LocustDB.
Example
The current code is capable of parsing some trivial SELECT and CREATE TABLE statements.
let sql = "SELECT a, b, 123, myfunc(b) \
FROM table_1 \
WHERE a > b AND b < 100 \
ORDER BY a DESC, b";
let dialect = GenericSqlDialect{}; // or AnsiSqlDialect, or your own dialect ...
let ast = Parser::parse_sql(&dialect,sql.to_string()).unwrap();
println!("AST: {:?}", ast);
This outputs
AST: SQLSelect { projection: [SQLIdentifier("a"), SQLIdentifier("b"), SQLLiteralLong(123), SQLFunction { id: "myfunc", args: [SQLIdentifier("b")] }], relation: Some(SQLIdentifier("table_1")), selection: Some(SQLBinaryExpr { left: SQLBinaryExpr { left: SQLIdentifier("a"), op: Gt, right: SQLIdentifier("b") }, op: And, right: SQLBinaryExpr { left: SQLIdentifier("b"), op: Lt, right: SQLLiteralLong(100) } }), order_by: Some([SQLOrderBy { expr: SQLIdentifier("a"), asc: false }, SQLOrderBy { expr: SQLIdentifier("b"), asc: true }]), group_by: None, having: None, limit: None }
Design
This parser is implemented using the Pratt Parser design, which is a top-down operator-precedence parser.
I am a fan of this design pattern over parser generators for the following reasons:
- Code is simple to write and can be concise and elegant (this is far from true for this current implementation unfortunately, but I hope to fix that using some macros)
- Performance is generally better than code generated by parser generators
- Debugging is much easier with hand-written code
- It is far easier to extend and make dialect-specific extensions compared to using a parser generator
Supporting custom SQL dialects
This is a work in progress but I started some notes on writing a custom SQL parser.
Contributing
Contributors are welcome! Please see the current issues and feel free to file more!