Extensible SQL Lexer and Parser for Rust
The goal of this project is to build a SQL lexer and parser that can parse SQL conforming to the ANSI SQL:2011 standard, while also making it easy to support custom dialects, so that this crate can be used as a foundation for vendor-specific parsers.
This parser is currently being used by the DataFusion query engine and LocustDB.
Example
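The current code can parse some trivial SELECT and CREATE TABLE statements:

    // Module paths as of the crate version this README describes; adjust if they differ.
    use sqlparser::dialect::GenericSqlDialect;
    use sqlparser::sqlparser::Parser;

    let sql = "SELECT a, b, 123, myfunc(b) \
               FROM table_1 \
               WHERE a > b AND b < 100 \
               ORDER BY a DESC, b";

    let dialect = GenericSqlDialect {}; // or AnsiSqlDialect, or your own dialect ...

    // parse_sql returns a Result; unwrap() panics if the SQL cannot be parsed.
    let ast = Parser::parse_sql(&dialect, sql.to_string()).unwrap();

    println!("AST: {:?}", ast);
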
This outputs:

    AST: [SQLSelect(SQLQuery { ctes: [], body: Select(SQLSelect { distinct: false, projection: [UnnamedExpression(SQLIdentifier("a")), UnnamedExpression(SQLIdentifier("b")), UnnamedExpression(SQLValue(Long(123))), UnnamedExpression(SQLFunction { name: SQLObjectName(["myfunc"]), args: [SQLIdentifier("b")], over: None })], relation: Some(Table { name: SQLObjectName(["table_1"]), alias: None }), joins: [], selection: Some(SQLBinaryExpr { left: SQLBinaryExpr { left: SQLIdentifier("a"), op: Gt, right: SQLIdentifier("b") }, op: And, right: SQLBinaryExpr { left: SQLIdentifier("b"), op: Lt, right: SQLValue(Long(100)) } }), group_by: None, having: None }), order_by: Some([SQLOrderByExpr { expr: SQLIdentifier("a"), asc: Some(false) }, SQLOrderByExpr { expr: SQLIdentifier("b"), asc: None }]), limit: None })]

Design
This parser follows the Pratt parser design, a top-down operator-precedence parsing technique.
I am a fan of this design pattern over parser generators for the following reasons:
- Code is simple to write and can be concise and elegant (unfortunately this is far from true of the current implementation, but I hope to fix that using some macros)
- Performance is generally better than code generated by parser generators
- Debugging is much easier with hand-written code
- It is far easier to extend and make dialect-specific extensions compared to using a parser generator
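To make the Pratt (top-down operator-precedence) approach concrete, here is a minimal, self-contained sketch of the technique for a handful of binary operators. It is a toy written for this README, not the crate's actual implementation: the names `Token`, `Expr`, `precedence`, `parse` and `show` are all hypothetical, and the tokenizer assumes tokens are separated by whitespace.

```rust
// Toy Pratt parser. All names are illustrative and NOT part of this crate's API.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Ident(String),
    Op(String),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Ident(String),
    Binary { left: Box<Expr>, op: String, right: Box<Expr> },
}

/// Split on whitespace (an assumption of this sketch) and classify each word.
fn tokenize(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|word| {
            let is_word = word.chars().all(|c| c.is_alphanumeric() || c == '_');
            let upper = word.to_uppercase();
            if let Ok(n) = word.parse::<i64>() {
                Token::Num(n)
            } else if is_word && upper != "AND" && upper != "OR" {
                Token::Ident(word.to_string())
            } else {
                Token::Op(upper)
            }
        })
        .collect()
}

/// Binding power of each binary operator; higher binds tighter.
fn precedence(op: &str) -> Option<u8> {
    match op {
        "OR" => Some(1),
        "AND" => Some(2),
        "<" | ">" | "=" => Some(3),
        "+" | "-" => Some(4),
        "*" | "/" => Some(5),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    /// Parse an operand (the "prefix" position in Pratt terminology).
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        let expr = match self.tokens.get(self.pos) {
            Some(Token::Num(n)) => Expr::Num(*n),
            Some(Token::Ident(name)) => Expr::Ident(name.clone()),
            other => return Err(format!("expected an operand, found {:?}", other)),
        };
        self.pos += 1;
        Ok(expr)
    }

    /// Core loop: keep consuming operators that bind tighter than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        loop {
            let (op, prec) = match self.tokens.get(self.pos) {
                Some(Token::Op(op)) => match precedence(op) {
                    Some(p) if p > min_prec => (op.clone(), p),
                    _ => break,
                },
                _ => break,
            };
            self.pos += 1;
            // Recursing with `prec` makes equal-precedence operators left-associative.
            let right = self.parse_expr(prec)?;
            left = Expr::Binary { left: Box::new(left), op, right: Box::new(right) };
        }
        Ok(left)
    }
}

/// Parse a whole expression, rejecting trailing tokens.
fn parse(input: &str) -> Result<Expr, String> {
    let mut parser = Parser { tokens: tokenize(input), pos: 0 };
    let expr = parser.parse_expr(0)?;
    match parser.tokens.get(parser.pos) {
        Some(tok) => Err(format!("unexpected token {:?}", tok)),
        None => Ok(expr),
    }
}

/// Render the tree with explicit parentheses so the grouping is visible.
fn show(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => n.to_string(),
        Expr::Ident(name) => name.clone(),
        Expr::Binary { left, op, right } => format!("({} {} {})", show(left), op, show(right)),
    }
}

fn main() {
    let expr = parse("a > b AND b < 100").unwrap();
    println!("{}", show(&expr)); // prints ((a > b) AND (b < 100))
}
```

In this sketch, supporting a new binary operator only means adding an entry to `precedence`, which illustrates why the design lends itself to extension.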
Supporting custom SQL dialects
This is a work in progress, but I have started some notes on writing a custom SQL parser.
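As a rough illustration of how dialect-specific behaviour could be plugged into a lexer, the sketch below defines a small trait and two implementations. Everything here is an assumption made for the example: `ToyDialect`, `PlainDialect`, `VendorDialect`, `Word` and `lex_word` are hypothetical names, and the dialect trait this crate actually exposes may look different.

```rust
// Hypothetical trait for illustration only; the crate's real dialect trait may differ.
trait ToyDialect {
    /// Reserved words recognised by this dialect (upper case).
    fn keywords(&self) -> Vec<&'static str>;
    /// Whether `ch` may start an identifier.
    fn is_identifier_start(&self, ch: char) -> bool;
    /// Whether `ch` may continue an identifier.
    fn is_identifier_part(&self, ch: char) -> bool;
}

/// A baseline dialect with a few keywords and plain identifiers.
struct PlainDialect;

impl ToyDialect for PlainDialect {
    fn keywords(&self) -> Vec<&'static str> {
        vec!["SELECT", "FROM", "WHERE"]
    }
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }
}

/// An imaginary vendor dialect: adds the LIMIT keyword and `@`-prefixed identifiers.
struct VendorDialect;

impl ToyDialect for VendorDialect {
    fn keywords(&self) -> Vec<&'static str> {
        let mut keywords = PlainDialect.keywords();
        keywords.push("LIMIT");
        keywords
    }
    fn is_identifier_start(&self, ch: char) -> bool {
        ch == '@' || PlainDialect.is_identifier_start(ch)
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        PlainDialect.is_identifier_part(ch)
    }
}

#[derive(Debug, PartialEq)]
enum Word {
    Keyword(String),
    Identifier(String),
}

/// Read the word at the start of `input` and classify it according to `dialect`.
/// Returns None if the input does not start with a valid identifier character.
fn lex_word(dialect: &dyn ToyDialect, input: &str) -> Option<Word> {
    let mut chars = input.chars();
    let first = chars.next()?;
    if !dialect.is_identifier_start(first) {
        return None;
    }
    let mut word = String::new();
    word.push(first);
    for ch in chars {
        if !dialect.is_identifier_part(ch) {
            break;
        }
        word.push(ch);
    }
    let upper = word.to_uppercase();
    if dialect.keywords().iter().any(|k| *k == upper.as_str()) {
        Some(Word::Keyword(upper))
    } else {
        Some(Word::Identifier(word))
    }
}

fn main() {
    // The same input is classified differently depending on the dialect.
    println!("{:?}", lex_word(&PlainDialect, "limit 10"));
    println!("{:?}", lex_word(&VendorDialect, "limit 10"));
}
```

The lexer itself stays unchanged; only the dialect value passed in decides which words are keywords and which characters may appear in identifiers.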
Contributing
Contributors are welcome! Please see the current issues and feel free to file more!
Please run `cargo fmt` to ensure the code is properly formatted.