Extensible SQL Lexer and Parser for Rust
Andy Grove dc97d8ef5d Badges
2018-09-03 11:18:17 -06:00

ANSI SQL Lexer and Parser for Rust

The main goal of this project is to build a SQL lexer and parser capable of parsing ANSI SQL:2011 (or 2016 if I can get access to the specification for free).

A secondary goal is to make it easy for others to use this library as a foundation for building custom SQL parsers for vendor-specific dialects.

Example

The current code is capable of parsing some trivial SELECT and CREATE TABLE statements.

let sql = "SELECT a, b, 123, myfunc(b) \
    FROM table_1 \
    WHERE a > b AND b < 100 \
    ORDER BY a DESC, b";

let ast = Parser::parse_sql(sql.to_string()).unwrap();

println!("AST: {:?}", ast);

This outputs:

AST: SQLSelect { projection: [SQLIdentifier("a"), SQLIdentifier("b"), SQLLiteralLong(123), SQLFunction { id: "myfunc", args: [SQLIdentifier("b")] }], relation: Some(SQLIdentifier("table_1")), selection: Some(SQLBinaryExpr { left: SQLBinaryExpr { left: SQLIdentifier("a"), op: Gt, right: SQLIdentifier("b") }, op: And, right: SQLBinaryExpr { left: SQLIdentifier("b"), op: Lt, right: SQLLiteralLong(100) } }), order_by: Some([SQLOrderBy { expr: SQLIdentifier("a"), asc: false }, SQLOrderBy { expr: SQLIdentifier("b"), asc: true }]), group_by: None, having: None, limit: None }

Design

This parser is implemented using the Pratt Parser design, which is a top-down operator-precedence parser.
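
For readers unfamiliar with the technique, here is a minimal sketch of a Pratt parser for simple infix expressions. It is not this crate's code: `Token`, `Expr`, `tokenize`, `precedence`, `ExprParser` and `parse` are all hypothetical names chosen for illustration, and the whitespace-splitting tokenizer is only a stand-in for a real lexer.

```rust
// Illustrative Pratt (top-down operator-precedence) parser.
// Every name below is an assumption for this sketch, not this crate's API.

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(i64),
    Op(String),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Number(i64),
    Binary { left: Box<Expr>, op: String, right: Box<Expr> },
}

/// Splits whitespace-separated input into tokens (stand-in for a real lexer).
fn tokenize(input: &str) -> Vec<Token> {
    input
        .split_whitespace()
        .map(|word| {
            if let Ok(n) = word.parse::<i64>() {
                Token::Number(n)
            } else if matches!(word, ">" | "<" | "=" | "+" | "-" | "*" | "/" | "AND" | "OR") {
                Token::Op(word.to_string())
            } else {
                Token::Ident(word.to_string())
            }
        })
        .collect()
}

/// Binding power of each infix operator; higher binds tighter, 0 means "not an operator".
fn precedence(op: &str) -> u8 {
    match op {
        "OR" => 1,
        "AND" => 2,
        "=" | "<" | ">" => 3,
        "+" | "-" => 4,
        "*" | "/" => 5,
        _ => 0,
    }
}

struct ExprParser {
    tokens: Vec<Token>,
    pos: usize,
}

impl ExprParser {
    /// Precedence of the next token if it is an infix operator, else 0.
    fn next_precedence(&self) -> u8 {
        match self.tokens.get(self.pos) {
            Some(Token::Op(op)) => precedence(op),
            _ => 0,
        }
    }

    /// Parses a prefix item: in this sketch, only identifiers and integer literals.
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        match tok {
            Some(Token::Ident(name)) => Ok(Expr::Ident(name)),
            Some(Token::Number(n)) => Ok(Expr::Number(n)),
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }

    /// Core Pratt loop: absorb infix operators that bind tighter than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        while self.next_precedence() > min_prec {
            let op = match self.tokens.get(self.pos).cloned() {
                Some(Token::Op(op)) => op,
                _ => unreachable!("next_precedence() > 0 implies an operator token"),
            };
            self.pos += 1;
            // Recursing with the operator's own precedence makes it left-associative.
            let right = self.parse_expr(precedence(&op))?;
            left = Expr::Binary { left: Box::new(left), op, right: Box::new(right) };
        }
        Ok(left)
    }
}

/// Parses a whole expression and rejects leftover tokens.
fn parse(input: &str) -> Result<Expr, String> {
    let mut parser = ExprParser { tokens: tokenize(input), pos: 0 };
    let expr = parser.parse_expr(0)?;
    match parser.tokens.get(parser.pos) {
        Some(tok) => Err(format!("unexpected trailing token: {:?}", tok)),
        None => Ok(expr),
    }
}
```

Calling `parse("a > b AND b < 100")` in this sketch yields an `AND` node whose children are the two comparisons, which is the same grouping as the `selection` field in the output shown earlier. The whole grammar lives in `precedence` and `parse_prefix`, so adding an operator is a one-line change.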

I am a fan of this design pattern over parser generators for the following reasons:

  • Code is simple to write and can be concise and elegant (unfortunately, this is far from true of the current implementation, but I hope to fix that using some macros)
  • Performance is generally better than that of code produced by parser generators
  • Debugging is much easier with hand-written code
  • It is far easier to extend the parser and add dialect-specific syntax than it is with a parser generator
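
To illustrate the extensibility point, the sketch below shows one way a hand-written Pratt parser could take its operator table from a dialect. This is an assumption about how such an extension might look, not this crate's design: the `Dialect` trait, `AnsiDialect`, `VendorDialect`, `parse_expr` and `parse` are all hypothetical. Tokens are split on whitespace, and the result is a fully parenthesized string so that the grouping is easy to see.

```rust
// Hypothetical sketch: none of these names come from this crate.

/// A dialect decides which infix operators exist and how tightly they bind.
trait Dialect {
    fn infix_precedence(&self, op: &str) -> Option<u8>;
}

struct AnsiDialect;

impl Dialect for AnsiDialect {
    fn infix_precedence(&self, op: &str) -> Option<u8> {
        match op {
            "OR" => Some(1),
            "AND" => Some(2),
            "=" | "<" | ">" => Some(3),
            "+" | "-" => Some(4),
            "*" | "/" => Some(5),
            _ => None,
        }
    }
}

/// A vendor dialect that adds one operator and defers to ANSI for the rest.
struct VendorDialect;

impl Dialect for VendorDialect {
    fn infix_precedence(&self, op: &str) -> Option<u8> {
        match op {
            // PostgreSQL-style cast operator; the precedence value is an assumption.
            "::" => Some(6),
            _ => AnsiDialect.infix_precedence(op),
        }
    }
}

/// Pratt loop over pre-split tokens, parameterized by the dialect.
fn parse_expr(
    dialect: &dyn Dialect,
    tokens: &[&str],
    pos: &mut usize,
    min_prec: u8,
) -> Result<String, String> {
    let first = *tokens
        .get(*pos)
        .ok_or_else(|| "unexpected end of input".to_string())?;
    if dialect.infix_precedence(first).is_some() {
        return Err(format!("expected an operand, found operator: {}", first));
    }
    *pos += 1;
    let mut left = first.to_string();

    while let Some(&op) = tokens.get(*pos) {
        let prec = match dialect.infix_precedence(op) {
            Some(p) if p > min_prec => p,
            Some(_) => break, // binds too loosely; let the caller handle it
            None => return Err(format!("unsupported operator: {}", op)),
        };
        *pos += 1;
        let right = parse_expr(dialect, tokens, pos, prec)?;
        left = format!("({} {} {})", left, op, right);
    }
    Ok(left)
}

fn parse(dialect: &dyn Dialect, sql: &str) -> Result<String, String> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    let mut pos = 0;
    parse_expr(dialect, &tokens, &mut pos, 0)
}
```

With these definitions, `parse(&VendorDialect, "a :: int > b")` groups the cast first, while `parse(&AnsiDialect, "a :: int > b")` returns an error because `::` is not in the ANSI table. The parsing loop itself is unchanged between dialects; only the table differs, which is the kind of extension that tends to require regenerating the grammar when a parser generator is used.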

Contributing

Contributors are welcome! Please see the current issues and feel free to file more!