mirror of https://github.com/apache/datafusion-sqlparser-rs.git synced 2025-11-14 12:35:41 +00:00

Extensible SQL Lexer and Parser for Rust


Nikhil Benesch c49352f394 Implement Hash on all AST nodes It is convenient for downstream libraries to be able to stash bits of ASTs into hash maps, e.g., for performing simple common subexpression elimination. The only downside to this change is that it requires that the f64 in the Value enum be wrapped in an OrderedFloat, which provides the necessary equality semantics to allow Hash to be derived. The reason f64 doesn't implement Hash by default is because NaN is typically not equal to itself, so it's not clear what it should hash to. That's less of a concern in a SQL context, because every SQL database I've looked at treats NaN as equal to itself, in violation of the IEEE standard, in order to permit indexing and sorting of float columns.		2019-06-03 11:11:24 -04:00
docs	Update docs on writing custom parsers	2018-09-08 07:29:34 -06:00
examples	[example-cli] Support parsing with different dialects	2019-06-02 13:48:14 +03:00
src	Implement Hash on all AST nodes	2019-06-03 11:11:24 -04:00
tests	Merge pull request #89 from benesch/sqlfunction-struct	2019-06-03 11:09:27 -04:00
.gitignore	roughing out classic pratt parser	2018-02-08 07:49:24 -07:00
.travis.yml	Run CI on stable Rust	2019-05-31 18:06:51 -04:00
Cargo.toml	Implement Hash on all AST nodes	2019-06-03 11:11:24 -04:00
LICENSE.TXT	replace with code from datafusion	2018-09-03 09:56:39 -06:00
README.md	Mention that we use rustfmt for code formatting	2019-05-06 22:20:29 +03:00
rustfmt.toml	Mention that we use rustfmt for code formatting	2019-05-06 22:20:29 +03:00
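The latest commit listed above derives Hash for every AST node by wrapping the f64 in the Value enum in an OrderedFloat. The sketch below shows why such a wrapper makes derivation possible. It assumes a hypothetical OrdF64 type standing in for the ordered_float crate's OrderedFloat (it does not claim to reproduce that type exactly) and a simplified, hypothetical Value enum rather than this crate's own. The key invariant is that values which compare equal must hash equally, so every NaN and both signed zeros are normalized before hashing.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for OrderedFloat<f64>: equality treats NaN as equal
/// to itself, so Eq and Hash can be implemented consistently.
#[derive(Debug, Clone, Copy)]
struct OrdF64(f64);

impl PartialEq for OrdF64 {
    fn eq(&self, other: &Self) -> bool {
        // NaN == NaN here (unlike IEEE 754); otherwise ordinary float
        // equality, which also makes 0.0 == -0.0.
        (self.0.is_nan() && other.0.is_nan()) || self.0 == other.0
    }
}

impl Eq for OrdF64 {}

impl Hash for OrdF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Equal values must hash equally: map every NaN to one canonical
        // bit pattern and -0.0 to 0.0 before hashing the raw bits.
        let bits = if self.0.is_nan() {
            f64::NAN.to_bits()
        } else if self.0 == 0.0 {
            0.0f64.to_bits()
        } else {
            self.0.to_bits()
        };
        bits.hash(state);
    }
}

/// Hypothetical, simplified literal type; the wrapper lets Hash be derived.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Value {
    Long(u64),
    Double(OrdF64),
    SingleQuotedString(String),
}

/// Count occurrences of each literal, e.g. as a first step toward
/// spotting repeated subexpressions.
fn count_literals(values: &[Value]) -> HashMap<Value, usize> {
    let mut counts = HashMap::new();
    for v in values {
        *counts.entry(v.clone()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let values = vec![
        Value::Double(OrdF64(f64::NAN)),
        Value::Double(OrdF64(f64::NAN)),
        Value::Double(OrdF64(0.0)),
        Value::Double(OrdF64(-0.0)),
        Value::Long(123),
        Value::SingleQuotedString("abc".to_string()),
    ];
    // Both NaNs collapse into one key, as do 0.0 and -0.0.
    // prints: 4 distinct literals
    println!("{} distinct literals", count_literals(&values).len());
}
```

Normalizing inside Hash, rather than storing a canonical value, keeps the original float bits intact for printing while still satisfying the rule that a == b implies hash(a) == hash(b).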

README.md

Extensible SQL Lexer and Parser for Rust

The goal of this project is to build a SQL lexer and parser capable of parsing SQL that conforms with the ANSI SQL:2011 standard, while also making it easy to support custom dialects so that this crate can be used as a foundation for vendor-specific parsers.

This parser is currently being used by the DataFusion query engine and LocustDB.

Example

The current code is capable of parsing some trivial SELECT and CREATE TABLE statements.

use sqlparser::dialect::GenericSqlDialect;
use sqlparser::sqlparser::Parser;

let sql = "SELECT a, b, 123, myfunc(b) \
           FROM table_1 \
           WHERE a > b AND b < 100 \
           ORDER BY a DESC, b";

let dialect = GenericSqlDialect {}; // or AnsiSqlDialect, or your own dialect ...

let ast = Parser::parse_sql(&dialect, sql.to_string()).unwrap();

println!("AST: {:?}", ast);

This outputs:

AST: [SQLSelect(SQLQuery { ctes: [], body: Select(SQLSelect { distinct: false, projection: [UnnamedExpression(SQLIdentifier("a")), UnnamedExpression(SQLIdentifier("b")), UnnamedExpression(SQLValue(Long(123))), UnnamedExpression(SQLFunction { name: SQLObjectName(["myfunc"]), args: [SQLIdentifier("b")], over: None })], relation: Some(Table { name: SQLObjectName(["table_1"]), alias: None }), joins: [], selection: Some(SQLBinaryExpr { left: SQLBinaryExpr { left: SQLIdentifier("a"), op: Gt, right: SQLIdentifier("b") }, op: And, right: SQLBinaryExpr { left: SQLIdentifier("b"), op: Lt, right: SQLValue(Long(100)) } }), group_by: None, having: None }), order_by: Some([SQLOrderByExpr { expr: SQLIdentifier("a"), asc: Some(false) }, SQLOrderByExpr { expr: SQLIdentifier("b"), asc: None }]), limit: None })]

Design

This parser is implemented using the Pratt Parser design, which is a top-down operator-precedence parser.

I am a fan of this design pattern over parser generators for the following reasons:

Code is simple to write and can be concise and elegant (unfortunately, this is far from true of the current implementation, but I hope to fix that using some macros)
Performance is generally better than that of code generated by parser generators
Debugging is much easier with hand-written code
It is far easier to extend the parser with dialect-specific features than it is with a parser generator
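To make the Pratt approach described above concrete, here is a minimal, self-contained sketch of operator-precedence parsing over a tiny SQL-like expression language. It is not this crate's parser: the Expr type, the precedence values, and the requirement that tokens be separated by whitespace (so parentheses must be written as "( a + 1 )") are all assumptions made to keep the example short.

```rust
use std::fmt;

/// Hypothetical AST for a tiny SQL-like expression subset.
enum Expr {
    Atom(String), // identifier or number literal
    Binary(Box<Expr>, String, Box<Expr>),
}

impl fmt::Display for Expr {
    // Fully parenthesize so the grouping chosen by the parser is visible.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Expr::Atom(s) => write!(f, "{}", s),
            Expr::Binary(l, op, r) => write!(f, "({} {} {})", l, op, r),
        }
    }
}

/// Binding power of each infix operator (higher binds tighter). The values are
/// assumptions, loosely following SQL: OR < AND < comparisons < + - < * /.
/// Anything else, including ")" and end of input, returns 0 and ends an expression.
fn precedence(token: Option<&String>) -> u8 {
    match token.map(|t| t.to_uppercase()).as_deref() {
        Some("OR") => 5,
        Some("AND") => 10,
        Some("=") | Some("<") | Some(">") => 20,
        Some("+") | Some("-") => 30,
        Some("*") | Some("/") => 40,
        _ => 0,
    }
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    fn next_token(&mut self) -> Option<String> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }

    /// The Pratt loop: parse one operand, then keep folding in infix
    /// operators that bind more tightly than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        loop {
            let prec = precedence(self.tokens.get(self.pos));
            if prec <= min_prec {
                return Ok(left);
            }
            let op = self.next_token().unwrap().to_uppercase();
            // Recursing with `prec` makes equal-precedence operators left-associative.
            let right = self.parse_expr(prec)?;
            left = Expr::Binary(Box::new(left), op, Box::new(right));
        }
    }

    /// Operands: a parenthesized sub-expression, or any non-operator token.
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        match self.next_token() {
            Some(t) if t == "(" => {
                let inner = self.parse_expr(0)?;
                match self.next_token() {
                    Some(t) if t == ")" => Ok(inner),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            Some(t) if t != ")" && precedence(Some(&t)) == 0 => Ok(Expr::Atom(t)),
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

/// Parse a whole input, rejecting tokens left over after the expression.
fn parse(input: &str) -> Result<Expr, String> {
    let tokens = input.split_whitespace().map(String::from).collect();
    let mut parser = Parser { tokens, pos: 0 };
    let expr = parser.parse_expr(0)?;
    match parser.next_token() {
        None => Ok(expr),
        Some(t) => Err(format!("unexpected trailing token {:?}", t)),
    }
}

fn main() {
    let expr = parse("a > b AND b < 100 OR c = 1 + 2 * 3").unwrap();
    // prints: (((a > b) AND (b < 100)) OR (c = (1 + (2 * 3))))
    println!("{}", expr);
}
```

Each recursive call to parse_expr accepts only operators that bind more tightly than its caller's, which is how AND ends up grouping the two comparisons in the README's WHERE clause. Supporting a new operator then amounts to adding one line to the precedence table.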

Supporting custom SQL dialects

This is a work in progress, but I have started some notes on writing a custom SQL parser.
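One way to picture dialect support is a tokenizer that asks a dialect object which characters may start or continue an identifier, so a new dialect changes lexing without touching the shared code. The Dialect trait, PlainDialect, TSqlLikeDialect, and identifiers function below are hypothetical names for this sketch, not taken from this crate's source.

```rust
/// Hypothetical dialect hook: per-dialect rules for identifier characters.
trait Dialect {
    fn is_identifier_start(&self, ch: char) -> bool;
    fn is_identifier_part(&self, ch: char) -> bool;
}

/// Hypothetical baseline dialect: ASCII letters, digits, and underscores.
struct PlainDialect;

impl Dialect for PlainDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }
}

/// Hypothetical dialect that also accepts '@' and '#' in identifiers,
/// in the spirit of SQL Server's @variable and #temp names.
struct TSqlLikeDialect;

impl Dialect for TSqlLikeDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        PlainDialect.is_identifier_start(ch) || ch == '@' || ch == '#'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        PlainDialect.is_identifier_part(ch) || ch == '@' || ch == '#'
    }
}

/// Shared lexing code: collect identifier tokens from `sql`, letting the
/// dialect decide what counts as one. Other characters are skipped.
fn identifiers(dialect: &dyn Dialect, sql: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(ch) = chars.next() {
        if dialect.is_identifier_start(ch) {
            let mut ident = ch.to_string();
            while let Some(&next) = chars.peek() {
                if !dialect.is_identifier_part(next) {
                    break;
                }
                ident.push(next);
                chars.next();
            }
            out.push(ident);
        }
    }
    out
}

fn main() {
    let sql = "SELECT @total FROM #staging WHERE id > 10";
    // prints: ["SELECT", "total", "FROM", "staging", "WHERE", "id"]
    println!("{:?}", identifiers(&PlainDialect, sql));
    // prints: ["SELECT", "@total", "FROM", "#staging", "WHERE", "id"]
    println!("{:?}", identifiers(&TSqlLikeDialect, sql));
}
```

Passing the dialect as a trait object keeps a single lexing routine; each dialect only overrides the character rules it disagrees with.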

Contributing

Contributors are welcome! Please see the current issues and feel free to file more!

Please run cargo fmt to ensure the code is properly formatted.