mirror of
https://github.com/apache/datafusion-sqlparser-rs.git
synced 2025-07-07 17:04:59 +00:00
284 lines
12 KiB
Markdown
284 lines
12 KiB
Markdown
<!---
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
or more contributor license agreements. See the NOTICE file
|
|
distributed with this work for additional information
|
|
regarding copyright ownership. The ASF licenses this file
|
|
to you under the Apache License, Version 2.0 (the
|
|
"License"); you may not use this file except in compliance
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
software distributed under the License is distributed on an
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
KIND, either express or implied. See the License for the
|
|
specific language governing permissions and limitations
|
|
under the License.
|
|
-->
|
|
|
|
# Extensible SQL Lexer and Parser for Rust
|
|
|
|
[](https://opensource.org/licenses/Apache-2.0)
|
|
[](https://crates.io/crates/sqlparser)
|
|
[](https://github.com/sqlparser-rs/sqlparser-rs/actions?query=workflow%3ARust+branch%3Amain)
|
|
[](https://coveralls.io/github/sqlparser-rs/sqlparser-rs?branch=main)
|
|
[](https://gitter.im/sqlparser-rs/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
|
|
|
|
This crate contains a lexer and parser for SQL that conforms with the
|
|
[ANSI/ISO SQL standard][sql-standard] and other dialects. This crate
|
|
is used as a foundation for SQL query engines, vendor-specific
|
|
parsers, and various SQL analysis.
|
|
|
|
## Example
|
|
|
|
To parse a simple `SELECT` statement:
|
|
|
|
```rust
|
|
use sqlparser::dialect::GenericDialect;
|
|
use sqlparser::parser::Parser;
|
|
|
|
let sql = "SELECT a, b, 123, myfunc(b) \
|
|
FROM table_1 \
|
|
WHERE a > b AND b < 100 \
|
|
ORDER BY a DESC, b";
|
|
|
|
let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...
|
|
|
|
let ast = Parser::parse_sql(&dialect, sql).unwrap();
|
|
|
|
println!("AST: {:?}", ast);
|
|
```
|
|
|
|
This outputs
|
|
|
|
```rust
|
|
AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name:ObjectName([Identifier(Ident { value: "myfunc", quote_style: None })]), args: [Identifier("b")], filter: None, over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName([Identifier(Ident { value: "table_1", quote_style: None })]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })]
|
|
```
|
|
|
|
|
|
## Features
|
|
|
|
The following optional [crate features](https://doc.rust-lang.org/cargo/reference/features.html) are available:
|
|
|
|
* `serde`: Adds [Serde](https://serde.rs/) support by implementing `Serialize` and `Deserialize` for all AST nodes.
|
|
* `visitor`: Adds a `Visitor` capable of recursively walking the AST tree.
|
|
* `recursive-protection` (enabled by default), uses [recursive](https://docs.rs/recursive/latest/recursive/) for stack overflow protection.
|
|
|
|
## Syntax vs Semantics
|
|
|
|
This crate provides only a syntax parser, and tries to avoid applying
|
|
any SQL semantics, and accepts queries that specific databases would
|
|
reject, even when using that Database's specific `Dialect`. For
|
|
example, `CREATE TABLE(x int, x int)` is accepted by this crate, even
|
|
though most SQL engines will reject this statement due to the repeated
|
|
column name `x`.
|
|
|
|
This crate avoids semantic analysis because it varies drastically
|
|
between dialects and implementations. If you want to do semantic
|
|
analysis, feel free to use this project as a base.
|
|
|
|
## Preserves Syntax Round Trip
|
|
|
|
This crate allows users to recover the original SQL text (with comments removed,
|
|
normalized whitespace and keyword capitalization), which is useful for tools
|
|
that analyze and manipulate SQL.
|
|
|
|
This means that other than comments, whitespace and the capitalization of
|
|
keywords, the following should hold true for all SQL:
|
|
|
|
```rust
|
|
// Parse SQL
|
|
let sql = "SELECT 'hello'";
|
|
let ast = Parser::parse_sql(&GenericDialect, sql).unwrap();
|
|
|
|
// The original SQL text can be generated from the AST
|
|
assert_eq!(ast[0].to_string(), sql);
|
|
|
|
// The SQL can also be pretty-printed with newlines and indentation
|
|
assert_eq!(format!("{:#}", ast[0]), "SELECT\n 'hello'");
|
|
```
|
|
|
|
There are still some cases in this crate where different SQL with seemingly
|
|
similar semantics are represented with the same AST. We welcome PRs to fix such
|
|
issues and distinguish different syntaxes in the AST.
|
|
|
|
|
|
## Source Locations (Work in Progress)
|
|
|
|
This crate allows recovering source locations from AST nodes via the [Spanned]
|
|
trait, which can be used for advanced diagnostics tooling. Note that this
|
|
feature is a work in progress and many nodes report missing or inaccurate spans.
|
|
Please see [this ticket] for information on how to contribute missing
|
|
improvements.
|
|
|
|
[Spanned]: https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html
|
|
[this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
|
|
|
|
```rust
|
|
// Parse SQL
|
|
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();
|
|
|
|
// The source span can be retrieved with start and end locations
|
|
assert_eq!(ast[0].span(), Span {
|
|
start: Location::of(1, 1),
|
|
end: Location::of(1, 16),
|
|
});
|
|
```
|
|
|
|
## SQL compliance
|
|
|
|
SQL was first standardized in 1987, and revisions of the standard have been
|
|
published regularly since. Most revisions have added significant new features to
|
|
the language, and as a result no database claims to support the full breadth of
|
|
features. This parser currently supports most of the SQL-92 syntax, plus some
|
|
syntax from newer versions that have been explicitly requested, plus various
|
|
other dialect-specific syntax. Whenever possible, the [online SQL:2016
|
|
grammar][sql-2016-grammar] is used to guide what syntax to accept.
|
|
|
|
Unfortunately, stating anything more specific about compliance is difficult.
|
|
There is no publicly available test suite that can assess compliance
|
|
automatically, and doing so manually would strain the project's limited
|
|
resources. Still, we are interested in eventually supporting the full SQL
|
|
dialect, and we are slowly building out our own test suite.
|
|
|
|
If you are assessing whether this project will be suitable for your needs,
|
|
you'll likely need to experimentally verify whether it supports the subset of
|
|
SQL that you need. Please file issues about any unsupported queries that you
|
|
discover. Doing so helps us prioritize support for the portions of the standard
|
|
that are actually used. Note that if you urgently need support for a feature,
|
|
you will likely need to write the implementation yourself. See the
|
|
[Contributing](#Contributing) section for details.
|
|
|
|
## Command line
|
|
|
|
This crate contains a CLI program that can parse a file and dump the results as JSON:
|
|
```
|
|
$ cargo run --features json_example --example cli FILENAME.sql [--dialectname]
|
|
```
|
|
|
|
## Users
|
|
|
|
This parser is currently being used by the [DataFusion] query engine, [LocustDB],
|
|
[Ballista], [GlueSQL], [Opteryx], [Polars], [PRQL], [Qrlew], [JumpWire], [ParadeDB], [CipherStash Proxy],
|
|
and [GreptimeDB].
|
|
|
|
If your project is using sqlparser-rs feel free to make a PR to add it
|
|
to this list.
|
|
|
|
## Design
|
|
|
|
The core expression parser uses the [Pratt Parser] design, which is a top-down
|
|
operator-precedence (TDOP) parser, while the surrounding SQL statement parser is
|
|
a traditional, hand-written recursive descent parser. Eli Bendersky has a good
|
|
[tutorial on TDOP parsers][tdop-tutorial], if you are interested in learning
|
|
more about the technique.
|
|
|
|
We are a fan of this design pattern over parser generators for the following
|
|
reasons:
|
|
|
|
- Code is simple to write and can be concise and elegant
|
|
- Performance is generally better than code generated by parser generators
|
|
- Debugging is much easier with hand-written code
|
|
- It is far easier to extend and make dialect-specific extensions
|
|
compared to using a parser generator
|
|
|
|
### Supporting custom SQL dialects
|
|
|
|
This is a work in progress, but we have some notes on [writing a custom SQL
|
|
parser](docs/custom_sql_parser.md).
|
|
|
|
## Contributing
|
|
|
|
Contributions are highly encouraged! However, the bandwidth we have to
|
|
maintain this crate is limited. Please read the following sections carefully.
|
|
|
|
### New Syntax
|
|
|
|
The most commonly accepted PRs add support for or fix a bug in a feature in the
|
|
SQL standard, or a popular RDBMS, such as Microsoft SQL
|
|
Server or PostgreSQL, will likely be accepted after a brief
|
|
review. Any SQL feature that is dialect specific should be parsed by *both* the relevant [`Dialect`]
|
|
as well as [`GenericDialect`].
|
|
|
|
### Major API Changes
|
|
|
|
The current maintainers do not plan for any substantial changes to
|
|
this crate's API. PRs proposing major refactors
|
|
are not likely to be accepted.
|
|
|
|
### Testing
|
|
|
|
While we hope to review PRs in a reasonably
|
|
timely fashion, it may take a week or more. In order to speed the process,
|
|
please make sure the PR passes all CI checks, and includes tests
|
|
demonstrating your code works as intended (and to avoid
|
|
regressions). Remember to also test error paths.
|
|
|
|
PRs without tests will not be reviewed or merged. Since the CI
|
|
ensures that `cargo test`, `cargo fmt`, and `cargo clippy`, pass you
|
|
should likely to run all three commands locally before submitting
|
|
your PR.
|
|
|
|
### Filing Issues
|
|
|
|
If you are unable to submit a patch, feel free to file an issue instead. Please
|
|
try to include:
|
|
|
|
* some representative examples of the syntax you wish to support or fix;
|
|
* the relevant bits of the [SQL grammar][sql-2016-grammar], if the syntax is
|
|
part of SQL:2016; and
|
|
* links to documentation for the feature for a few of the most popular
|
|
databases that support it.
|
|
|
|
Unfortunately, if you need support for a feature, you will likely need to implement
|
|
it yourself, or file a well enough described ticket that another member of the community can do so.
|
|
Our goal as maintainers is to facilitate the integration
|
|
of various features from various contributors, but not to provide the
|
|
implementations ourselves, as we simply don't have the resources.
|
|
|
|
### Benchmarking
|
|
|
|
There are several micro benchmarks in the `sqlparser_bench` directory.
|
|
You can run them with:
|
|
|
|
```
|
|
git checkout main
|
|
cd sqlparser_bench
|
|
cargo bench -- --save-baseline main
|
|
git checkout <your branch>
|
|
cargo bench -- --baseline main
|
|
```
|
|
|
|
By adding the `--save-baseline main` and `--baseline main` you can track the
|
|
progress of your improvements as you continue working on the feature branch.
|
|
|
|
## Licensing
|
|
|
|
All code in this repository is licensed under the [Apache Software License 2.0](LICENSE.txt).
|
|
|
|
Unless you explicitly state otherwise, any contribution intentionally submitted
|
|
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
|
|
licensed as above, without any additional terms or conditions.
|
|
|
|
|
|
[tdop-tutorial]: https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing
|
|
[`cargo fmt`]: https://github.com/rust-lang/rustfmt#on-the-stable-toolchain
|
|
[current issues]: https://github.com/sqlparser-rs/sqlparser-rs/issues
|
|
[DataFusion]: https://github.com/apache/arrow-datafusion
|
|
[LocustDB]: https://github.com/cswinter/LocustDB
|
|
[Ballista]: https://github.com/apache/arrow-ballista
|
|
[GlueSQL]: https://github.com/gluesql/gluesql
|
|
[Opteryx]: https://github.com/mabel-dev/opteryx
|
|
[Polars]: https://pola.rs/
|
|
[PRQL]: https://github.com/PRQL/prql
|
|
[Qrlew]: https://github.com/Qrlew/qrlew
|
|
[JumpWire]: https://github.com/extragoodlabs/jumpwire
|
|
[ParadeDB]: https://github.com/paradedb/paradedb
|
|
[Pratt Parser]: https://tdop.github.io/
|
|
[sql-2016-grammar]: https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html
|
|
[sql-standard]: https://en.wikipedia.org/wiki/ISO/IEC_9075
|
|
[`Dialect`]: https://docs.rs/sqlparser/latest/sqlparser/dialect/trait.Dialect.html
|
|
[`GenericDialect`]: https://docs.rs/sqlparser/latest/sqlparser/dialect/struct.GenericDialect.html
|
|
[CipherStash Proxy]: https://github.com/cipherstash/proxy
|
|
[GreptimeDB]: https://github.com/GreptimeTeam/greptimedb
|