Separate statement from expr parsing (4/5)

mirror of https://github.com/apache/datafusion-sqlparser-rs.git synced 2025-10-17 09:17:14 +00:00

Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427

This stops the parser from accepting (and the AST from being able to
represent) SQL look-alike code that makes no sense, e.g.

    SELECT ... FROM (CREATE TABLE ...) foo
    SELECT ... FROM (1+CAST(...)) foo

More generally, this makes the AST less "partially typed". Today certain
parts are strongly typed (e.g. SELECT can only contain projections,
relations, etc.), while everything that didn't get its own type is
dumped into ASTNode, effectively untyped. After a few more fixes (yet
to be implemented), `ASTNode` could become an `SQLExpression`. The
Pratt-style expression parser (returning an SQLExpression) would be
invoked from the top-down parser in places where a generic expression
is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things
like select's `projection` and `relation` could be more appropriately
(narrowly) typed.
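To make the intended split concrete, here is a minimal sketch of a top-down `parse_select` that calls a Pratt-style `parse_expr` only where a generic expression is expected. Everything in it is an assumption made for illustration: the `Expr`, `Select` and `Parser` types, the whitespace-only tokenizer, and the tiny operator table are hypothetical and are not the crate's real definitions.

```rust
// Hypothetical, simplified types; not the crate's real AST or parser.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Number(i64),
    Binary { left: Box<Expr>, op: String, right: Box<Expr> },
}

#[derive(Debug, PartialEq)]
struct Select {
    projection: Vec<Expr>,    // a list of expressions
    relation: Option<String>, // narrowly typed: only a table name fits here
    selection: Option<Expr>,  // generic expression after WHERE
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    // Assumption: tokens are separated by whitespace (so write "a , b").
    fn new(sql: &str) -> Self {
        Parser { tokens: sql.split_whitespace().map(String::from).collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(|s| s.as_str())
    }

    fn next(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    // Binding power of infix operators; 0 means "not an infix operator".
    fn precedence(tok: &str) -> u8 {
        match tok {
            "=" => 1,
            "+" | "-" => 2,
            "*" | "/" => 3,
            _ => 0,
        }
    }

    // Pratt-style loop: parse a prefix, then fold in infix operators that
    // bind more tightly than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let tok = self.next().ok_or("unexpected end of input")?;
        let mut left = match tok.parse::<i64>() {
            Ok(n) => Expr::Number(n),
            Err(_) => Expr::Ident(tok),
        };
        loop {
            let prec = match self.peek() {
                Some(op) => Self::precedence(op),
                None => 0,
            };
            if prec == 0 || prec <= min_prec {
                break;
            }
            let op = self.next().unwrap();
            let right = self.parse_expr(prec)?;
            left = Expr::Binary { left: Box::new(left), op, right: Box::new(right) };
        }
        Ok(left)
    }

    // Top-down part: knows the shape of SELECT and delegates to
    // `parse_expr` only where a generic expression is expected.
    fn parse_select(&mut self) -> Result<Select, String> {
        match self.next().as_deref() {
            Some("SELECT") => {}
            other => return Err(format!("expected SELECT, found {:?}", other)),
        }
        let mut projection = vec![self.parse_expr(0)?];
        while self.peek() == Some(",") {
            self.next();
            projection.push(self.parse_expr(0)?);
        }
        let relation = if self.peek() == Some("FROM") {
            self.next();
            self.next()
        } else {
            None
        };
        let selection = if self.peek() == Some("WHERE") {
            self.next();
            Some(self.parse_expr(0)?)
        } else {
            None
        };
        Ok(Select { projection, relation, selection })
    }
}
```

Because `relation` is a plain table name in this sketch, something like `FROM (CREATE TABLE ...)` has nowhere to go in the type, which is the property the commit is working towards.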


Since the diff is quite large due to the necessarily large number of
mechanical changes, here's an overview:

1) Interface changes:

   - A new AST enum - `SQLStatement` - is split out of ASTNode:

     - The variants of the ASTNode enum that _only_ make sense as a
       top-level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY)
       are _moved_ to the new enum, with no other changes.
     - SQLSelect is _duplicated_: now available both as a variant in
       SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery).

   - The main entry point (Parser::parse_sql) now expects an SQL statement
     as input, and returns an `SQLStatement`.
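The shape of the split can be sketched as follows. This is a heavily trimmed illustration, not the crate's actual definitions: the variant payloads, the field names, and the `statement_kind` helper are assumptions introduced here.

```rust
// Hypothetical, heavily trimmed stand-ins for the crate's AST types.

/// Expression-level nodes: anything that may appear inside a statement.
#[derive(Debug, PartialEq)]
enum ASTNode {
    SQLIdentifier(String),
    SQLValue(i64),
    /// SELECT used as a subquery.
    SQLSelect(SQLSelect),
}

#[derive(Debug, PartialEq)]
struct SQLSelect {
    projection: Vec<ASTNode>,
    // Only an ASTNode fits here, so a CREATE TABLE in FROM is a type error.
    relation: Option<Box<ASTNode>>,
}

/// Top-level statements: what the entry point returns.
#[derive(Debug, PartialEq)]
enum SQLStatement {
    /// Top-level SELECT, sharing its payload with the subquery variant.
    SQLSelect(SQLSelect),
    SQLDelete { table_name: String, selection: Option<ASTNode> },
    SQLCreateTable { name: String, columns: Vec<String> },
}

/// Illustrative helper (not part of the crate): names the statement kind.
fn statement_kind(stmt: &SQLStatement) -> &'static str {
    match stmt {
        SQLStatement::SQLSelect(_) => "SELECT",
        SQLStatement::SQLDelete { .. } => "DELETE",
        SQLStatement::SQLCreateTable { .. } => "CREATE TABLE",
    }
}
```

With types like these, `SELECT ... FROM (SELECT ...)` is still representable, because the subquery is an `ASTNode::SQLSelect`, while a statement-only variant cannot be placed in `relation` at all.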

2) Parser changes: instead of detecting the top-level constructs deep
down in the precedence parser (`parse_prefix`), we are able to do it
right after setting up the parser in the `parse_sql` entry point.
(SELECT, again, is kept in the expression parser to demonstrate how
subqueries could be implemented.)

The rest of the parser changes are mechanical ASTNode -> SQLStatement
replacements resulting from the AST change.
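As a rough illustration of dispatching at the entry point rather than inside the precedence parser, the sketch below looks at the leading keyword once and picks the statement kind there. The `Statement` enum, the free function `parse_sql`, and the choice to keep the rest of the input as an unparsed string are all simplifications assumed for this example; the real `Parser::parse_sql` works on tokens and builds a full AST.

```rust
// Hypothetical sketch: the payload is the unparsed remainder of the input.
#[derive(Debug, PartialEq)]
enum Statement {
    Select(String),
    Insert(String),
    Delete(String),
    Create(String),
}

/// Decides the statement kind from the first keyword, at the entry point.
fn parse_sql(sql: &str) -> Result<Statement, String> {
    let mut words = sql.split_whitespace();
    let first = words.next().ok_or_else(|| "empty input".to_string())?;
    let rest = words.collect::<Vec<_>>().join(" ");
    match first.to_ascii_uppercase().as_str() {
        "SELECT" => Ok(Statement::Select(rest)),
        "INSERT" => Ok(Statement::Insert(rest)),
        "DELETE" => Ok(Statement::Delete(rest)),
        "CREATE" => Ok(Statement::Create(rest)),
        // Anything else is not a statement, e.g. a bare expression.
        other => Err(format!("expected a statement keyword, found {}", other)),
    }
}
```

A bare expression such as `1 + 2` is rejected here, which mirrors the interface change: the entry point expects a statement, and expressions are parsed through a separate path.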

3) Testing changes: for every test, depending on whether the input was
a complete statement or an expression, I used an appropriate helper
function:

   - `verified` (parses SQL, checks that it round-trips, and returns
     the AST) - was replaced by `verified_stmt` or `verified_expr`.

   - `parse_sql` (which returned AST without checking it round-tripped)
     was replaced by:

     - `parse_sql_expr` (same function, for expressions)

     - `one_statement_parses_to` (formerly `parses_to`), extended to
       deal with statements that are not expected to round-trip.
       The weird name is to reduce further churn when implementing
       multi-statement parsing.

     - `verified_stmt` (in 4 testcases that actually round-tripped)
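The relationship between the two statement helpers can be sketched like this. The helper names come from the list above, but their signatures are not shown in this commit message, so the signatures, the toy `Stmt` type, and the toy `parse_stmt` parser below are assumptions made purely for illustration.

```rust
use std::fmt;

// Hypothetical toy AST: an upper-cased keyword plus the remaining words.
#[derive(Debug, PartialEq, Clone)]
struct Stmt {
    keyword: String,
    rest: Vec<String>,
}

impl fmt::Display for Stmt {
    // Serializes back to SQL with single spaces between words.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.keyword)?;
        for word in &self.rest {
            write!(f, " {}", word)?;
        }
        Ok(())
    }
}

// Toy parser standing in for the real one; panics on empty input.
fn parse_stmt(sql: &str) -> Stmt {
    let mut words = sql.split_whitespace();
    let keyword = words.next().expect("empty input").to_ascii_uppercase();
    Stmt { keyword, rest: words.map(String::from).collect() }
}

/// Parses `sql`, checks that it serializes to `canonical`, returns the AST.
/// Use this when the input is not expected to round-trip verbatim.
fn one_statement_parses_to(sql: &str, canonical: &str) -> Stmt {
    let stmt = parse_stmt(sql);
    assert_eq!(canonical, stmt.to_string());
    stmt
}

/// Round-trip check: the input must serialize back to exactly itself.
fn verified_stmt(sql: &str) -> Stmt {
    one_statement_parses_to(sql, sql)
}
```

In this sketch `verified_stmt` is just the special case where the canonical form equals the input, which is one plausible way to keep the two helpers consistent.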

Nickolay Ponomarev

2019-01-31 04:56:20 +03:00

parent 7b86f5c842

commit 2dec65fdb4

5 changed files with 245 additions and 193 deletions

									
tests/sqlparser_ansi.rs (4 changed lines)

    @@ -9,7 +9,7 @@ use sqlparser::sqltokenizer::*;
     #[test]
     fn parse_simple_select() {
         let sql = String::from("SELECT id, fname, lname FROM customer WHERE id = 1");
    -    let ast = parse_sql(&sql);
    +    let ast = parse_sql_expr(&sql);
         match ast {
             ASTNode::SQLSelect(SQLSelect { projection, .. }) => {
                 assert_eq!(3, projection.len());
    @@ -18,7 +18,7 @@ fn parse_simple_select() {
         }
     }

    -fn parse_sql(sql: &str) -> ASTNode {
    +fn parse_sql_expr(sql: &str) -> ASTNode {
         let dialect = AnsiSqlDialect {};
         let mut tokenizer = Tokenizer::new(&dialect, &sql);
         let tokens = tokenizer.tokenize().unwrap();
