Commit graph

8 commits

Author SHA1 Message Date
Nickolay Ponomarev
bf0c07bb1b Support basic CTEs (WITH)
Some unsupported features are noted as TODOs.
2019-02-11 05:13:48 +03:00
Nickolay Ponomarev
215820ef66 Stricter parsing for subqueries (3/4)
This makes the parser more strict when handling SELECTs nested
somewhere in the main statement:

1) instead of accepting SELECT anywhere in the expression where an
   operand was expected, we only accept it inside parens. (I've added a
   test for the currently supported syntax, <scalar subquery> in ANSI
   SQL terms)

2) instead of accepting any expression in the derived table context:
   `FROM ( ... )` - we only look for a SELECT subquery there.

Due to #1, I had to switch the 'ansi' test from invoking the expression
parser to the statement parser.
2019-02-07 05:31:36 +03:00
Nickolay Ponomarev
b57c60a78c Only use parse_expr() when we expect an expression (0/4)
Before this commit there was a single `parse_expr(u8)` method, which
was called both

1) from within the expression parser (to parse subexpressions consisting
   of operators with higher priority than the current one), and

2) from the top-down parser both

   a) to parse true expressions (such as an item of the SELECT list or
      the condition after WHERE or after ON), and 
   b) to parse sequences which are not exactly "expressions".


This starts cleaning this up by renaming the `parse_expr(u8)` method to
`parse_subexpr()` and using it only for (1) - i.e. usually providing a
non-zero precedence parameter.

The unintuitively named `parse()` method is renamed to `parse_expr()`
(a name the previous rename freed up) and is used for (2a).
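
A minimal sketch of the split described above, assuming a toy grammar of
integers, `+` and `*`. Only the names `parse_expr`, `parse_subexpr` and
`parse_prefix` come from the commits in this list; `Token`, `Expr`,
`Parser::new`, `next_precedence` and the precedence values are hypothetical
and are not the actual sqlparser-rs code.

```rust
// Hypothetical toy tokens and AST, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    Star,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, Token, Box<Expr>),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Entry point for callers that expect a complete expression (case 2a).
    fn parse_expr(&mut self) -> Result<Expr, String> {
        self.parse_subexpr(0)
    }

    /// Used inside the expression parser (case 1): consumes only operators
    /// that bind tighter than `precedence`.
    fn parse_subexpr(&mut self, precedence: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        loop {
            let next_precedence = self.next_precedence();
            if precedence >= next_precedence {
                break;
            }
            let op = self.tokens[self.pos].clone();
            self.pos += 1;
            let right = self.parse_subexpr(next_precedence)?;
            left = Expr::Binary(Box::new(left), op, Box::new(right));
        }
        Ok(left)
    }

    /// Parses an operand; in this toy grammar that is always a number.
    fn parse_prefix(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos) {
            Some(Token::Num(n)) => {
                let n = *n;
                self.pos += 1;
                Ok(Expr::Num(n))
            }
            other => Err(format!("expected a number, found {:?}", other)),
        }
    }

    /// Precedence of the upcoming operator, or 0 if there is none.
    fn next_precedence(&self) -> u8 {
        match self.tokens.get(self.pos) {
            Some(Token::Plus) => 10,
            Some(Token::Star) => 20,
            _ => 0,
        }
    }
}
```

In this sketch the top-down parser would only ever call `parse_expr()`, while
the non-zero precedence argument stays an internal detail of the expression
parser.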


While reviewing the existing callers of `parse_expr`, four points to
follow up on were identified (marked "TBD (#)" in the commit):

1) Do not lose parens (e.g. `(1+2)*3`) when roundtripping
   String->AST->String by using SQLNested.
2) Incorrect precedence of the unary NOT operator
3) `parse_table_factor` accepts any expression where a SELECT subquery
   is expected.
4) `parse_delete` uses `parse_expr()` to retrieve a table name

These are dealt with in the commits to follow.
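
To illustrate point (1), here is a small sketch of how a dedicated "nested"
AST variant can keep parentheses across a String->AST->String round trip.
The `Expr` type, its variants and `sample()` are hypothetical stand-ins; the
commit only names `SQLNested`, and the real variant may look different.

```rust
use std::fmt;

// Hypothetical toy AST, not the sqlparser-rs ASTNode.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
    /// Records that the source wrapped the inner expression in parentheses.
    Nested(Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Expr::Num(n) => write!(f, "{}", n),
            Expr::Binary(left, op, right) => write!(f, "{} {} {}", left, op, right),
            // Without this variant the parentheses would be dropped on output.
            Expr::Nested(inner) => write!(f, "({})", inner),
        }
    }
}

/// Builds the AST a parser could produce for `(1+2)*3`.
fn sample() -> Expr {
    let sum = Expr::Binary(Box::new(Expr::Num(1)), '+', Box::new(Expr::Num(2)));
    Expr::Binary(
        Box::new(Expr::Nested(Box::new(sum))),
        '*',
        Box::new(Expr::Num(3)),
    )
}
```

This toy printer adds spaces around operators, so it normalizes whitespace
but still preserves the grouping.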
2019-02-07 05:24:54 +03:00
Nickolay Ponomarev
2dec65fdb4 Separate statement from expr parsing (4/5)
Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427

This stops the parser from accepting (and the AST from being able to
represent) SQL look-alike code that makes no sense, e.g.

    SELECT ... FROM (CREATE TABLE ...) foo
    SELECT ... FROM (1+CAST(...)) foo

Generally this makes the AST less "partially typed": meaning certain
parts are strongly typed (e.g. SELECT can only contain projections,
relations, etc.), while everything that didn't get its own type is
dumped into ASTNode, effectively untyped. After a few more fixes (yet
to be implemented), `ASTNode` could become an `SQLExpression`. The
Pratt-style expression parser (returning an SQLExpression) would be
invoked from the top-down parser in places where a generic expression
is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things
like select's `projection` and `relation` could be more appropriately
(narrowly) typed.
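
The typing idea can be sketched with a few heavily simplified types. The
names `SQLStatement`, `SQLSelect`, `SQLExpression`, `projection` and
`relation` appear in this commit message; `TableFactor`, the individual
variants and fields, and `derived_table_alias` are assumptions made for
illustration and do not mirror the real definitions.

```rust
// Hypothetical, heavily simplified AST types.
#[derive(Debug, PartialEq)]
enum SQLExpression {
    Identifier(String),
    Value(i64),
    /// A SELECT is the only statement that may appear inside an expression.
    Subquery(Box<SQLSelect>),
}

#[derive(Debug, PartialEq)]
struct SQLSelect {
    projection: Vec<SQLExpression>,
    relation: Option<TableFactor>,
}

#[derive(Debug, PartialEq)]
enum TableFactor {
    Table(String),
    /// `FROM ( ... ) alias` can only hold a SELECT, so neither
    /// `FROM (CREATE TABLE ...) foo` nor `FROM (1+CAST(...)) foo`
    /// can be represented.
    Derived { subquery: Box<SQLSelect>, alias: String },
}

#[derive(Debug, PartialEq)]
enum SQLStatement {
    SQLSelect(SQLSelect),
    SQLCreateTable { name: String },
    SQLDelete { table_name: String },
}

/// Returns the alias of a derived table in the FROM clause, if there is one.
fn derived_table_alias(stmt: &SQLStatement) -> Option<&str> {
    match stmt {
        SQLStatement::SQLSelect(SQLSelect {
            relation: Some(TableFactor::Derived { alias, .. }),
            ..
        }) => Some(alias.as_str()),
        _ => None,
    }
}
```

With types shaped like this, the nonsensical inputs are rejected by the
compiler when building the AST by hand, and the parser has no value it could
return for them.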


Since the diff is quite large due to the necessarily large number of
mechanical changes, here's an overview:

1) Interface changes:

   - A new AST enum - `SQLStatement` - is split out of ASTNode:

     - The variants of the ASTNode enum, which _only_ make sense as a top
       level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are
       _moved_ to the new enum, with no other changes.
     - SQLSelect is _duplicated_: now available both as a variant in
       SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery).

   - The main entry point (Parser::parse_sql) now expects an SQL statement
     as input, and returns an `SQLStatement`.

2) Parser changes: instead of detecting the top-level constructs deep
down in the precedence parser (`parse_prefix`), we are able to do it
right after setting up the parser in the `parse_sql` entry point.

(SELECT, again, is kept in the expression parser to demonstrate how
subqueries could be implemented.)

The rest of parser changes are mechanical ASTNode -> SQLStatement
replacements resulting from the AST change.

3) Testing changes: for every test - depending on whether the input was
a complete statement or an expression - I used an appropriate helper
function:

   - `verified` (parses SQL, checks that it round-trips, and returns
     the AST) - was replaced by `verified_stmt` or `verified_expr`.

   - `parse_sql` (which returned AST without checking it round-tripped)
     was replaced by:

     - `parse_sql_expr` (same function, for expressions)

     - `one_statement_parses_to` (formerly `parses_to`), extended to
       deal with statements that are not expected to round-trip.
       The weird name is to reduce further churn when implementing
       multi-statement parsing.

     - `verified_stmt` (in 4 testcases that actually round-tripped)
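
A rough sketch of how the two statement helpers can relate to each other.
The helper names `verified_stmt` and `one_statement_parses_to` are from the
list above; the `Select` type and the `parse_stmt` function are hypothetical
stand-ins that only understand `SELECT <column> FROM <table>`, so this is not
the project's test code.

```rust
use std::fmt;

// Hypothetical one-statement AST for the sketch.
#[derive(Debug, PartialEq)]
struct Select {
    column: String,
    table: String,
}

impl fmt::Display for Select {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "SELECT {} FROM {}", self.column, self.table)
    }
}

/// Hypothetical stand-in for the real parser entry point.
fn parse_stmt(sql: &str) -> Result<Select, String> {
    let words: Vec<&str> = sql.split_whitespace().collect();
    match words.as_slice() {
        [select, column, from, table]
            if select.eq_ignore_ascii_case("SELECT") && from.eq_ignore_ascii_case("FROM") =>
        {
            Ok(Select {
                column: column.to_string(),
                table: table.to_string(),
            })
        }
        _ => Err(format!("unsupported statement: {}", sql)),
    }
}

/// Parses `sql` and checks that it serializes to `canonical`, which may
/// differ from the input (e.g. in case or whitespace).
fn one_statement_parses_to(sql: &str, canonical: &str) -> Select {
    let stmt = parse_stmt(sql).expect("statement should parse");
    assert_eq!(stmt.to_string(), canonical);
    stmt
}

/// Parses `sql` and checks that it round-trips to exactly the same string.
fn verified_stmt(sql: &str) -> Select {
    one_statement_parses_to(sql, sql)
}
```

Expressing `verified_stmt` as the special case where input and canonical form
are the same keeps the round-trip check in one place.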
2019-01-31 15:54:57 +03:00
Nickolay Ponomarev
d8173d4196 Extract ASTNode::SQLSelect to a separate struct (1/5)
This will allow re-using it for SQLStatement in a later commit.

(Also split the new struct into a separate file, other query-related
types will be moved here in a follow-up commit.)
2019-01-31 03:57:17 +03:00
Andy Grove
e863bc041c cargo fmt, fix compiler warnings
2018-12-16 13:57:01 -07:00
Clemens Winter
91aa985ed0 Add LIKE operator
2018-12-16 11:26:09 -08:00
Andy Grove
335607f6bb Add placeholder unit test for ANSI parser
2018-10-06 10:37:49 -06:00