Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427
This stops the parser from accepting (and the AST from being able to
represent) SQL look-alike code that makes no sense, e.g.
SELECT ... FROM (CREATE TABLE ...) foo
SELECT ... FROM (1+CAST(...)) foo
Generally this makes the AST less "partially typed". By that I mean
that certain parts are strongly typed (e.g. SELECT can only contain
projections, relations, etc.), while everything that didn't get its
own type is dumped into ASTNode, effectively untyped. After a few more
fixes (yet to be implemented), `ASTNode` could become an
`SQLExpression`. The
Pratt-style expression parser (returning an SQLExpression) would be
invoked from the top-down parser in places where a generic expression
is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things
like select's `projection` and `relation` could be more appropriately
(narrowly) typed.
Since the diff is quite large due to the necessarily large number of
mechanical changes, here's an overview:
1) Interface changes:
- A new AST enum - `SQLStatement` - is split out of ASTNode:
- The variants of the ASTNode enum, which _only_ make sense as a top
level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are
_moved_ to the new enum, with no other changes.
- SQLSelect is _duplicated_: now available both as a variant in
SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery).
- The main entry point (Parser::parse_sql) now expects an SQL statement
as input, and returns an `SQLStatement`.
2) Parser changes: instead of detecting the top-level constructs deep
down in the precedence parser (`parse_prefix`), we can now do it
right after setting up the parser in the `parse_sql` entry point
(SELECT, again, is kept in the expression parser to demonstrate how
subqueries could be implemented).
The rest of parser changes are mechanical ASTNode -> SQLStatement
replacements resulting from the AST change.
3) Testing changes: for every test, depending on whether the input was
a complete statement or an expression, I used an appropriate helper
function:
- `verified` (parses SQL, checks that it round-trips, and returns
the AST) - was replaced by `verified_stmt` or `verified_expr`.
- `parse_sql` (which returned AST without checking it round-tripped)
was replaced by:
- `parse_sql_expr` (same function, for expressions)
- `one_statement_parses_to` (formerly `parses_to`), extended to
deal with statements that are not expected to round-trip.
The weird name is to reduce further churn when implementing
multi-statement parsing.
- `verified_stmt` (in 4 testcases that actually round-tripped)
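
To make the overview above more concrete, here is a minimal sketch of the
statement/expression split. It is not the code from this PR: the variant
names other than `SQLStatement::SQLSelect` (`SQLIdentifier`, `SQLSubquery`,
`SQLDelete`, `SQLCreateTable`), the field names, and the toy free function
`parse_sql` are assumptions made up for illustration. The point is only that
a relation typed as `ASTNode` can no longer hold a CREATE TABLE, and that
top-level constructs are recognized in the entry point.

```rust
// Hypothetical, heavily simplified stand-ins for the AST types.
#[derive(Debug, PartialEq)]
pub enum ASTNode {
    SQLIdentifier(String),
    // A SELECT used as a subquery is still representable as an expression.
    SQLSubquery(Box<SQLSelect>),
}

#[derive(Debug, PartialEq)]
pub struct SQLSelect {
    pub projection: Vec<ASTNode>,
    pub relation: Option<Box<ASTNode>>,
}

// Constructs that only make sense at the top level live in their own enum,
// so `SELECT ... FROM (CREATE TABLE ...) foo` cannot be expressed.
#[derive(Debug, PartialEq)]
pub enum SQLStatement {
    SQLSelect(SQLSelect),
    SQLDelete { table_name: String },
    SQLCreateTable { name: String },
}

// Toy entry point (an assumption, not the real Parser::parse_sql): it
// recognizes the statement kind up front instead of deep inside an
// expression parser, and rejects input that is not a statement.
pub fn parse_sql(sql: &str) -> Result<SQLStatement, String> {
    let words: Vec<&str> = sql.split_whitespace().collect();
    match words.as_slice() {
        ["SELECT", col, "FROM", table] => Ok(SQLStatement::SQLSelect(SQLSelect {
            projection: vec![ASTNode::SQLIdentifier((*col).to_string())],
            relation: Some(Box::new(ASTNode::SQLIdentifier((*table).to_string()))),
        })),
        ["DELETE", "FROM", table] => Ok(SQLStatement::SQLDelete {
            table_name: (*table).to_string(),
        }),
        ["CREATE", "TABLE", name] => Ok(SQLStatement::SQLCreateTable {
            name: (*name).to_string(),
        }),
        _ => Err(format!("Expected a statement, found: {}", sql)),
    }
}
```

With this shape, `parse_sql("DELETE FROM t")` yields a `SQLStatement`, while
a bare expression such as `a + b` is rejected by the entry point.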
Fold Token::{Keyword, Identifier, DoubleQuotedString} into one
Token::SQLWord, which has the necessary information (was it a
known keyword and/or was it quoted).
This lets the parser easily accept DoubleQuotedString (a quoted
identifier) everywhere it expects an Identifier in the same match
arm. (To complete support of quoted identifiers, or "delimited
identifiers" as the spec calls them, a TODO in parse_tablename()
ought to be addressed.)
As an aside, per <https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers>
SQLite seems to be the only one supporting 'identifier'
(which is rather hairy, since it can also be a string
literal), and `identifier` seems only to be supported by
MySQL. I didn't implement either one.
This also allows the use of `parse`/`expect_keyword` machinery
for non-reserved keywords: previously they relied on the keyword
being a Token::Keyword, which wasn't a Token::Identifier, and so
wasn't accepted as one.
Now whether a keyword can be used as an identifier can be decided
by the parser. (I didn't add a blacklist of "reserved" keywords,
so that any keyword which doesn't have a special meaning in the
parser could be used as an identifier. The list of keywords in
the dialect could be re-used for that purpose at a later stage.)
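
A rough sketch of the idea, under stated assumptions: the field names
(`value`, `quote_style`, `keyword`), the helper `make_word`, the tiny
`KEYWORDS` list, and the free functions `parse_keyword`/`parse_identifier`
are illustrative only and may not match the actual code. It shows how one
word token can serve both as a keyword and as an identifier, with the
decision left to the parser.

```rust
// Hypothetical keyword list; the real one would come from the dialect.
const KEYWORDS: &[&str] = &["SELECT", "FROM", "WHERE", "YEAR"];

#[derive(Debug, Clone, PartialEq)]
pub struct SQLWord {
    pub value: String,
    // Some('"') for a delimited identifier, None for an unquoted word.
    pub quote_style: Option<char>,
    // The matching keyword, if the word is unquoted and known.
    pub keyword: Option<&'static str>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    SQLWord(SQLWord),
    Number(String),
    Comma,
}

// Builds a word token, recording whether it was quoted and whether it is
// a known keyword. Quoted words are never treated as keywords.
pub fn make_word(value: &str, quote_style: Option<char>) -> Token {
    let upper = value.to_uppercase();
    let keyword = if quote_style.is_none() {
        KEYWORDS.iter().copied().find(|k| *k == upper.as_str())
    } else {
        None
    };
    Token::SQLWord(SQLWord {
        value: value.to_string(),
        quote_style,
        keyword,
    })
}

// True if the token is the expected (unquoted) keyword.
pub fn parse_keyword(tok: &Token, expected: &str) -> bool {
    match tok {
        Token::SQLWord(w) => w.keyword.map_or(false, |k| k == expected),
        _ => false,
    }
}

// Accepts any word as an identifier, quoted or not, in a single match arm.
// No blacklist of reserved keywords is applied here.
pub fn parse_identifier(tok: &Token) -> Option<String> {
    match tok {
        Token::SQLWord(w) => Some(match w.quote_style {
            Some(q) => format!("{}{}{}", q, w.value, q),
            None => w.value.clone(),
        }),
        _ => None,
    }
}
```

In this sketch a word like `year` is both recognizable via `parse_keyword`
and usable via `parse_identifier`; which interpretation wins depends on
what the parser asks for at that point.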
...as this syntax is not specific to the PostgreSQL dialect.
Also use verified() to assert that parsing + serializing results in the
original SQL string.
Mainly by replacing `assert_eq!(sql, ast.to_string())` with a call to
the recently introduced `verified()` helper or using `parses_to()` where
the expected serialization differs from the original SQL string.
There was one case (parse_implicit_join), where the inputs were different:
let sql = "SELECT * FROM t1,t2";
//vs
let sql = "SELECT * FROM t1, t2";
and since we don't test the whitespace handling in other tests, I just
used the canonical representation as input.
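
The two helpers could look roughly like the sketch below. The function
`parse_and_serialize` is a hypothetical stand-in for "parse the SQL, then
call `to_string()` on the AST"; here it only normalizes the whitespace
after commas, which is enough to mimic the parse_implicit_join case. The
helper signatures are likewise assumptions.

```rust
// Hypothetical stand-in for parsing followed by serialization: it only
// canonicalizes the spacing around commas.
pub fn parse_and_serialize(sql: &str) -> String {
    sql.split(',')
        .map(|part| part.trim())
        .collect::<Vec<_>>()
        .join(", ")
}

// Asserts that `sql` serializes to `canonical`, which may differ from the
// input, and returns the serialized form.
pub fn parses_to(sql: &str, canonical: &str) -> String {
    let serialized = parse_and_serialize(sql);
    assert_eq!(canonical, serialized);
    serialized
}

// Asserts that `sql` round-trips unchanged.
pub fn verified(sql: &str) -> String {
    parses_to(sql, sql)
}
```

Under these assumptions `verified("SELECT * FROM t1, t2")` passes, whereas
the `t1,t2` spelling only passes through `parses_to` with the canonical
string given as the expected output.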
There's no Token::Period in such a situation, so the fractional part (of the seconds) was silently truncated.
Can't uncomment the test yet, because parse_timestamp() is effectively
unused: the code added to parse_value() in 5abd9e7dec
was wrong as it attempted to handle unquoted date/time literals. One
part of it was commented out earlier, the other can't work as far as I
can see, as it tries to parse a Number token - `([0-9]|\.)+` - as a
timestamp, so I removed it as well.
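
To illustrate why a Number token cannot carry a timestamp, here is a small
sketch. `scan_number` is a hypothetical helper that mimics the
`([0-9]|\.)+` pattern by hand; it is not the tokenizer's actual code.

```rust
// Returns the longest prefix of `input` made up of ASCII digits and dots,
// i.e. what a `([0-9]|\.)+` Number token would cover (possibly empty).
pub fn scan_number(input: &str) -> &str {
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(input.len());
    &input[..end]
}
```

For an unquoted `2016-02-15 09:43:33` the scan stops at the first `-`, so
the token holds only `2016` and never a full date or time.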
1) Simplified the bit in parse_datatype()
2) Made sure it was covered by the test (the "public.year" bit)
2a) ...the rest of changes in the test are to fix incorrect variable
names: c_name/c_lat/c_lng were copy-pasted from a previous test.
3) Removed the branch from parse_pg_cast, which duplicated what
parse_data_type already handled (it was even added in the same
commit, 2007995938)