...as in `FROM foo bar WHERE bar.x > 1`.
To avoid ambiguity as to whether a token is an alias or a keyword, we
maintain a blacklist of keywords that can follow a "table factor", to
prevent parsing them as an alias. This "context-specific reserved
keyword" approach lets us accept more SQL that's valid in some dialects
than a list of globally reserved keywords would. Also, some dialects
(e.g. Oracle) apparently don't reserve some keywords (like JOIN), while
presumably they won't accept them as an alias (`FROM foo JOIN` meaning
`FROM foo AS JOIN`).
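The context-specific check described above could look roughly like the
sketch below. It is not the parser's actual code: the keyword list, the
function names and the string-based "tokens" are all assumptions made
for illustration, as is the choice to accept any word after an explicit
AS.

```rust
// Hypothetical list, for illustration only: keywords that can follow a
// table factor and therefore must not be taken as its alias.
const RESERVED_FOR_TABLE_ALIAS: &[&str] = &[
    "WHERE", "GROUP", "ORDER", "JOIN", "INNER", "LEFT", "RIGHT", "FULL", "CROSS", "ON", "USING",
];

fn is_reserved_for_table_alias(word: &str) -> bool {
    RESERVED_FOR_TABLE_ALIAS
        .iter()
        .any(|kw| kw.eq_ignore_ascii_case(word))
}

// `next` is the word following the table factor (None at end of input);
// `after_as` says whether an explicit AS was consumed just before it.
// Assumption: after an explicit AS, any word is accepted as the alias.
fn parse_optional_alias(next: Option<&str>, after_as: bool) -> Result<Option<String>, String> {
    match next {
        Some(word) if after_as || !is_reserved_for_table_alias(word) => {
            Ok(Some(word.to_string()))
        }
        None if after_as => Err("expected an alias after AS".to_string()),
        // A reserved keyword (or end of input) without AS: no alias here.
        _ => Ok(None),
    }
}
```

With this shape, `FROM foo bar` yields the alias `bar`, while the JOIN in
`FROM foo JOIN` is left for the caller to handle as a keyword.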
A "table factor" (name borrowed from the ANSI SQL grammar) is a table
name or a derived table (subquery), followed by an optional `AS` and an
optional alias. (The alias is *not* optional for subqueries, but we
don't enforce that.) It can appear in the FROM/JOIN part of the query.
This commit:
- introduces ASTNode::TableFactor
- changes the parser to populate SQLSelect::relation and Join::relation
  with ASTNode::TableFactor instead of the table name
- changes the parser to only accept subqueries or identifiers, not
  arbitrary expressions, in the "table factor" context
- always parses the table name as SQLCompoundIdentifier (whether or not
  it was actually compound).
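As a rough picture of the data involved, a table factor can be modelled
as below. This is a simplified stand-in, not the real AST: the
standalone enum, its variant and field names, and the use of plain
strings for the subquery are assumptions made for illustration.

```rust
use std::fmt;

// Hypothetical, simplified model of a "table factor".
#[derive(Debug, PartialEq)]
enum TableFactor {
    // A (possibly compound) table name, e.g. ["schema", "foo"].
    Table { name: Vec<String>, alias: Option<String> },
    // A derived table; the subquery is kept as text to stay short.
    Derived { subquery: String, alias: Option<String> },
}

impl fmt::Display for TableFactor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let alias = match self {
            TableFactor::Table { name, alias } => {
                write!(f, "{}", name.join("."))?;
                alias
            }
            TableFactor::Derived { subquery, alias } => {
                write!(f, "({})", subquery)?;
                alias
            }
        };
        // Always serialize the alias with an explicit AS.
        if let Some(alias) = alias {
            write!(f, " AS {}", alias)?;
        }
        Ok(())
    }
}
```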
Fold Token::{Keyword, Identifier, DoubleQuotedString} into one
Token::SQLWord, which has the necessary information (was it a
known keyword and/or was it quoted).
This lets the parser easily accept DoubleQuotedString (a quoted
identifier) everywhere it expects an Identifier in the same match
arm. (To complete support of quoted identifiers, or "delimited
identifiers" as the spec calls them, a TODO in parse_tablename()
ought to be addressed.)
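A minimal sketch of what such a folded token might carry is shown
below. The field names, the `make_word` helper and the tiny keyword
list are assumptions for illustration, not necessarily what the
tokenizer uses.

```rust
// Hypothetical keyword list; a real dialect would supply its own.
const KEYWORDS: &[&str] = &["SELECT", "FROM", "WHERE", "AS"];

#[derive(Debug, Clone, PartialEq)]
struct SQLWord {
    // The word as written, without any surrounding quotes.
    value: String,
    // Some('"') for a delimited identifier, None for a bare word.
    quote_style: Option<char>,
    // Upper-cased keyword if `value` is an unquoted known keyword,
    // otherwise empty.
    keyword: String,
}

fn make_word(word: &str, quote_style: Option<char>) -> SQLWord {
    let upper = word.to_uppercase();
    // A quoted word is never a keyword, whatever its spelling.
    let is_keyword = quote_style.is_none() && KEYWORDS.iter().any(|kw| *kw == upper);
    SQLWord {
        value: word.to_string(),
        quote_style,
        keyword: if is_keyword { upper } else { String::new() },
    }
}
```

A parser can then match on one token variant and look at `keyword` or
`quote_style` only where the distinction matters.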
As an aside, per <https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers>
sqlite seems to be the only one supporting single-quoted 'identifier'
(which is rather hairy, since it can also be a string
literal), and backtick-quoted `identifier` seems to be supported only
by MySQL. I didn't implement either one.
This also allows the use of `parse`/`expect_keyword` machinery
for non-reserved keywords: previously they relied on the keyword
being a Token::Keyword, which wasn't a Token::Identifier, and so
wasn't accepted as one.
Now whether a keyword can be used as an identifier can be decided
by the parser. (I didn't add a blacklist of "reserved" keywords,
so that any keyword which doesn't have a special meaning in the
parser could be used as an identifier. The list of keywords in
the dialect could be re-used for that purpose at a later stage.)
i.e. ASC/DESC/unspecified - so that we don't lose information about
the source code.
Also don't take any keyword other than ASC/DESC or Comma to mean
'ascending'.
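One way to keep the three states apart is an Option<bool>, sketched
below. The function names are hypothetical and the "token" is a plain
string; this only illustrates the idea, not the parser's code.

```rust
// Some(true) = explicit ASC, Some(false) = explicit DESC,
// None = no direction given in the source.
fn parse_order_direction(next: Option<&str>) -> Option<bool> {
    match next.map(|w| w.to_ascii_uppercase()).as_deref() {
        Some("ASC") => Some(true),
        Some("DESC") => Some(false),
        // Anything else is not a direction and is left unconsumed.
        _ => None,
    }
}

// Serializes exactly what was written, so round-tripping is lossless.
fn direction_to_sql(asc: Option<bool>) -> &'static str {
    match asc {
        Some(true) => " ASC",
        Some(false) => " DESC",
        None => "",
    }
}
```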
...as this syntax is not specific to the PostgreSQL dialect.
Also use verified() to assert that parsing + serializing results in the
original SQL string.
Its existence alongside SingleQuotedString simply doesn't make sense:
`'a string'` is a string literal, while `a string` is not a "value".
It's only used in the PostgreSQL-specific tab-separated-values parser to
store the string representation of a field's value. For that use case,
Option<String> looks like a more appropriate choice than Value.
...and parser support for the corresponding token, as "..." in SQL[*] is
not a literal string like we parse it - but a quoted identifier (which I
intend to implement later).
[*] in all the RDBMSes I know, except for sqlite which has complex rules
in the name of "compatibility": https://www.sqlite.org/lang_keywords.html
Mainly by replacing `assert_eq!(sql, ast.to_string())` with a call to
the recently introduced `verified()` helper or using `parses_to()` where
the expected serialization differs from the original SQL string.
There was one case (parse_implicit_join) where the inputs were different:

    let sql = "SELECT * FROM t1,t2";
    //vs
    let sql = "SELECT * FROM t1, t2";

and since we don't test the whitespace handling in other tests, I just
used the canonical representation as input.
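For readers unfamiliar with the two helpers, their relationship can be
sketched as follows. The `parse_and_serialize` function is a made-up
stand-in that only normalizes whitespace around commas; the real
helpers go through the actual parser and may differ in signature and
return value.

```rust
// Hypothetical stand-in for "parse, then serialize back to SQL".
fn parse_and_serialize(sql: &str) -> String {
    sql.split(',').map(str::trim).collect::<Vec<_>>().join(", ")
}

// Asserts that `sql` serializes to the given canonical form.
fn parses_to(sql: &str, canonical: &str) {
    assert_eq!(parse_and_serialize(sql), canonical);
}

// Asserts that `sql` is already canonical, i.e. it round-trips unchanged.
fn verified(sql: &str) {
    parses_to(sql, sql);
}
```

Under these assumptions, `verified("SELECT * FROM t1, t2")` passes, while
the non-canonical spelling needs
`parses_to("SELECT * FROM t1,t2", "SELECT * FROM t1, t2")`.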
Before this, missing keywords THEN/WHEN/AS would be parsed as if they
were in the text, as the code didn't check the return value of
consume_token() - see upcoming commit.
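The class of bug is easy to reproduce with a toy parser like the one
below. The struct, the string tokens and the `expect_token` wrapper are
assumptions made for illustration; only the name consume_token() comes
from the code in question.

```rust
struct Parser {
    tokens: Vec<String>,
    index: usize,
}

impl Parser {
    // Advances and returns true only if the next token equals `expected`.
    // #[must_use] makes the compiler warn when the result is dropped.
    #[must_use]
    fn consume_token(&mut self, expected: &str) -> bool {
        if self.tokens.get(self.index).map(String::as_str) == Some(expected) {
            self.index += 1;
            true
        } else {
            false
        }
    }

    // Turns a failed match into an error instead of silently going on.
    fn expect_token(&mut self, expected: &str) -> Result<(), String> {
        if self.consume_token(expected) {
            Ok(())
        } else {
            let found = self.tokens.get(self.index).map_or("EOF", String::as_str);
            Err(format!("expected {}, found {}", expected, found))
        }
    }
}
```

Calling `consume_token("THEN")` and ignoring the result is what lets a
missing THEN go unnoticed; routing such calls through something like
`expect_token` surfaces the error.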