Specifically, `FOREIGN KEY REFERENCES <foreign_table> (<referred_columns>)`
can now be followed by `ON DELETE <referential_action>` and/or by
`ON UPDATE <referential_action>`.
The Ident type was previously an alias for a String. Turn it into a full
fledged struct, so that the parser can preserve the distinction between
identifier value and quote style already made by the tokenizer's Word
structure.
The SQL standard requires that numeric literals with a decimal point,
like 1.23, are represented exactly, up to some precision. That means
that parsing these literals into f64s is invalid, as it is impossible
to represent many decimal numbers exactly in binary floating point (for
example, 0.3).
This commit parses all numeric literals into a new `Value` variant
`Number(String)`, removing the old `Long(u64)` and `Double(f64)`
variants. This is slightly less convenient for downstream consumers, but
far more flexible, as numbers that do not fit into a u64 and f64 are now
representable.
I realized a moment too late that I'd missed a type name in
when removing the "SQL" prefix from types in ac555d7e8. As far as I can
tell, this was the only oversight.
The rationale here is the same as the last commit: since this crate
exclusively parses SQL, there's no need to restate that in every type
name. (The prefix seems to be an artifact of this crate's history as a
submodule of Datafusion, where it was useful to explicitly call out
which types were related to SQL parsing.)
This commit has the additional benefit of making all type names
consistent; over type we'd added some types which were not prefixed with
"SQL".
The ASTNode enum was confusingly named. In the past, the name made
sense, as the enum contained nearly all of the nodes in the AST, but
over time, pieces have been split into different structs, like
SQLStatement and SQLQuery. The ASTNode enum now contains only contains
expression nodes, so Expr is a better name.
Also rename the UnnamedExpression and ExpressionWithAlias variants
of SQLSelectItem to UnnamedExpr and ExprWithAlias, respectively, to
match the new shorthand for the word "expression".
T-SQL (and Oracle) support non-standard syntax, which is similar in
functionality to LATERAL joins in ANSI and PostgreSQL
<https://blog.jooq.org/tag/lateral-derived-table/>: it allows to use
the columns from the tables defined to the left of `APPLY` in the
"derived tables" (subqueries) to the right of `APPLY`. Unlike ANSI
LATERAL (but like Postgres' implementation), APPLY is also used with
table-valued function calls.
Despite them being similar, we represent "APPLY" joins with
`JoinOperator`s of its own (`CrossApply` and `OuterApply`). Doing
otherwise seemed like it would cause unnecessary confusion, as those
interested in dialect-specific parsing would probably not expect APPLY
being parsed as LATERAL, and those wanting to forbid non-standard SQL
would not be helped by this either.
This also renames existing JoinOperator::Cross -> CrossJoin to avoid
confusion with CrossApply.
`SELECT * FROM a OUTER JOIN b` was previously being parsed as an inner
join where table `a` was aliased to `OUTER`. This is extremely
surprising, as the user likely intended to say FULL OUTER JOIN. Since
the SQL specification lists OUTER as a keyword, we are well within our
rights to return an error here.
The SQL specification prohibits constructions like
SELECT * FROM a NATURAL JOIN (b)
where b sits alone inside parentheses. Parentheses in a FROM entry
always introduce either a derived table or a join.
This commit adds support for derived tables (i.e., subqueries) that
incorporate set operations, like:
SELECT * FROM (((SELECT 1) UNION (SELECT 2)) t1 AS NATURAL JOIN t2)
This introduces a bit of complexity around determining whether a left
paren starts a subquery, starts a nested join, or belongs to an
already-started subquery. The details are explained in a comment within
the patch.
It is useful downstream to have two separate enums, one for unary
operators and one for binary operators, so that the compiler can check
exhaustiveness. Otherwise downstream consumers need to manually encode
which operators are unary and which operators are binary when matching
on an Operator enum.
These were previously called "BinaryExpr" and "Unary"; besides being
inconsistent, it's also not correct to say "binary expression" or "unary
expression", as it's the operators that have arities, not the
expression. Adjust the naming of the variants accordingly.
SELECT * FROM (((SELECT 1))) is just as valid as
SELECT * FROM (SELECT 1). Add a test to ensure that we can parse the
first form.
Addresses a comment from #100.
get_next_precedence deals with left-binding power, not right binding
power. Therefore, when it encounters a standalone NOT operator (i.e., a
"NOT" token that is not followed by "BETWEEN", "LIKE", or "IN"), it
should return 0, because unary NOT is not an infix operator, it's a
prefix operator, and therefore it has no left-binding power.
Standardize the license header, removing the Grove Enterprise copyright
notice where it exists per #58. Also add a CI check to ensure that files
without license headers don't get merged.
Fix#58.