Commit graph

432 commits

Author SHA1 Message Date
Nikhil Benesch
38fc920d7c
Parse DECIMAL and DEC aliases for NUMERIC type 2019-06-02 22:40:18 -04:00
Nickolay Ponomarev
d0f2de06ed Refactor parse_joins, pt.2: implicit/cross/natural joins
- reduce duplication in the handling of implicit/cross joins and make
  the flow of data slightly clearer by returning the `join` instead of
  pushing it and exiting early.

  (I wanted the block that currently returns `join` to return one of
  JoinOperator::* tags, so that `parse_table_factor` and the construction
  of the `Join` struct could happen after we've parsed the JOIN keywords,
  but that seems impossible.)

- move the check for the NATURAL keyword into the block that deals with 
  INNER/OUTER joins that support constraints (and thus can be preceded
  by "NATURAL")

- add a check for NATURAL not followed by a known join type with a test

- add more tests for NATURAL joins (we didn't have any), and fix
  whitespace bug in `to_string()` that was uncovered (we emitted an
  extra space: `foo NATURAL JOIN bar `)
2019-06-03 02:44:03 +03:00
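A minimal sketch of the `to_string()` whitespace fix described above. It assumes hypothetical `Join`/`JoinConstraint` types, not the crate's real AST. Each constraint suffix carries its own leading space, so a NATURAL join, which has no suffix, no longer ends with a stray space.

```rust
use std::fmt;

/// Hypothetical join constraint, loosely modeled on the commit description.
enum JoinConstraint {
    Natural,
    On(String),
    Using(Vec<String>),
}

/// Hypothetical join between two named relations.
struct Join {
    left: String,
    right: String,
    constraint: JoinConstraint,
}

impl fmt::Display for Join {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // NATURAL is emitted before the JOIN keyword, not after the table.
        let prefix = match self.constraint {
            JoinConstraint::Natural => "NATURAL ",
            _ => "",
        };
        // Suffixes start with their own space; NATURAL has none, which
        // avoids the trailing space in `foo NATURAL JOIN bar `.
        let suffix = match &self.constraint {
            JoinConstraint::Natural => String::new(),
            JoinConstraint::On(expr) => format!(" ON {}", expr),
            JoinConstraint::Using(cols) => format!(" USING({})", cols.join(", ")),
        };
        write!(f, "{} {}JOIN {}{}", self.left, prefix, self.right, suffix)
    }
}
```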
Nickolay Ponomarev
665b9df729 Refactor parse_joins, pt.1: INNER/OUTER joins
This block parses one of:
- `[ INNER ] JOIN <table_factor> <join_constraint>`
- `{ LEFT | RIGHT | FULL } [ OUTER ] JOIN <table_factor> <join_constraint>`

...but this was hard to see because of the duplication.
2019-06-03 02:44:03 +03:00
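The two productions above share one shape: an optional kind keyword, an optional `OUTER` (only after LEFT/RIGHT/FULL), then `JOIN`. The following is a sketch over upper-cased keyword strings; `JoinKind` and `parse_join_keywords` are hypothetical names, and `<table_factor>`/`<join_constraint>` parsing is left out.

```rust
#[derive(Debug, PartialEq)]
enum JoinKind {
    Inner,
    LeftOuter,
    RightOuter,
    FullOuter,
}

/// Parses the keyword prefix of an INNER/OUTER join. Returns the join kind
/// and the number of words consumed (including JOIN), or None if the words
/// do not form `[ INNER ] JOIN` or `{ LEFT | RIGHT | FULL } [ OUTER ] JOIN`.
fn parse_join_keywords(words: &[&str]) -> Option<(JoinKind, usize)> {
    let first = *words.first()?;
    let (kind, mut i) = match first {
        "JOIN" => (JoinKind::Inner, 0),
        "INNER" => (JoinKind::Inner, 1),
        "LEFT" => (JoinKind::LeftOuter, 1),
        "RIGHT" => (JoinKind::RightOuter, 1),
        "FULL" => (JoinKind::FullOuter, 1),
        _ => return None,
    };
    // OUTER is optional, and only allowed after LEFT/RIGHT/FULL.
    if kind != JoinKind::Inner && words.get(i) == Some(&"OUTER") {
        i += 1;
    }
    // Every form ends with JOIN.
    if words.get(i) == Some(&"JOIN") {
        Some((kind, i + 1))
    } else {
        None
    }
}
```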
Nickolay Ponomarev
8206523416 Minor consume_token()-related simplifications
- use `if !self.consume_token(&Token::Comma) { break; }` to consume the
  comma and exit the loop if no comma found.
- coalesce two `{ false }` blocks in `consume_token` by using a match guard
2019-06-03 02:44:03 +03:00
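A sketch of both simplifications on a toy parser. `Token`, `Parser` and `parse_word_list` are hypothetical, and unlike the real parser this version does not skip whitespace. The match guard folds "no token" and "wrong token" into one `false` arm, and the list loop exits as soon as no comma is consumed.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Comma,
}

struct Parser {
    tokens: Vec<Token>,
    index: usize,
}

impl Parser {
    /// Consumes the next token if it equals `expected`.
    fn consume_token(&mut self, expected: &Token) -> bool {
        match self.tokens.get(self.index) {
            Some(t) if t == expected => {
                self.index += 1;
                true
            }
            // Covers both EOF and a non-matching token.
            _ => false,
        }
    }

    /// Parses `word [, word]*`.
    fn parse_word_list(&mut self) -> Vec<String> {
        let mut words = Vec::new();
        loop {
            match self.tokens.get(self.index) {
                Some(Token::Word(w)) => {
                    words.push(w.clone());
                    self.index += 1;
                }
                _ => break,
            }
            // Consume the comma, or stop if there is none.
            if !self.consume_token(&Token::Comma) {
                break;
            }
        }
        words
    }
}
```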
Nickolay Ponomarev
e02625719e Allow calling prev_token() after EOF
Before this change, `next_token()` would only increment the index when returning
`Some(token)`. This means that the caller wishing to rewind must be
careful not to call `prev_token()` on EOF (`None`), while not forgetting
to call it for `Some`. Not doing this resulted in bugs in the
undertested code that does error handling.

After making this mistake several times, I'm changing `next_token()` /
`prev_token()` so that calling `next_token(); prev_token(); next_token()`
returns the same token in the first and the last invocation.
2019-06-02 22:57:44 +03:00
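A minimal sketch of the new invariant, assuming a simplified `Token` with a unit `Whitespace` variant (the real token types differ). `next_token()` advances the index even when it returns `None`, so `prev_token()` can always undo exactly one call.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Whitespace,
}

struct Parser {
    tokens: Vec<Token>,
    index: usize,
}

impl Parser {
    /// Returns the next non-whitespace token, or `None` at EOF.
    /// The index advances even at EOF, so `prev_token()` can always undo it.
    fn next_token(&mut self) -> Option<Token> {
        loop {
            self.index += 1;
            match self.tokens.get(self.index - 1) {
                Some(Token::Whitespace) => continue,
                token => return token.cloned(),
            }
        }
    }

    /// Steps back to the previous non-whitespace position (or EOF slot).
    fn prev_token(&mut self) {
        loop {
            assert!(self.index > 0, "prev_token() called at the start");
            self.index -= 1;
            if let Some(Token::Whitespace) = self.tokens.get(self.index) {
                continue;
            }
            return;
        }
    }
}
```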
Nickolay Ponomarev
1227fddd48 Reduce cloning of tokens
- Avoid cloning whitespace tokens in `peek_nth_token()` by using a
  `&Token` from `tokens.get()` instead of a cloned `Token` from `token_at()`

- Similarly avoid cloning in `next_token_no_skip`, and clone the
  non-whitespace tokens in `next_token` instead.

- Remove `token_at`, which was only used in `peek_token` and
  `peek_nth_token`

- Fold `prev_token_no_skip()` into `prev_token()` and make `prev_token`
  return nothing, as the return value isn't used anyway.
2019-06-02 22:57:44 +03:00
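A sketch of a clone-avoiding `peek_nth_token`, using a hypothetical simplified `Token`. Skipped whitespace is only ever inspected through the `&Token` returned by `tokens.get()`, and just the token actually returned is cloned.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Whitespace,
}

struct Parser {
    tokens: Vec<Token>,
    index: usize,
}

impl Parser {
    /// Returns the `n`th upcoming non-whitespace token (0 = the next one)
    /// without moving `self.index`.
    fn peek_nth_token(&self, mut n: usize) -> Option<Token> {
        let mut index = self.index;
        loop {
            match self.tokens.get(index) {
                // Whitespace is skipped by reference; nothing is cloned.
                Some(Token::Whitespace) => {}
                Some(token) if n == 0 => return Some(token.clone()),
                Some(_) => n -= 1,
                None => return None,
            }
            index += 1;
        }
    }
}
```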
Nickolay Ponomarev
ebb82b8c8f
Merge pull request #65 from nickolay/pr/ddl-improvements
* Rewrite parsing of `ALTER TABLE ADD CONSTRAINT`
* Support constraints in CREATE TABLE
* Change `Value::Long()` to be unsigned, use u64 consistently
* Allow trailing comma in CREATE TABLE
2019-06-02 20:53:21 +03:00
Nikhil Benesch
1cc9d2d6f5
Merge pull request #82 from benesch/not-prec
Fix the precedence of NOT LIKE
2019-06-02 10:49:49 -04:00
Nickolay Ponomarev
d9edc2588b Change Value::Long() to u64, use u64 instead of usize
The tokenizer emits a separate Token for +/- signs, so the value of
Value::Long() (as well as of parse_literal_int()) can never be negative.

Also we have been using both u64 and usize to represent a parsed
unsigned number. Change to using u64 for consistency.
2019-06-02 13:58:14 +03:00
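Why an unsigned type suffices can be sketched with a toy tokenizer and parser (all names hypothetical). The sign becomes its own `Minus` token, so the numeric literal is always a non-negative digit run that fits in `u64`, and negation lives in a separate unary node.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Minus,
    Number(String),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Long(u64),
    UnaryMinus(Box<Expr>),
}

/// Tokenizes `-` and digit runs; the sign never becomes part of a number.
fn tokenize(s: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = s.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c == '-' {
            tokens.push(Token::Minus);
            chars.next();
        } else if c.is_ascii_digit() {
            let mut digits = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                digits.push(d);
                chars.next();
            }
            tokens.push(Token::Number(digits));
        } else {
            chars.next(); // skip anything else, e.g. spaces
        }
    }
    tokens
}

/// Parses `[-]* digits`; the literal itself is always a `u64`.
fn parse_expr(tokens: &[Token]) -> Option<Expr> {
    match tokens.split_first()? {
        (Token::Minus, rest) => Some(Expr::UnaryMinus(Box::new(parse_expr(rest)?))),
        (Token::Number(n), []) => n.parse::<u64>().ok().map(Expr::Long),
        _ => None,
    }
}
```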
Nickolay Ponomarev
0407ed2b57 Allow trailing comma in CREATE TABLE
At least MSSQL supports it, not sure about others.
2019-06-02 13:54:16 +03:00
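A sketch of accepting a trailing comma in a parenthesized column list, over a hypothetical token type and starting after the opening `(`. A `)` is accepted both after a column name and right after a comma.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Comma,
    RParen,
}

/// Parses column names up to `)`, accepting an optional trailing comma,
/// as in `(a, b,)`. Returns None on malformed input.
fn parse_columns(tokens: &[Token]) -> Option<Vec<String>> {
    let mut columns = Vec::new();
    let mut i = 0;
    loop {
        match tokens.get(i)? {
            // A `)` right after a comma (or at the start) ends the list.
            Token::RParen => return Some(columns),
            Token::Word(w) => {
                columns.push(w.clone());
                i += 1;
            }
            Token::Comma => return None,
        }
        match tokens.get(i)? {
            Token::Comma => i += 1,
            Token::RParen => return Some(columns),
            Token::Word(_) => return None,
        }
    }
}
```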
Nickolay Ponomarev
8569a61fd0 Rename AlterOperation -> AlterTableOperation
Since other ALTER statements will have separate sub-commands.
2019-06-02 13:54:16 +03:00
Nickolay Ponomarev
aab0c36443 Support parsing constraints in CREATE TABLE
<table element> ::= ... | <table constraint definition> | ...
https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html#table-element-list
2019-06-02 13:54:16 +03:00
Nickolay Ponomarev
c69a1881c7 Change ALTER TABLE constraints parsing
- merge PrimaryKey and UniqueKey variants
- support `CHECK` constraints, removing the separate `Key` struct
- make `CONSTRAINT constraint_name` optional
- remove `KEY` without qualifiers (wasn't parsed and there doesn't
  appear to be such a thing)
- change `UNIQUE KEY` -> `UNIQUE`
- change `REMOVE CONSTRAINT` -> `DROP CONSTRAINT` and note its parsing
  is not implemented

Spec:
- ANSI SQL: see <table constraint definition> in https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html#_11_6_table_constraint_definition
- Postgres: look for "and table_constraint is:" in https://www.postgresql.org/docs/11/sql-altertable.html
2019-06-02 13:54:11 +03:00
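A sketch of the resulting constraint shape, using a hypothetical `TableConstraint` enum rather than the crate's actual definition. PRIMARY KEY and UNIQUE share one variant distinguished by a flag, CHECK is separate, and `CONSTRAINT <name>` is optional in both.

```rust
use std::fmt;

/// Hypothetical table constraint AST following the commit description.
enum TableConstraint {
    Unique { name: Option<String>, columns: Vec<String>, is_primary: bool },
    Check { name: Option<String>, expr: String },
}

impl fmt::Display for TableConstraint {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // `CONSTRAINT <name>` is optional for every variant.
        let name = match self {
            TableConstraint::Unique { name, .. } | TableConstraint::Check { name, .. } => name,
        };
        if let Some(name) = name {
            write!(f, "CONSTRAINT {} ", name)?;
        }
        match self {
            TableConstraint::Unique { columns, is_primary, .. } => write!(
                f,
                "{} ({})",
                if *is_primary { "PRIMARY KEY" } else { "UNIQUE" },
                columns.join(", ")
            ),
            TableConstraint::Check { expr, .. } => write!(f, "CHECK ({})", expr),
        }
    }
}
```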
Nickolay Ponomarev
93c9000102 [mssql] Support single-quoted column aliases 2019-06-02 13:48:14 +03:00
Nickolay Ponomarev
d0a782d8cc [mssql] Support delimited identifiers in [square brackets]
T-SQL supports non-standard `[...]` quoting in addition to the widely
supported and standard `"..."`:

https://docs.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers?view=sql-server-2017
2019-06-02 13:48:14 +03:00
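A minimal sketch of lexing both quoting styles; `read_delimited_ident` is a hypothetical helper, not the crate's tokenizer. The opening character selects the closing one (`"` for `"`, `]` for `[`). Escaped quotes such as `""` or `]]` are deliberately not handled here.

```rust
/// Reads a delimited identifier at the start of `input`, returning its
/// contents and the remaining input. Returns None if `input` does not start
/// with `"` or `[`, or if the closing delimiter is missing.
fn read_delimited_ident(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.chars();
    // Standard SQL uses "..."; T-SQL additionally accepts [...].
    let close = match chars.next()? {
        '"' => '"',
        '[' => ']',
        _ => return None,
    };
    let body = chars.as_str();
    let end = body.find(close)?;
    Some((&body[..end], &body[end + close.len_utf8()..]))
}
```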
Nikhil Benesch
90bcf55a6a
Fix the precedence of NOT LIKE
NOT LIKE has the same precedence as the LIKE operator. The parser was
previously assigning it the precedence of the unary NOT operator. NOT
BETWEEN and NOT IN are treated similarly, as they are equivalent, from a
precedence perspective, to NOT LIKE.

The fix for this requires associating precedences with sequences of
tokens, rather than single tokens, so that "NOT LIKE" and "NOT <expr>"
can have different precedences. Perhaps surprisingly, this change is not
very invasive.

An alternative I considered involved adjusting the tokenizer to lex
NOT, NOT LIKE, NOT BETWEEN, and NOT IN as separate tokens. This broke
symmetry in strange ways, though, as NotLike, NotBetween, and NotIn
gained dedicated tokens, while LIKE, BETWEEN, and IN remained as
stringly typed identifiers.

Fixes #81.
2019-06-01 02:52:18 -04:00
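Associating precedences with token sequences rather than single tokens can be sketched with a two-token lookahead. The token type and numeric values below are hypothetical, and only their relative order matters: LIKE binds tighter than unary NOT, so `NOT a LIKE b` means `NOT (a LIKE b)`, while `a NOT LIKE b` must get LIKE's precedence.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Not,
    Like,
    Between,
    In,
    Ident(String),
}

// Hypothetical precedence values.
const UNARY_NOT_PREC: u8 = 15;
const LIKE_PREC: u8 = 20;

/// Precedence of the operator starting at `tokens[0]`, looking one token
/// further ahead when that token is NOT.
fn precedence_at(tokens: &[Token]) -> u8 {
    match (tokens.get(0), tokens.get(1)) {
        // NOT LIKE / NOT BETWEEN / NOT IN bind like LIKE / BETWEEN / IN...
        (Some(Token::Not), Some(Token::Like | Token::Between | Token::In)) => LIKE_PREC,
        // ...while a lone NOT keeps the unary NOT precedence.
        (Some(Token::Not), _) => UNARY_NOT_PREC,
        (Some(Token::Like | Token::Between | Token::In), _) => LIKE_PREC,
        _ => 0,
    }
}
```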
Nikhil Benesch
f55e3d5305
Introduce a peek_nth_token method
This will be used in a future commit, where looking ahead by two tokens
is important.
2019-05-31 18:18:28 -04:00
Justin Haug
2d00ea7187
Add lateral derived support 2019-05-31 18:10:25 -04:00
Justin Haug
fe10fac0ad
Add FETCH and OFFSET support 2019-05-31 18:10:24 -04:00
Nikhil Benesch
202464a06a
Merge pull request #68 from ivanceras/master
Add LIMIT as RESERVED_FOR_TABLE_ALIAS
2019-05-31 18:08:26 -04:00
Nickolay Ponomarev
d80f9f3a7a
Merge pull request #80 from benesch/between-expr
Support nested expressions in BETWEEN
2019-05-30 02:37:06 +03:00
Nickolay Ponomarev
646479e56c
Merge pull request #77 from benesch/count-distinct
Support COUNT(DISTINCT x) and similar
2019-05-30 02:35:49 +03:00
Nickolay Ponomarev
86a2fbd8e4
Merge pull request #76 from benesch/select-all
Support SELECT ALL
2019-05-30 02:35:18 +03:00
Nickolay Ponomarev
7a6a66bdc5
Merge pull request #75 from benesch/drop
Support DROP [TABLE|VIEW]
2019-05-30 02:33:33 +03:00
Nickolay Ponomarev
80f594e8d3
Merge pull request #73 from benesch/option-vec
Replace Option<Vec<T>> with Vec<T>
2019-05-30 02:32:35 +03:00
Jamie Brandon
72ced4bffe
Support COUNT(DISTINCT x) and similar 2019-05-28 16:59:05 -04:00
Nikhil Benesch
ba21ce9d37
Support nested expressions in BETWEEN
`BETWEEN <thing> AND <thing>` allows <thing> to be any expr that doesn't
contain boolean operators. (Allowing boolean operators would wreak
havoc, because of the repurposing of AND as both a boolean operation
and part of the syntax of BETWEEN.)
2019-05-28 16:42:11 -04:00
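A sketch of why the bounds must exclude boolean operators, using a tiny precedence-climbing parser. All names and precedence values are hypothetical. Each bound is parsed with a minimum precedence above AND's, so parsing stops at the AND that belongs to BETWEEN and at the boolean AND that follows.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), Num(u64), Plus, And, Between }

#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Num(u64),
    Plus(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Between { expr: Box<Expr>, low: Box<Expr>, high: Box<Expr> },
}

// Hypothetical precedences: AND binds loosest, `+` tightest.
const AND_PREC: u8 = 10;
const BETWEEN_PREC: u8 = 20;
const PLUS_PREC: u8 = 30;

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn next_token(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    fn peek_precedence(&self) -> u8 {
        match self.tokens.get(self.pos) {
            Some(Token::And) => AND_PREC,
            Some(Token::Between) => BETWEEN_PREC,
            Some(Token::Plus) => PLUS_PREC,
            _ => 0,
        }
    }

    /// Keeps consuming infix operators that bind tighter than `min_prec`.
    fn parse_subexpr(&mut self, min_prec: u8) -> Option<Expr> {
        let mut expr = match self.next_token()? {
            Token::Ident(s) => Expr::Ident(s),
            Token::Num(n) => Expr::Num(n),
            _ => return None,
        };
        loop {
            let prec = self.peek_precedence();
            if prec <= min_prec {
                return Some(expr);
            }
            expr = match self.next_token()? {
                Token::Plus => Expr::Plus(Box::new(expr), Box::new(self.parse_subexpr(prec)?)),
                Token::And => Expr::And(Box::new(expr), Box::new(self.parse_subexpr(prec)?)),
                Token::Between => {
                    // Bounds stop before any AND, so the first AND is BETWEEN's.
                    let low = self.parse_subexpr(BETWEEN_PREC)?;
                    if self.next_token()? != Token::And {
                        return None;
                    }
                    let high = self.parse_subexpr(BETWEEN_PREC)?;
                    Expr::Between { expr: Box::new(expr), low: Box::new(low), high: Box::new(high) }
                }
                _ => return None,
            };
        }
    }
}
```

With this scheme, `x BETWEEN 1 + 2 AND 3 + 4 AND y` parses as `(x BETWEEN (1 + 2) AND (3 + 4)) AND y`.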
Nikhil Benesch
187376e657
Support DROP [TABLE|VIEW]
Co-authored-by: Jamie Brandon <jamie@scattered-thoughts.net>
2019-05-26 19:57:33 -04:00
Nikhil Benesch
373a9265a2
Implement std::error::Error for ParserError 2019-05-26 18:59:43 -04:00
Jamie Brandon
55fc8c5a57
Support SELECT ALL
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
2019-05-26 18:57:04 -04:00
Nikhil Benesch
5652b4676c
Replace Option<Vec<T>> with Vec<T>
Vectors can already represent the absence of any arguments (i.e., by
being empty), so there is no need to wrap them in an Option.
2019-05-22 11:42:28 -04:00
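A small illustration of the idea with a hypothetical `Query` node: an empty `order_by` vector already means "no ORDER BY clause". This leaves a single case to check, where `Option<Vec<_>>` would have had both `None` and `Some(vec![])` to handle.

```rust
use std::fmt;

/// Hypothetical query node; an empty `order_by` means no ORDER BY clause.
struct Query {
    body: String,
    order_by: Vec<String>,
}

impl fmt::Display for Query {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.body)?;
        // A single emptiness check replaces matching on None / Some(vec![]).
        if !self.order_by.is_empty() {
            write!(f, " ORDER BY {}", self.order_by.join(", "))?;
        }
        Ok(())
    }
}
```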
Jamie Brandon
143846d333
Don't panic on weird infix garbage
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
2019-05-21 11:59:45 -04:00
Jovansonlee Cesar
24f3d06231 Restore the original spacing in sqlparser.rs 2019-05-18 10:59:52 +08:00
Jovansonlee Cesar
d263d285e2 Add LIMIT as RESERVED_FOR_TABLE_ALIAS; this closes issue #67 2019-05-18 10:53:19 +08:00
Nickolay Ponomarev
908082d26f
Merge pull request #63 from nickolay/pr/refactor-keywords
Use smarter macros to avoid duplication in keywords.rs
2019-05-12 01:06:12 +03:00
Nickolay Ponomarev
5f3150e39a Add IsOptional enum to use instead of optional: bool
The meaning of `self.parse_parenthesized_column_list(false)?` was not
very obvious.
2019-05-08 03:23:44 +03:00
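A sketch of the self-describing flag. Here `parse_parenthesized_column_list` is a hypothetical free function over a toy token slice, not the crate's method. A call site then reads `parse_parenthesized_column_list(&tokens, IsOptional::Optional)` instead of an opaque `false`.

```rust
/// Replaces a bare `optional: bool` argument.
#[derive(Clone, Copy, PartialEq)]
enum IsOptional {
    Optional,
    Mandatory,
}

#[derive(Debug, PartialEq)]
enum Token {
    LParen,
    RParen,
    Comma,
    Word(String),
}

/// Parses `( col [, col]* )`. If the list is `Optional` and no `(` follows,
/// returns an empty list instead of an error.
fn parse_parenthesized_column_list(
    tokens: &[Token],
    optional: IsOptional,
) -> Result<Vec<String>, String> {
    match tokens.first() {
        Some(Token::LParen) => {}
        _ if optional == IsOptional::Optional => return Ok(vec![]),
        _ => return Err("expected (".to_string()),
    }
    let mut cols = Vec::new();
    let mut rest = &tokens[1..];
    loop {
        match rest {
            [Token::Word(w), Token::Comma, tail @ ..] => {
                cols.push(w.clone());
                rest = tail;
            }
            [Token::Word(w), Token::RParen, ..] => {
                cols.push(w.clone());
                return Ok(cols);
            }
            _ => return Err("expected a column name followed by , or )".to_string()),
        }
    }
}
```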
Nickolay Ponomarev
eeae3af6a3 Change the default serialization of "not equals" operator to <>
`!=` is not standard, though widely supported - https://stackoverflow.com/a/723426/1026
2019-05-06 22:20:29 +03:00
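A sketch of the serialization change with a hypothetical `ComparisonOp` enum. Both spellings parse to the same `NotEq` variant, which is always printed in the standard `<>` form.

```rust
use std::fmt;

/// Hypothetical comparison operators.
enum ComparisonOp {
    Eq,
    NotEq,
    Lt,
    Gt,
}

/// Both `!=` and `<>` map to `NotEq`.
fn parse_op(s: &str) -> Option<ComparisonOp> {
    match s {
        "=" => Some(ComparisonOp::Eq),
        "!=" | "<>" => Some(ComparisonOp::NotEq),
        "<" => Some(ComparisonOp::Lt),
        ">" => Some(ComparisonOp::Gt),
        _ => None,
    }
}

impl fmt::Display for ComparisonOp {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(match self {
            ComparisonOp::Eq => "=",
            // Emit the standard spelling regardless of the input form.
            ComparisonOp::NotEq => "<>",
            ComparisonOp::Lt => "<",
            ComparisonOp::Gt => ">",
        })
    }
}
```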
Nickolay Ponomarev
f93e69d1d4 Add parse_parenthesized_column_list to reduce code duplication 2019-05-06 22:20:29 +03:00
Nickolay Ponomarev
cccf7f0d8e Parse an optional column list after a CTE 2019-05-06 22:20:29 +03:00
Nickolay Ponomarev
f859c9b80e Support COLLATE in expressions
Roughly the <character factor> production - https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html#character-factor

If an expression is followed by the keyword `COLLATE`, it must be
followed by the collation name, which is an optionally schema-qualified
identifier.

The `COLLATE` keyword is not a regular binary operator in that it can't
be "nested": `foo COLLATE bar COLLATE baz` is not valid. If you prefer
to think of it as an operator, you might say it has the highest
precedence (judging from the spec), i.e. it binds to the smallest valid
expression to the left of it (so in `foo < bar COLLATE c`, the COLLATE
is applied first).
2019-05-06 22:20:29 +03:00
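A sketch of the binding rule over whitespace-split words; `Expr`, `parse_operand` and `parse_expr` are hypothetical. `COLLATE` is checked once, right after the smallest operand and before any binary operator. As a result, `foo < bar COLLATE c` applies COLLATE to `bar`, and a second COLLATE is left over and rejected.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Collate { expr: Box<Expr>, collation: Vec<String> },
    Lt(Box<Expr>, Box<Expr>),
}

/// Parses an identifier plus at most one `COLLATE name` suffix.
fn parse_operand(tokens: &[&str], pos: &mut usize) -> Option<Expr> {
    let mut expr = Expr::Ident(tokens.get(*pos)?.to_string());
    *pos += 1;
    if tokens.get(*pos) == Some(&"COLLATE") {
        // The collation name may be schema-qualified, e.g. `pg_catalog.C`.
        let name = tokens.get(*pos + 1)?;
        let collation = name.split('.').map(String::from).collect();
        expr = Expr::Collate { expr: Box::new(expr), collation };
        *pos += 2;
    }
    Some(expr)
}

/// Parses `operand [< operand]` and rejects leftover tokens.
fn parse_expr(tokens: &[&str]) -> Option<Expr> {
    let mut pos = 0;
    let mut expr = parse_operand(tokens, &mut pos)?;
    if tokens.get(pos) == Some(&"<") {
        pos += 1;
        let right = parse_operand(tokens, &mut pos)?;
        expr = Expr::Lt(Box::new(expr), Box::new(right));
    }
    // Leftover tokens (such as a second COLLATE) make the input invalid.
    if pos == tokens.len() { Some(expr) } else { None }
}
```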
Nickolay Ponomarev
e7949d493c Reduce indentation in parse_prefix() 2019-05-06 22:20:29 +03:00
Nickolay Ponomarev
ed20b8dde8 Use smarter macros to avoid duplication in keywords.rs 2019-05-04 16:58:58 +03:00
Nickolay Ponomarev
304710d59a Add MSSQL dialect and fix up the postgres' identifier rules
The `@@version` test is MS' dialect of SQL, it seems, so test it with
its own dialect.

Update the rules for identifiers in the PostgreSQL dialect per the documentation
while we're at it. The current identifier rules in the PostgreSQL dialect
were introduced in this commit, as a copy of the generic rules, it seems:
810cd8e6cf (diff-2808df0fba0aed85f9d35c167bd6a5f1L138)
2019-05-04 01:00:13 +03:00
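A sketch of per-dialect identifier rules behind a hypothetical trait with `is_identifier_start`/`is_identifier_part` methods. The character sets follow the PostgreSQL and SQL Server documentation on identifiers, approximated here (digits are ASCII-only). Allowing `@` in T-SQL identifiers is what lets `@@version` lex as a single word.

```rust
/// Hypothetical per-dialect identifier policy.
trait Dialect {
    fn is_identifier_start(&self, ch: char) -> bool;
    fn is_identifier_part(&self, ch: char) -> bool;
}

struct PostgreSqlDialect;
struct MsSqlDialect;

impl Dialect for PostgreSqlDialect {
    // Letters (including non-Latin ones) or an underscore.
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_alphabetic() || ch == '_'
    }
    // Then letters, underscores, digits, or `$`.
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_alphabetic() || ch.is_ascii_digit() || ch == '_' || ch == '$'
    }
}

impl Dialect for MsSqlDialect {
    // T-SQL regular identifiers may also start with `@` or `#`.
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_alphabetic() || matches!(ch, '_' | '@' | '#')
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_alphabetic() || ch.is_ascii_digit() || matches!(ch, '@' | '$' | '#' | '_')
    }
}

/// Checks whether `s` is a regular (undelimited) identifier in `dialect`.
fn is_identifier(dialect: &dyn Dialect, s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if dialect.is_identifier_start(c) => chars.all(|c| dialect.is_identifier_part(c)),
        _ => false,
    }
}
```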
Nickolay Ponomarev
1347ca0825 Move the rest of tests not specific to PG from the sqlparser_postgres.rs 2019-05-04 01:00:13 +03:00
Nickolay Ponomarev
478dbe940d Factor test helpers into a common module
Also run "generic" tests with all dialects (`parse_select_version`
doesn't work with ANSI dialect, so I moved it to the postgres file
temporarily)
2019-05-04 01:00:13 +03:00
Nickolay Ponomarev
de177f107c Remove dead datetime-related code
1) Removed unused date/time parsing methods from `Parser`

I don't see how the token-based parsing code would ever be used: the
date/time literals are usually quoted, like `DATE 'yyyy-mm-dd'` or
simply `'YYYYMMDD'`, so the date will be a single token.

2) Removed unused date/time related variants from `Value` and the
dependency on `chrono`.

We don't support parsing date/time literals at the moment and when we
do I think we should store the exact String to let the consumer parse
it as they see fit.

3) Removed the `parse_timestamps_example` and
`parse_timestamps_with_millis_example` tests. They parsed as
`number(2016) minus number(02) minus number(15) <END OF EXPRESSION>`
(leaving the time part unparsed); it makes no sense to try parsing
a yyyy-mm-dd value as an SQL expression.
2019-05-04 01:00:13 +03:00
Nickolay Ponomarev
d58bbb8f9f Update doc comments
(The `SQLBetween` change is to fix a `cargo doc` warning.)
2019-05-02 21:30:32 +03:00
Nickolay Ponomarev
364f62f333 Parse table-valued functions and MSSQL-specific WITH hints
1) Table-valued functions (`FROM possibly_qualified.fn(arg1, ...)`) are
not part of ANSI SQL, but are supported in Postgres and MSSQL at least:
- "38.5.7. SQL Functions as Table Sources" <https://www.postgresql.org/docs/current/xfunc-sql.html#XFUNC-SQL-TABLE-FUNCTIONS>
- `user_defined_function` in "FROM (Transact-SQL)" <https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-2017>

I've considered renaming TableFactor::Table to something else (Object?),
now that it can be a TVF, but couldn't come up with a satisfactory name.

2) "WITH hints" is MSSQL-specific syntax
<https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-2017>

Note that MSSQL supports the following ways of specifying hints, which
are parsed with varying degrees of accuracy:
- `FROM tab (NOLOCK)` -- deprecated syntax, parsed as a function with a `NOLOCK` argument
- `FROM tab C (NOLOCK)` -- deprecated syntax, rejected ATM
- `FROM TAB C WITH (NOLOCK)` -- OK
2019-04-27 21:14:18 +03:00
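A sketch of a table factor that covers plain tables, table-valued function calls and MSSQL `WITH (...)` hints. The `TableFactor` struct here is hypothetical. `args: None` means a plain table, while `Some(vec![])` would print a zero-argument call `fn()`.

```rust
use std::fmt;

/// Hypothetical table factor: a possibly-qualified name, optional
/// table-valued function arguments, an alias, and MSSQL table hints.
struct TableFactor {
    name: Vec<String>,
    args: Option<Vec<String>>,
    alias: Option<String>,
    with_hints: Vec<String>,
}

impl fmt::Display for TableFactor {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.name.join("."))?;
        // Present only for table-valued function calls.
        if let Some(args) = &self.args {
            write!(f, "({})", args.join(", "))?;
        }
        if let Some(alias) = &self.alias {
            write!(f, " AS {}", alias)?;
        }
        // MSSQL-specific `WITH (hint, ...)` table hints.
        if !self.with_hints.is_empty() {
            write!(f, " WITH ({})", self.with_hints.join(", "))?;
        }
        Ok(())
    }
}
```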
Nickolay Ponomarev
e5e3d71354 Support CASE operand WHEN expected_value THEN ..
Another part of #15
2019-04-27 21:14:18 +03:00
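A sketch of a single CASE node covering both forms, using a hypothetical `Case` struct with string sub-expressions. `operand: None` is the searched form `CASE WHEN cond THEN ...`, and `Some(x)` is the simple form `CASE x WHEN expected_value THEN ...`.

```rust
use std::fmt;

/// Hypothetical CASE node; `conditions[i]` pairs with `results[i]`.
struct Case {
    operand: Option<String>,
    conditions: Vec<String>,
    results: Vec<String>,
    else_result: Option<String>,
}

impl fmt::Display for Case {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "CASE")?;
        // The simple form names the operand right after CASE.
        if let Some(operand) = &self.operand {
            write!(f, " {}", operand)?;
        }
        for (c, r) in self.conditions.iter().zip(&self.results) {
            write!(f, " WHEN {} THEN {}", c, r)?;
        }
        if let Some(e) = &self.else_result {
            write!(f, " ELSE {}", e)?;
        }
        write!(f, " END")
    }
}
```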
Nickolay Ponomarev
2aa4c267e7 Simplify CASE parsing 2019-04-27 21:14:18 +03:00