datafusion-sqlparse

mirror of https://github.com/apache/datafusion-sqlparser-rs.git synced 2025-08-04 06:18:17 +00:00

Author	SHA1	Message	Date
Nickolay Ponomarev	dee30aabe0	Fix the clippy `assert!(false)` lint https://rust-lang.github.io/rust-clippy/master/index.html#assertions_on_constants While I don't feel it's valid, fixing it lets us act on the other, more useful, lints.	2019-04-21 04:46:19 +03:00
Nickolay Ponomarev	c223eaf0aa	Fix a bunch of trivial clippy lints	2019-04-21 04:46:19 +03:00
Nickolay Ponomarev	0634ec4a83	Apply suggestions from `cargo fix --edition-idioms`	2019-04-21 04:41:11 +03:00
Nickolay Ponomarev	b12a19e197	Switch to the Rust 2018 edition This requires Rust 1.31 (from last year) to build, but is otherwise compatible with the 2015-edition code.	2019-04-21 04:41:11 +03:00
Nickolay Ponomarev	bbf1805729	Support SELECT DISTINCT	2019-04-20 14:21:26 +03:00
Zhiyuan Zheng	d8f824c400	merge CreateExternalTable & CreateTable.	2019-04-14 01:05:26 +08:00
Zhiyuan Zheng	26940920ac	Add unit tests.	2019-04-09 13:28:01 +08:00
Nickolay Ponomarev	f30ab89ad2	Re-run cargo fmt	2019-03-08 15:46:40 +03:00
Nikhil Benesch	23a0fc79f5	Support CREATE MATERIALIZED VIEW	2019-03-07 13:14:33 -05:00
Nickolay Ponomarev	52e0f55b6f	Support UNION/EXCEPT/INTERSECT	2019-02-11 05:14:36 +03:00
Nickolay Ponomarev	54c9ca8619	Support unary + / -	2019-02-11 05:13:48 +03:00
Nickolay Ponomarev	786b1cf18a	Support BETWEEN	2019-02-11 05:13:48 +03:00
Nickolay Ponomarev	264319347d	Support IN	2019-02-11 05:13:48 +03:00
Nickolay Ponomarev	bed03abe44	Support `AS` and qualified wildcards in SELECT	2019-02-11 05:13:48 +03:00
Nickolay Ponomarev	bf0c07bb1b	Support basic CTEs (`WITH`) Some unsupported features are noted as TODOs.	2019-02-11 05:13:48 +03:00
Nickolay Ponomarev	35dd9342e2	Support national string literals (N'...') Widely used in MS SQL and specified in ANSI.	2019-02-07 05:34:33 +03:00
Nickolay Ponomarev	0c0cbcaff4	Support basic CREATE VIEW	2019-02-07 05:34:12 +03:00
Nickolay Ponomarev	6b107065ac	Switch some tests to `verified_select_stmt` (the tests affected by "unboxing" in the previous commits.)	2019-02-07 05:33:54 +03:00
Nickolay Ponomarev	e3b981a0e2	Don't Box<ASTNode> in SQLSelect Instead change ASTNode::SQLSubquery to be Box<SQLSelect>	2019-02-07 05:33:51 +03:00
Nickolay Ponomarev	c5bbfc33fd	Don't Box<ASTNode> in SQLStatement This used to be needed when it was a variant in the ASTNode enum itself.	2019-02-07 05:33:46 +03:00
Nickolay Ponomarev	3619e89e9c	Remove Box<> from SQLOrderByExpr It was probably copied from somewhere else when most types were variants in ASTNode, and needed Box<> to prevent recursion in the ASTNode definition.	2019-02-07 05:33:43 +03:00
Nickolay Ponomarev	9967031cba	Move TableFactor to be a separate enum ASTNode can now be renamed SQLExpression, as it represents a node in the "expression" part of the AST -- other nodes have their own types.	2019-02-07 05:33:41 +03:00
Nickolay Ponomarev	e0ceacd1ad	Store original, quoted form in SQLIdent Also move more things to use SQLIdent instead of String in the hope of making it a newtype eventually. Add tests that quoted identifiers round-trip parsing/serialization correctly.	2019-02-07 05:33:12 +03:00
Nickolay Ponomarev	07790fe4c4	Improve DELETE FROM parsing (4.4/4.4) Store (and parse) `table_name: SQLObjectName` instead of `relation: Option<Box<ASTNode>>`, which can be an arbitrary expression. Also remove the `Option<>`: the table name is not optional in any dialects I'm familiar with. While the FROM keyword itself _is_ optional in some dialects, there are more things to implement for those dialects, see https://stackoverflow.com/a/4484271/1026	2019-02-07 05:31:51 +03:00
Nickolay Ponomarev	39e98cb11a	Rename parse_tablename -> parse_object_name (4.2/4.4) ...to match the name of the recently introduced `SQLObjectName` struct and to avoid any reservations about using it with multi-part names of objects other than tables (as in the `type_name` case).	2019-02-07 05:31:44 +03:00
Nickolay Ponomarev	523f086be7	Introduce SQLObjectName struct (4.1/4.4) (To store "A name of a table, view, custom type, etc., possibly multi-part, i.e. db.schema.obj".) Before this change - some places used `String` for this (these are updated in this commit) - while others (notably SQLStatement::SQLDelete::relation, which is the reason for this series of commits) relied on ASTNode::SQLCompoundIdentifier (which is also backed by a Vec<SQLIdent>, but, as a variant of ASTNode enum, is not convenient to use when you know you need that specific variant).	2019-02-07 05:31:40 +03:00
Nickolay Ponomarev	215820ef66	Stricter parsing for subqueries (3/4) This makes the parser more strict when handling SELECTs nested somewhere in the main statement: 1) instead of accepting SELECT anywhere in the expression where an operand was expected, we only accept it inside parens. (I've added a test for the currently supported syntax, <scalar subquery> in ANSI SQL terms) 2) instead of accepting any expression in the derived table context: `FROM ( ... )` - we only look for a SELECT subquery there. Due to #1, I had to switch the 'ansi' test from invoking the expression parser to the statement parser.	2019-02-07 05:31:36 +03:00
Nickolay Ponomarev	82dc581639	Fix precedence for the NOT operator (2/4) I checked the docs of a few of the most popular RDBMSes, and it seems there's consensus that the precedence of `NOT` is higher than `AND`, but lower than `IS NULL`. Postgresql[1], Oracle[2] and MySQL[3] docs say that explicitly. T-SQL docs[4] do mention it's higher than `AND`, and while they don't explicitly mention IS NULL, this snippet: select * from (select 1 as a)x where (not x.a) is null ...is a parsing error, while the following works like IS NOT NULL: select * from (select 1 as a)x where not x.a is null sqlite doesn't seem to mention `NOT` precedence, but I assume it works similarly. [1] https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-OPERATORS [2] https://docs.oracle.com/cd/B19306_01/server.102/b14200/conditions001.htm#i1034834 [3] https://dev.mysql.com/doc/refman/8.0/en/operator-precedence.html [4] https://docs.microsoft.com/en-us/sql/t-sql/language-elements/operator-precedence-transact-sql?view=sql-server-2017	2019-02-07 05:24:55 +03:00
Nickolay Ponomarev	29db619792	Stop losing parens when roundtripping (1/4) Before this change an expression like `(a+b)-(c+d)` was parsed correctly (as a Minus node with two Plus nodes as children), but when serializing back to an SQL string, it came up as a+b-c+d, since we don't store parens in AST and don't attempt to insert them when necessary during serialization. The latter would be hard, and we already had an SQLNested enum variant, so I changed the code to wrap the AST node for the parenthesized expression in it.	2019-02-07 05:24:54 +03:00
Nickolay Ponomarev	b57c60a78c	Only use parse_expr() when we expect an expression (0/4) Before this commit there was a single `parse_expr(u8)` method, which was called both 1) from within the expression parser (to parse subexpression consisting of operators with higher priority than the current one), and 2) from the top-down parser both a) to parse true expressions (such as an item of the SELECT list or the condition after WHERE or after ON), and b) to parse sequences which are not exactly "expressions". This starts cleaning this up by renaming the `parse_expr(u8)` method to `parse_subexpr()` and using it only for (1) - i.e. usually providing a non-zero precedence parameter. The non-intuitively called `parse()` method is renamed to `parse_expr()`, which became available and is used for (2a). While reviewing the existing callers of `parse_expr`, four points to follow up on were identified (marked "TBD (#)" in the commit): 1) Do not lose parens (e.g. `(1+2)*3`) when roundtripping String->AST->String by using SQLNested. 2) Incorrect precedence of the NOT unary 3) `parse_table_factor` accepts any expression where a SELECT subquery is expected. 4) parse_delete uses parse_expr() to retrieve a table name These are dealt with in the commits to follow.	2019-02-07 05:24:54 +03:00
Nickolay Ponomarev	707c58ad57	Support parsing of multiple statements (5/5) Parser::parse_sql() can now parse a semicolon-separated list of statements, returning them in a Vec<SQLStatement>. To support this we: - Move handling of inter-statement tokens from the end of individual statement parsers (`parse_select` and `parse_delete`; this was not implemented for other top-level statements) to the common statement-list parsing code (`parse_sql`); - Change the "Unexpected token at end of ..." error, which didn't have tests and prevented us from parsing successive statements -> "Expected end of statement" (i.e. a delimiter - currently only ";" - or the EOF); - Add PartialEq on ParserError to be able to assert_eq!() that parsing statements that do not terminate properly returns an expected error.	2019-02-07 05:24:54 +03:00
Nickolay Ponomarev	5a0e0ec928	Simplify some tests by introducing `verified_select_stmt` and `expr_from_projection` (The primary motivation was that it makes the tests more resilient to the upcoming changes to the SQLSelectStatement to support `AS` aliases and `UNION`.) Also start using `&'static str` literals consistently instead of String::from for the `let sql` test strings.	2019-02-07 05:24:54 +03:00
Nickolay Ponomarev	2dec65fdb4	Separate statement from expr parsing (4/5) Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427 This stops the parser from accepting (and the AST from being able to represent) SQL look-alike code that makes no sense, e.g. SELECT ... FROM (CREATE TABLE ...) foo SELECT ... FROM (1+CAST(...)) foo Generally this makes the AST less "partially typed": meaning certain parts are strongly typed (e.g. SELECT can only contain projections, relations, etc.), while everything that didn't get its own type is dumped into ASTNode, effectively untyped. After a few more fixes (yet to be implemented), `ASTNode` could become an `SQLExpression`. The Pratt-style expression parser (returning an SQLExpression) would be invoked from the top-down parser in places where a generic expression is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things like select's `projection` and `relation` could be more appropriately (narrowly) typed. Since the diff is quite large due to necessarily large number of mechanical changes, here's an overview: 1) Interface changes: - A new AST enum - `SQLStatement` - is split out of ASTNode: - The variants of the ASTNode enum, which _only_ make sense as a top level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are _moved_ to the new enum, with no other changes. - SQLSelect is _duplicated_: now available both as a variant in SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery). - The main entry point (Parser::parse_sql) now expects an SQL statement as input, and returns an `SQLStatement`. 2) Parser changes: instead of detecting the top-level constructs deep down in the precedence parser (`parse_prefix`) we are able to do it just right after setting up the parser in the `parse_sql` entry point (SELECT, again, is kept in the expression parser to demonstrate how subqueries could be implemented). The rest of parser changes are mechanical ASTNode -> SQLStatement replacements resulting from the AST change. 3) Testing changes: for every test - depending on whether the input was a complete statement or an expression - I used an appropriate helper function: - `verified` (parses SQL, checks that it round-trips, and returns the AST) - was replaced by `verified_stmt` or `verified_expr`. - `parse_sql` (which returned AST without checking it round-tripped) was replaced by: - `parse_sql_expr` (same function, for expressions) - `one_statement_parses_to` (formerly `parses_to`), extended to deal with statements that are not expected to round-trip. The weird name is to reduce further churn when implementing multi-statement parsing. - `verified_stmt` (in 4 testcases that actually round-tripped)	2019-01-31 15:54:57 +03:00
Nickolay Ponomarev	d8173d4196	Extract ASTNode::SQLSelect to a separate struct (1/5) This will allow re-using it for SQLStatement in a later commit. (Also split the new struct into a separate file, other query-related types will be moved here in a follow-up commit.)	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	50b5724c39	Don't parse ORDER BY as a table alias (8/8)	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	76ec175d20	Support table aliases without `AS` (7/8) ...as in `FROM foo bar WHERE bar.x > 1`. To avoid ambiguity as to whether a token is an alias or a keyword, we maintain a blacklist of keywords, that can follow a "table factor", to prevent parsing them as an alias. This "context-specific reserved keyword" approach lets us accept more SQL that's valid in some dialects, than a list of globally reserved keywords. Also some dialects (e.g. Oracle) apparently don't reserve some keywords (like JOIN), while presumably they won't accept them as an alias (`FROM foo JOIN` meaning `FROM foo AS JOIN`).	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	536fa6e428	Support `AS` table aliases (6/8) A "table factor" (name borrowed from the ANSI SQL grammar) is a table name or a derived table (subquery), followed by an optional `AS` and an optional alias. (The alias is not optional for subqueries, but we don't enforce that.) It can appear in the FROM/JOIN part of the query. This commit: - introduces ASTNode::TableFactor - changes the parser to populate SQLSelect::relation and Join::relation with ASTNode::TableFactor instead of the table name - changes the parser to only accept subqueries or identifiers, not arbitrary expressions in the "table factor" context - always parses the table name as SQLCompoundIdentifier (whether or not it was actually compound).	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	7bbf69f513	Further simplify parse_compound_identifier (5/8) This part changes behavior: - Fail when no identifier is found. - Avoid rewinding if EOF was hit right after the identifier.	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	9a8b6a8e64	Rework keyword/identifier parsing (1/8) Fold Token::{Keyword, Identifier, DoubleQuotedString} into one Token::SQLWord, which has the necessary information (was it a known keyword and/or was it quoted). This lets the parser easily accept DoubleQuotedString (a quoted identifier) everywhere it expects an Identifier in the same match arm. (To complete support of quoted identifiers, or "delimited identifiers" as the spec calls them, a TODO in parse_tablename() ought to be addressed.) As an aside, per <https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers> sqlite seems to be the only one supporting 'identifier' (which is rather hairy, since it can also be a string literal), and `identifier` seems only to be supported by MySQL. I didn't implement either one. This also allows the use of `parse`/`expect_keyword` machinery for non-reserved keywords: previously they relied on the keyword being a Token::Keyword, which wasn't a Token::Identifier, and so wasn't accepted as one. Now whether a keyword can be used as an identifier can be decided by the parser. (I didn't add a blacklist of "reserved" keywords, so that any keyword which doesn't have a special meaning in the parser could be used as an identifier. The list of keywords in the dialect could be re-used for that purpose at a later stage.)	2019-01-31 03:57:16 +03:00
Nickolay Ponomarev	3de2a0952c	Make SQLOrderByExpr::asc tri-state i.e. ASC/DESC/unspecified - so that we don't lose information about source code. Also don't take any keyword other than ASC/DESC or Comma to mean 'ascending'.	2019-01-30 04:10:55 +03:00
Nickolay Ponomarev	70c799e21d	Use verified() in the remaining PG-specific tests	2019-01-20 19:30:13 +03:00
Nickolay Ponomarev	9441f9c5d8	Move tests for "LIKE '%'" to sqlparser_generic.rs ...as this syntax is not specific to the PostgreSQL dialect. Also use verified() to assert that parsing + serializing results in the original SQL string.	2019-01-20 19:30:12 +03:00
Nickolay Ponomarev	d5109a2880	Remove duplicate tests from sqlparser_postgres.rs These have identical copies in sqlparser_generic.rs	2019-01-20 19:30:12 +03:00
Nickolay Ponomarev	45dab0e2d4	Run all the 'generic' tests with the PostgreSqlDialect too.	2019-01-20 19:15:05 +03:00
Nickolay Ponomarev	a1da7b4005	Reduce differences between "generic" and "postgresql" tests Mainly by replacing `assert_eq!(sql, ast.to_string())` with a call to the recently introduced `verified()` helper or using `parses_to()` where the expected serialization differs from the original SQL string. There was one case (parse_implicit_join), where the inputs were different: let sql = "SELECT * FROM t1,t2"; //vs let sql = "SELECT * FROM t1, t2"; and since we don't test the whitespace handling in other tests, I just used the canonical representation as input.	2019-01-20 19:14:53 +03:00
Nickolay Ponomarev	de4ccd3cb7	Fail when expected keyword is not found Add #[must_use] to warn against unchecked results of parse_keyword/s in the future.	2019-01-13 01:07:58 +03:00
Andy Grove	777fd4c2ee	Merge branch 'master' into not	2019-01-12 11:14:07 -07:00
Andy Grove	8c351fe10a	Merge branch 'join-support' of https://github.com/fredrikroos/sqlparser-rs into fredrikroos-join-support	2019-01-12 11:09:41 -07:00
Andy Grove	ab423bc9dc	Merge branch 'master' into join-support	2019-01-12 08:33:12 -07:00
Nickolay Ponomarev	3b13e153a8	Fix parse_time() handling of fractional seconds There's no Token::Period in such situation, so fractional part (from sec) was silently truncated. Can't uncomment the test yet, because parse_timestamp() is effectively unused: the code added to parse_value() in `5abd9e7dec` was wrong as it attempted to handle unquoted date/time literals. One part of it was commented out earlier, the other can't work as far as I can see, as it tries to parse a Number token - `([0-9]\|\.)+` - as a timestamp, so I removed it as well.	2019-01-11 02:37:36 +03:00
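
The "Stop losing parens when roundtripping (1/4)" entry above describes wrapping a parenthesized expression in a dedicated AST variant so that serialization can print the parentheses back. The following is a minimal, self-contained sketch of that idea, not the crate's actual code: the `Expr` enum, the `Parser` struct, and the `roundtrip` helper are hypothetical names invented for illustration, and the tokenizer is deliberately simplistic (it drops all whitespace and only knows single-character operators).

```rust
use std::fmt;

// Hypothetical miniature AST. `Nested` plays the role the commit message
// describes for the SQLNested variant: it records that parens were present.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Binary(Box<Expr>, char, Box<Expr>),
    Nested(Box<Expr>),
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Ident(name) => write!(f, "{}", name),
            Expr::Binary(left, op, right) => write!(f, "{} {} {}", left, op, right),
            // Parens are only printed where the parser saw them.
            Expr::Nested(inner) => write!(f, "({})", inner),
        }
    }
}

struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    // Simplification: whitespace is dropped up front, so this sketch cannot
    // tell `a b` from `ab`. A real tokenizer would not do this.
    fn new(src: &str) -> Self {
        let chars = src.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { chars, pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    // 0 means "not a binary operator".
    fn precedence(op: char) -> u8 {
        match op {
            '+' | '-' => 1,
            '*' | '/' => 2,
            _ => 0,
        }
    }

    // Entry point for a complete expression.
    fn parse_expr(&mut self) -> Result<Expr, String> {
        self.parse_subexpr(0)
    }

    // Precedence-climbing loop: only consumes operators that bind tighter
    // than `min_prec`, which makes equal-precedence operators left-associative.
    fn parse_subexpr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut left = self.parse_prefix()?;
        while let Some(op) = self.peek() {
            let prec = Self::precedence(op);
            if prec == 0 || prec <= min_prec {
                break;
            }
            self.pos += 1;
            let right = self.parse_subexpr(prec)?;
            left = Expr::Binary(Box::new(left), op, Box::new(right));
        }
        Ok(left)
    }

    fn parse_prefix(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some('(') => {
                self.pos += 1;
                let inner = self.parse_subexpr(0)?;
                if self.peek() != Some(')') {
                    return Err("expected ')'".to_string());
                }
                self.pos += 1;
                // Wrap instead of returning `inner` directly, so the parens survive.
                Ok(Expr::Nested(Box::new(inner)))
            }
            Some(c) if c.is_alphanumeric() => {
                let start = self.pos;
                while self.peek().map_or(false, |c| c.is_alphanumeric()) {
                    self.pos += 1;
                }
                Ok(Expr::Ident(self.chars[start..self.pos].iter().collect()))
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}

// Parse `sql` as a single expression and serialize it back.
fn roundtrip(sql: &str) -> Result<String, String> {
    let mut parser = Parser::new(sql);
    let expr = parser.parse_expr()?;
    if parser.peek().is_some() {
        return Err("expected end of expression".to_string());
    }
    Ok(expr.to_string())
}

fn main() {
    // prints "(a + b) - (c + d)"
    println!("{}", roundtrip("(a + b) - (c + d)").unwrap());
}
```

If `parse_prefix` returned `inner` directly instead of wrapping it in `Expr::Nested`, the same input would serialize as `a + b - c + d`, which is the failure mode the commit message reports.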

70 commits