datafusion-sqlparse

mirror of https://github.com/apache/datafusion-sqlparser-rs.git synced 2025-07-08 01:15:00 +00:00

Author	SHA1	Message	Date
Nickolay Ponomarev	2dec65fdb4	Separate statement from expr parsing (4/5) Continuing from https://github.com/andygrove/sqlparser-rs/pull/33#issuecomment-453060427 This stops the parser from accepting (and the AST from being able to represent) SQL look-alike code that makes no sense, e.g. SELECT ... FROM (CREATE TABLE ...) foo SELECT ... FROM (1+CAST(...)) foo Generally this makes the AST less "partially typed": meaning certain parts are strongly typed (e.g. SELECT can only contain projections, relations, etc.), while everything that didn't get its own type is dumped into ASTNode, effectively untyped. After a few more fixes (yet to be implemented), `ASTNode` could become an `SQLExpression`. The Pratt-style expression parser (returning an SQLExpression) would be invoked from the top-down parser in places where a generic expression is expected (e.g. after SELECT <...>, WHERE <...>, etc.), while things like select's `projection` and `relation` could be more appropriately (narrowly) typed. Since the diff is quite large due to necessarily large number of mechanical changes, here's an overview: 1) Interface changes: - A new AST enum - `SQLStatement` - is split out of ASTNode: - The variants of the ASTNode enum, which _only_ make sense as a top level statement (INSERT, UPDATE, DELETE, CREATE, ALTER, COPY) are _moved_ to the new enum, with no other changes. - SQLSelect is _duplicated_: now available both as a variant in SQLStatement::SQLSelect (top-level SELECT) and ASTNode:: (subquery). - The main entry point (Parser::parse_sql) now expects an SQL statement as input, and returns an `SQLStatement`. 2) Parser changes: instead of detecting the top-level constructs deep down in the precedence parser (`parse_prefix`) we are able to do it just right after setting up the parser in the `parse_sql` entry point (SELECT, again, is kept in the expression parser to demonstrate how subqueries could be implemented). 
The rest of parser changes are mechanical ASTNode -> SQLStatement replacements resulting from the AST change. 3) Testing changes: for every test - depending on whether the input was a complete statement or an expression - I used an appropriate helper function: - `verified` (parses SQL, checks that it round-trips, and returns the AST) - was replaced by `verified_stmt` or `verified_expr`. - `parse_sql` (which returned AST without checking it round-tripped) was replaced by: - `parse_sql_expr` (same function, for expressions) - `one_statement_parses_to` (formerly `parses_to`), extended to deal with statements that are not expected to round-trip. The weird name is to reduce further churn when implementing multi-statement parsing. - `verified_stmt` (in 4 testcases that actually round-tripped)	2019-01-31 15:54:57 +03:00
Nickolay Ponomarev	7bbf69f513	Further simplify parse_compound_identifier (5/8) This part changes behavior: - Fail when no identifier is found. - Avoid rewinding if EOF was hit right after the identifier.	2019-01-31 03:57:17 +03:00
Nickolay Ponomarev	9a8b6a8e64	Rework keyword/identifier parsing (1/8) Fold Token::{Keyword, Identifier, DoubleQuotedString} into one Token::SQLWord, which has the necessary information (was it a known keyword and/or was it quoted). This lets the parser easily accept DoubleQuotedString (a quoted identifier) everywhere it expects an Identifier in the same match arm. (To complete support of quoted identifiers, or "delimited identifiers" as the spec calls them, a TODO in parse_tablename() ought to be addressed.) As an aside, per <https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Data_structure_definition/Delimited_identifiers> sqlite seems to be the only one supporting 'identifier' (which is rather hairy, since it can also be a string literal), and `identifier` seems only to be supported by MySQL. I didn't implement either one. This also allows the use of `parse`/`expect_keyword` machinery for non-reserved keywords: previously they relied on the keyword being a Token::Keyword, which wasn't a Token::Identifier, and so wasn't accepted as one. Now whether a keyword can be used as an identifier can be decided by the parser. (I didn't add a blacklist of "reserved" keywords, so that any keyword which doesn't have a special meaning in the parser could be used as an identifier. The list of keywords in the dialect could be re-used for that purpose at a later stage.)	2019-01-31 03:57:16 +03:00
Nickolay Ponomarev	70c799e21d	Use verified() in the remaining PG-specific tests	2019-01-20 19:30:13 +03:00
Nickolay Ponomarev	9441f9c5d8	Move tests for "LIKE '%'" to sqlparser_generic.rs ...as this syntax is not specific to the PostgreSQL dialect. Also use verified() to assert that parsing + serializing results in the original SQL string.	2019-01-20 19:30:12 +03:00
Nickolay Ponomarev	d5109a2880	Remove duplicate tests from sqlparser_postgres.rs These have identical copies in sqlparser_generic.rs	2019-01-20 19:30:12 +03:00
Nickolay Ponomarev	a1da7b4005	Reduce differences between "generic" and "postgresql" tests Mainly by replacing `assert_eq!(sql, ast.to_string())` with a call to the recently introduced `verified()` helper or using `parses_to()` where the expected serialization differs from the original SQL string. There was one case (parse_implicit_join), where the inputs were different: let sql = "SELECT * FROM t1,t2"; //vs let sql = "SELECT * FROM t1, t2"; and since we don't test the whitespace handling in other tests, I just used the canonical representation as input.	2019-01-20 19:14:53 +03:00
Nickolay Ponomarev	de4ccd3cb7	Fail when expected keyword is not found Add #[must_use] to warn against unchecked results of parse_keyword/s in the future.	2019-01-13 01:07:58 +03:00
Andy Grove	777fd4c2ee	Merge branch 'master' into not	2019-01-12 11:14:07 -07:00
Andy Grove	8c351fe10a	Merge branch 'join-support' of https://github.com/fredrikroos/sqlparser-rs into fredrikroos-join-support	2019-01-12 11:09:41 -07:00
Andy Grove	ab423bc9dc	Merge branch 'master' into join-support	2019-01-12 08:33:12 -07:00
Nickolay Ponomarev	3b13e153a8	Fix parse_time() handling of fractional seconds There's no Token::Period in such situation, so fractional part (from sec) was silently truncated. Can't uncomment the test yet, because parse_timestamp() is effectively unused: the code added to parse_value() in `5abd9e7dec` was wrong as it attempted to handle unquoted date/time literals. One part of it was commented out earlier, the other can't work as far as I can see, as it tries to parse a Number token - `([0-9]|\.)+` - as a timestamp, so I removed it as well.	2019-01-11 02:37:36 +03:00
Nickolay Ponomarev	eff92a2dc1	Remove special handling of ::type1::type2 from parse_pg_cast ...it gets handled just as well by the infix parser. (Add a test while we're at it.)	2019-01-11 02:37:36 +03:00
Nickolay Ponomarev	f21cd697c3	Simplify custom datatypes handling and add a test 1) Simplified the bit in parse_datatype() 2) Made sure it was covered by the test (the "public.year" bit) 2a) ...the rest of changes in the test are to fix incorrect variable names: c_name/c_lat/c_lng were copy-pasted from a previous test. 3) Removed the branch from parse_pg_cast, which duplicated what parse_data_type already handled (added in the same commit even `2007995938`)	2019-01-11 02:37:36 +03:00
Andy Grove	ee1944b9d9	Implemented NOT LIKE	2018-12-16 16:30:32 -07:00
Andy Grove	e863bc041c	cargo fmt, fix compiler warnings	2018-12-16 13:57:01 -07:00
Clemens Winter	91aa985ed0	Add LIKE operator	2018-12-16 11:26:09 -08:00
Fredrik Roos	72024661a9	More tests and some small bugfixes	2018-11-18 00:53:39 +01:00
Andy Grove	7e152cd0a9	revert one timestamp parsing case	2018-10-14 12:26:47 -06:00
Andy Grove	035ef52696	re-instate tests for generic parser	2018-10-06 10:15:10 -06:00
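
The token rework described in commit 9a8b6a8e64 above (folding Token::{Keyword, Identifier, DoubleQuotedString} into one Token::SQLWord) might look roughly like the sketch below. Only the name `Token::SQLWord` comes from the commit message; the `SqlWord` struct, its fields, the `KEYWORDS` list, and the helpers `make_word` and `as_identifier` are hypothetical names chosen for illustration and are not the crate's actual API.

```rust
// Hypothetical sketch: everything except the variant name `SQLWord`
// is an assumption made for illustration.

/// One word token carrying what the parser needs to decide later:
/// was it quoted, and did it match a known keyword?
#[derive(Debug, Clone, PartialEq)]
pub struct SqlWord {
    /// Text of the word, without any enclosing quotes.
    pub value: String,
    /// Quote character if this was a delimited identifier, e.g. '"'.
    pub quote_style: Option<char>,
    /// Upper-cased keyword if the unquoted word is in the keyword list.
    pub keyword: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    SQLWord(SqlWord),
    Number(String),
    Comma,
}

/// Stand-in for the dialect's keyword list (assumed contents).
const KEYWORDS: &[&str] = &["SELECT", "FROM", "WHERE", "YEAR"];

/// Tokenizer side: build a word token. Quoted words are never keywords.
pub fn make_word(value: &str, quote_style: Option<char>) -> Token {
    let upper = value.to_uppercase();
    let keyword = if quote_style.is_none() && KEYWORDS.iter().any(|k| *k == upper) {
        Some(upper)
    } else {
        None
    };
    Token::SQLWord(SqlWord {
        value: value.to_string(),
        quote_style,
        keyword,
    })
}

/// Parser side: accept any word as an identifier unless it is a keyword
/// that the parser treats specially at this position (`special`).
pub fn as_identifier(token: &Token, special: &[&str]) -> Option<String> {
    match token {
        Token::SQLWord(w) => match &w.keyword {
            Some(k) if special.iter().any(|s| *s == k.as_str()) => None,
            _ => Some(w.value.clone()),
        },
        _ => None,
    }
}
```

With this shape, an unquoted `year` is tagged as a keyword by the tokenizer yet still accepted by `as_identifier` when `YEAR` is not in the `special` list, which mirrors the commit's point that the parser, not the tokenizer, decides whether a keyword can be used as an identifier.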

170 commits