Mirror of https://github.com/RustPython/Parser.git
Commit 07918f0a9a (parent e7f14ab9b8): Document parser crate.
6 changed files with 429 additions and 86 deletions

//! This crate can be used to parse Python source code into an Abstract
//! Syntax Tree.
//!
//! ## Overview:
//!
//! The process by which source code is parsed into an AST can be broken down
//! into two general stages: [lexical analysis] and [parsing].
//!
//! During lexical analysis, the source code is converted into a stream of lexical
//! tokens that represent the smallest meaningful units of the language. For example,
//! the source code `print("Hello world")` would _roughly_ be converted into the following
//! stream of tokens:
//!
//! ```text
//! Name("print"), LeftParen, String("Hello world"), RightParen
//! ```
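//!
//! Such a stream can be produced and inspected directly. Here is a minimal sketch
//! using the `make_tokenizer` function from the examples below; the exact shape of
//! each token item is an implementation detail, so this only prints its `Debug` form:
//!
//! ```
//! use rustpython_parser::lexer::make_tokenizer;
//!
//! for token in make_tokenizer(r#"print("Hello world")"#) {
//!     // Each item is a `Result`, so lexical errors surface per token; the
//!     // token representation shown here is whatever `Debug` prints for it.
//!     println!("{:?}", token);
//! }
//! ```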
//!
//! These tokens are then consumed by the parser, which matches them against a set of
//! grammar rules to verify that the source code is syntactically valid and to construct
//! a tree representation of it: the AST. The tree is made up of nodes that represent
//! the different syntactic constructs of the language. If the source code is syntactically
//! invalid, parsing fails and an error is returned; after a successful parse, the AST can
//! be used to perform further analysis on the source code. Continuing with the example
//! above, the AST generated by the parser would _roughly_ look something like this:
//!
//! ```text
//! node: Expr {
//!     value: {
//!         node: Call {
//!             func: {
//!                 node: Name {
//!                     id: "print",
//!                     ctx: Load,
//!                 },
//!             },
//!             args: [
//!                 node: Constant {
//!                     value: Str("Hello world"),
//!                     kind: None,
//!                 },
//!             ],
//!             keywords: [],
//!         },
//!     },
//! },
//! ```
//!
//! Note: The tokens and AST shown above are not the exact tokens/AST generated by the parser.
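//!
//! The AST produced by a real parse is richer than this sketch, but it can be
//! inspected the same way. As a minimal example (assuming the AST types implement
//! `Debug`), one can dump the tree produced by `parse_program`, described below:
//!
//! ```
//! use rustpython_parser::parser::parse_program;
//!
//! let ast = parse_program(r#"print("Hello world")"#, "<embedded>").unwrap();
//! // Pretty-print the parsed tree; the output resembles the sketch above.
//! println!("{:#?}", ast);
//! ```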
//!
//! ## Source code layout:
//!
//! The functionality of this crate is split into several modules:
//!
//! - [token]: This module contains the definition of the tokens that are generated by the lexer.
//! - [lexer]: This module contains the lexer and is responsible for generating the tokens.
//! - [parser]: This module contains an interface to the parser and is responsible for generating the AST.
//!   - Functions and strings have special parsing requirements that are handled in additional files.
//! - [mode]: This module contains the definition of the different modes that the parser can be in.
//! - [error]: This module contains the definition of the errors that can be returned by the parser
//!   (see the error-handling sketch just after this list).
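//!
//! As a small illustration of the [error] module, syntactically invalid input
//! surfaces as an `Err` value from the parsing functions; the exact error type is
//! defined in [error], so this sketch only asserts that parsing fails:
//!
//! ```
//! use rustpython_parser::parser::parse_program;
//!
//! // `def` must be followed by a name, so this fails to parse.
//! let result = parse_program("def 42:", "<embedded>");
//! assert!(result.is_err());
//! ```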
//!
//! # Examples
//!
//! For example, to get a stream of tokens from a given string, one could do this:
//!
//! ```
//! use rustpython_parser::lexer::make_tokenizer;
//!
//! let python_source = r#"
//! def is_odd(i):
//!    return bool(i & 1)
//! "#;
//! let mut tokens = make_tokenizer(python_source);
//! assert!(tokens.all(|t| t.is_ok()));
//! ```
//!
//! These tokens can be directly fed into the parser to generate an AST:
//!
//! ```
//! use rustpython_parser::parser::{parse_tokens, Mode};
//! use rustpython_parser::lexer::make_tokenizer;
//!
//! let python_source = r#"
//! def is_odd(i):
//!    return bool(i & 1)
//! "#;
//! let tokens = make_tokenizer(python_source);
//! let ast = parse_tokens(tokens, Mode::Module, "<embedded>");
//!
//! assert!(ast.is_ok());
//! ```
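//!
//! Passing a different [mode] changes how the tokens are interpreted. As a sketch,
//! assuming `Mode` also offers an `Expression` variant, a single expression can be
//! parsed from its tokens in the same way:
//!
//! ```
//! use rustpython_parser::parser::{parse_tokens, Mode};
//! use rustpython_parser::lexer::make_tokenizer;
//!
//! let tokens = make_tokenizer("1 + 2");
//! // `Mode::Expression` is assumed here; see [mode] for the full set of variants.
//! let ast = parse_tokens(tokens, Mode::Expression, "<embedded>");
//!
//! assert!(ast.is_ok());
//! ```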
//!
//! Alternatively, you can use one of the other `parse_*` functions to parse a string directly without using a specific
//! mode or tokenizing the source beforehand:
//!
//! ```
//! use rustpython_parser::parser::parse_program;
//!
//! let python_source = r#"
//! def is_odd(i):
//!    return bool(i & 1)
//! "#;
//! let ast = parse_program(python_source, "<embedded>");
//!
//! assert!(ast.is_ok());
//! ```
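//!
//! For a single expression, `parse_expression` follows the same pattern:
//!
//! ```
//! use rustpython_parser::parser::parse_expression;
//!
//! let expr = parse_expression("1 + 2", "<embedded>");
//! assert!(expr.is_ok());
//! ```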
//!
//! [lexical analysis]: https://en.wikipedia.org/wiki/Lexical_analysis
//! [parsing]: https://en.wikipedia.org/wiki/Parsing
//! [token]: crate::token
//! [lexer]: crate::lexer
//! [parser]: crate::parser
//! [mode]: crate::mode
//! [error]: crate::error

#![doc(html_logo_url = "https://raw.githubusercontent.com/RustPython/RustPython/main/logo.png")]
#![doc(html_root_url = "https://docs.rs/rustpython-parser/")]