:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

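As a rough illustration of the :mod:`ast` route mentioned in the note, the
sketch below parses an expression, rewrites one constant, and compiles the
result.  It assumes Python 3.8 or later (where literals are ``ast.Constant``
nodes); the ``AddOne`` transformer is a hypothetical name used only for this
example.

```python
import ast


class AddOne(ast.NodeTransformer):
    """Illustrative transformer: add 1 to every integer literal."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            new_node = ast.Constant(value=node.value + 1)
            return ast.copy_location(new_node, node)
        return node


# Parse in 'eval' mode, edit the tree, then hand it to compile().
tree = ast.parse('a + 5', filename='file.py', mode='eval')
tree = ast.fix_missing_locations(AddOne().visit(tree))
code = compile(tree, 'file.py', 'eval')

# The literal 5 was rewritten to 6, so this evaluates 5 + 6.
result = eval(code, {'a': 5})
```

The tree is edited between parsing and compilation, which is the same point in
the pipeline that ST objects expose, but with higher-level nodes.
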
There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  An important aspect of this structure
is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment.  For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified.  The
example of the :keyword:`if` keyword above is representative.  The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

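The node layout described above can be walked with a few lines of ordinary
Python.  The following is a minimal sketch that operates on a hand-written
tuple rather than on real parser output: the non-terminal numbers ``300`` and
``301`` are hypothetical placeholders, and the helper names ``is_terminal`` and
``iter_terminals`` are not part of the :mod:`parser` module.

```python
import token

# Hand-written stand-in for a tuple-form parse tree.  The non-terminal
# numbers (300, 301) are hypothetical and do not name real grammar symbols.
SAMPLE_TREE = (
    300,
    (301, (token.NAME, 'if'), (token.NAME, 'x')),
    (token.NEWLINE, ''),
    (token.ENDMARKER, ''),
)


def is_terminal(node):
    """A terminal is (type, text) or (type, text, lineno): element 1 is a string."""
    return len(node) >= 2 and isinstance(node[1], str)


def iter_terminals(node):
    """Yield every terminal node of the tree in source order."""
    if is_terminal(node):
        yield node
    else:
        # Element 0 is the production number; the rest are child nodes.
        for child in node[1:]:
            yield from iter_terminals(child)


# Keywords and identifiers are both NAME tokens, as noted in the text.
names = [node[1] for node in iter_terminals(SAMPLE_TREE)
         if node[0] == token.NAME]
```

Because keywords receive no special treatment, ``names`` contains the ``if``
keyword alongside the identifier ``x``.
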
The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be initiated when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``.  If the third element is present, it is assumed to be a
   valid line number.  The line number may be specified for any subset of the
   terminal symbols in the input tree.


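Since a tree may mix two-element and three-element terminals, a tool that
builds trees by hand may want to normalize them first.  The sketch below
removes line numbers from a tuple-form tree; ``strip_line_numbers`` is a
hypothetical helper, and the non-terminal numbers ``300`` and ``301`` are
placeholders rather than real grammar symbols.

```python
def strip_line_numbers(node):
    """Return a copy of a tuple-form tree with terminals reduced to (type, text)."""
    # A terminal carries its source text as a string in position 1.
    if len(node) >= 2 and isinstance(node[1], str):
        return (node[0], node[1])
    # Otherwise keep the production number and recurse into the children.
    return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])


# Mixed input: one terminal has a line number, the other does not.
with_lines = (300, (301, (1, 'name', 56), (1, 'other')))
without_lines = strip_line_numbers(with_lines)
```

The helper only reshapes the data; whether the resulting tree is acceptable is
still decided by the validation that :func:`sequence2st` performs.
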
.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.


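For comparison, the slower route that the :func:`st2list` description warns
against, converting a tuple tree to nested lists in Python, might look like the
sketch below.  The input is a hand-written tuple with hypothetical non-terminal
numbers (``300``, ``301``), and ``tuple_tree_to_lists`` is an illustrative name
rather than a module function.

```python
import token


def tuple_tree_to_lists(node):
    """Recursively convert a tuple-form tree into nested, mutable lists."""
    if isinstance(node, tuple):
        return [tuple_tree_to_lists(item) for item in node]
    # Integers (node types, line numbers) and strings (token text) are leaves.
    return node


# Terminals here use the three-element form that includes a line number.
tuple_tree = (
    300,
    (301,
     (token.NAME, 'a', 1),
     (token.PLUS, '+', 1),
     (token.NUMBER, '5', 1)),
)
list_tree = tuple_tree_to_lists(tuple_tree)

# Unlike the tuple form, the list form can be edited in place.
list_tree[1][3][1] = '6'
```

Every node is visited in interpreted code here, which is one reason the text
prefers asking for the list representation directly.
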
.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the compiler, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte-compiler, which is
   why it can be raised at this point by the :mod:`parser` module.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.


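The ``del f(0)`` case can be observed without an ST object, because the
built-in :func:`compile` rejects the same source text.  The sketch below uses a
hypothetical helper, ``compile_error``, and makes no claim about which internal
stage reports the error, since that can differ between Python versions.

```python
def compile_error(source):
    """Return the SyntaxError raised when compiling *source*, or None."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as exc:
        return exc
    return None


# Deleting the result of a call is rejected, while deleting a name is fine.
rejected = compile_error('del f(0)')
accepted = compile_error('del x')
```

Code that compiles ST objects built with :func:`sequence2st` may want a similar
``try``/``except`` around the compilation step.
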
.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.


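The ``'eval'`` and ``'exec'`` forms can be explored at the source level with
the built-in :func:`compile`.  This is only an analogy for the queries above,
which inspect ST objects rather than source text; ``source_forms`` is a
hypothetical helper introduced for the illustration.

```python
def source_forms(source):
    """Return the compile() modes ('eval', 'exec') that accept *source*."""
    accepted = []
    for mode in ('eval', 'exec'):
        try:
            compile(source, '<example>', mode)
        except SyntaxError:
            continue
        accepted.append(mode)
    return accepted


expression_forms = source_forms('a + 5')   # valid as an expression and as a statement
statement_forms = source_forms('a = 5')    # an assignment is not an expression
broken_forms = source_forms('a +')         # incomplete source fits neither mode
```

The three results differ, which shows at the source level why "not an
expression" and "is a suite" should be treated as separate questions.
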
.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the argument, while calls to other functions in the
   module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()