mirror of
https://github.com/python/cpython.git
synced 2025-07-07 19:35:27 +00:00

Prepare the docs for using the notation used in the `python.gram` file. If we want to sync the two, the meta-syntax should be the same. Link the Full Grammar docs here; keep only a few extras. Also, remove the distinction between lexical and syntactic rules, except for whitespace handling. With f- and t-strings, the line between the two is blurry. Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Colin Marquardt <cmarqu42@gmail.com>
211 lines
8.7 KiB
ReStructuredText
.. _introduction:

************
Introduction
************

This reference manual describes the Python programming language. It is not
intended as a tutorial.

While I am trying to be as precise as possible, I chose to use English rather
than formal specifications for everything except syntax and lexical analysis.
This should make the document more understandable to the average reader, but
will leave room for ambiguities. Consequently, if you were coming from Mars and
tried to re-implement Python from this document alone, you might have to guess
things and in fact you would probably end up implementing quite a different
language. On the other hand, if you are using Python and wonder what the precise
rules about a particular area of the language are, you should definitely be able
to find them here. If you would like to see a more formal definition of the
language, maybe you could volunteer your time --- or invent a cloning machine
:-).

It is dangerous to add too many implementation details to a language reference
document --- the implementation may change, and other implementations of the
same language may work differently. On the other hand, CPython is the one
Python implementation in widespread use (although alternate implementations
continue to gain support), and its particular quirks are sometimes worth
mentioning, especially where the implementation imposes additional limitations.
Therefore, you'll find short "implementation notes" sprinkled throughout the
text.

Every Python implementation comes with a number of built-in and standard
modules. These are documented in :ref:`library-index`. A few built-in modules
are mentioned when they interact in a significant way with the language
definition.


.. _implementations:

Alternate Implementations
=========================

Though there is one Python implementation which is by far the most popular,
there are some alternate implementations which are of particular interest to
different audiences.

Known implementations include:

CPython
   This is the original and most-maintained implementation of Python, written in C.
   New language features generally appear here first.

Jython
   Python implemented in Java. This implementation can be used as a scripting
   language for Java applications, or can be used to create applications using the
   Java class libraries. It is also often used to create tests for Java libraries.
   More information can be found at `the Jython website <https://www.jython.org/>`_.

Python for .NET
   This implementation actually uses the CPython implementation, but is a managed
   .NET application and makes .NET libraries available. It was created by Brian
   Lloyd. For more information, see the `Python for .NET home page
   <https://pythonnet.github.io/>`_.

IronPython
   An alternate Python for .NET. Unlike Python.NET, this is a complete Python
   implementation that generates IL, and compiles Python code directly to .NET
   assemblies. It was created by Jim Hugunin, the original creator of Jython. For
   more information, see `the IronPython website <https://ironpython.net/>`_.

PyPy
   An implementation of Python written completely in Python. It supports several
   advanced features not found in other implementations, such as stackless support
   and a just-in-time compiler. One of the goals of the project is to encourage
   experimentation with the language itself by making it easier to modify the
   interpreter (since it is written in Python). Additional information is
   available on `the PyPy project's home page <https://pypy.org/>`_.

Each of these implementations varies in some way from the language as documented
in this manual, or introduces specific information beyond what's covered in the
standard Python documentation. Please refer to the implementation-specific
documentation to determine what else you need to know about the specific
implementation you're using.

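Code that needs to adapt to these differences can ask the interpreter which
implementation it is running on. The following is a minimal sketch using the
standard :mod:`sys` and :mod:`platform` modules; the helper name
``describe_implementation`` is hypothetical and exists only for illustration.

```python
import platform
import sys


def describe_implementation():
    """Return (identifier, display name, version) of the running interpreter.

    ``describe_implementation`` is an illustrative helper, not a standard API.
    """
    # Lowercase identifier of the implementation, for example "cpython".
    identifier = sys.implementation.name
    # Human-readable name of the implementation, for example "CPython".
    display_name = platform.python_implementation()
    # Version of the implementation itself (major.minor.micro).
    version = ".".join(str(part) for part in sys.implementation.version[:3])
    return identifier, display_name, version


print(describe_implementation())
```

The values returned depend on the interpreter that runs the snippet, so no
particular output is shown here.
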

.. _notation:

Notation
========

.. index:: BNF, grammar, syntax, notation

The descriptions of lexical analysis and syntax use a grammar notation that
is a mixture of
`EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
For example:

.. grammar-snippet::
   :group: notation

   name:   `letter` (`letter` | `digit` | "_")*
   letter: "a"..."z" | "A"..."Z"
   digit:  "0"..."9"

In this example, the first line says that a ``name`` is a ``letter`` followed
by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
A ``letter`` in turn is any of the single characters ``a`` through
``z`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
to ``9``.

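As an informal illustration, the three example rules can be transcribed into a
regular expression. This is only a sketch of the example grammar above, not of
Python's actual identifier rules; the names ``LETTER``, ``DIGIT``, ``NAME`` and
``is_name`` are hypothetical.

```python
import re

# letter: "a"..."z" | "A"..."Z"
LETTER = "[a-zA-Z]"
# digit: "0"..."9"
DIGIT = "[0-9]"
# name: letter (letter | digit | "_")*
NAME = re.compile(f"{LETTER}(?:{LETTER}|{DIGIT}|_)*")


def is_name(text):
    """Return True if all of ``text`` matches the example ``name`` rule."""
    return NAME.fullmatch(text) is not None


print(is_name("spam_1"))  # True: a letter followed by letters, "_" and a digit
print(is_name("1spam"))   # False: the first character must be a letter
print(is_name("_spam"))   # False: the example rule does not allow a leading "_"
```
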
Each rule begins with a name (which identifies the rule that's being defined)
followed by a colon, ``:``.
The definition to the right of the colon uses the following syntax elements:

* ``name``: A name refers to another rule.
  Where possible, it is a link to the rule's definition.

* ``TOKEN``: An uppercase name refers to a :term:`token`.
  For the purposes of grammar definitions, tokens are the same as rules.

* ``"text"``, ``'text'``: Text in single or double quotes must match literally
  (without the quotes). The type of quote is chosen according to the meaning
  of ``text``:

  * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
  * ``"case"``: A name in double quotes denotes a
    :ref:`soft keyword <soft-keywords>`.
  * ``'@'``: A non-letter symbol in single quotes denotes an
    :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
    :ref:`operator <operators>`.

* ``e1 e2``: Items separated only by whitespace denote a sequence.
  Here, ``e1`` must be followed by ``e2``.
* ``e1 | e2``: A vertical bar is used to separate alternatives.
  It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
  not considered.
  In traditional PEG grammars, this is written as a slash, ``/``, rather than
  a vertical bar.
  See :pep:`617` for more background and details.
* ``e*``: A star means zero or more repetitions of the preceding item.
* ``e+``: Likewise, a plus means one or more repetitions.
* ``[e]``: A phrase enclosed in square brackets means zero or
  one occurrences. In other words, the enclosed phrase is optional.
* ``e?``: A question mark has exactly the same meaning as square brackets:
  the preceding item is optional.
* ``(e)``: Parentheses are used for grouping.
* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
  of any single character in the given (inclusive) range of ASCII characters.
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
* ``<...>``: A phrase between angle brackets gives an informal description
  of the matched symbol (for example, ``<any ASCII character except "\">``),
  or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
  This notation is only used in
  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.

The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
the vertical bar (``|``) binds most loosely.

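The behaviour of sequence, ordered choice, and repetition can be sketched with
a few small matcher functions. This is a toy model written for illustration
only; it is not how CPython's parser is implemented, and all names in it
(``lit``, ``seq``, ``choice``, ``star``, ``matches``) are hypothetical. Each
matcher takes a string and a position and returns the position after the
match, or ``None`` on failure.

```python
def lit(text):
    """"text": match ``text`` literally."""
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse


def seq(*items):
    """e1 e2: every item must match, one after the other."""
    def parse(s, pos):
        for item in items:
            pos = item(s, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*alternatives):
    """e1 | e2: ordered choice; the first alternative that matches wins."""
    def parse(s, pos):
        for alternative in alternatives:
            end = alternative(s, pos)
            if end is not None:
                return end  # later alternatives are not considered
        return None
    return parse


def star(item):
    """e*: zero or more repetitions of ``item``."""
    def parse(s, pos):
        while True:
            end = item(s, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def matches(rule, s):
    """Return True if ``rule`` consumes all of ``s``."""
    return rule(s, 0) == len(s)


# Ordered choice: in "a" | "ab", the first alternative already matches "a",
# so "ab" is never tried and the trailing "b" is left unconsumed.
print(matches(choice(lit("a"), lit("ab")), "ab"))  # False
print(matches(choice(lit("ab"), lit("a")), "ab"))  # True

# Precedence: "a" "b"* | "c" groups as ("a" ("b"*)) | "c".
rule = choice(seq(lit("a"), star(lit("b"))), lit("c"))
print(matches(rule, "abbb"))  # True
print(matches(rule, "c"))     # True
print(matches(rule, "abab"))  # False: the star applies to "b" only
```
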
White space is only meaningful to separate tokens.

Rules are normally contained on a single line, but rules that are too long
may be wrapped:

.. grammar-snippet::
   :group: notation

   literal: stringliteral | bytesliteral
            | integer | floatnumber | imagnumber

Alternatively, rules may be formatted with the first line ending at the colon,
and each alternative beginning with a vertical bar on a new line.
For example:


.. grammar-snippet::
   :group: notation-alt

   literal:
      | stringliteral
      | bytesliteral
      | integer
      | floatnumber
      | imagnumber

This does *not* mean that there is an empty first alternative.

.. index:: lexical definitions

.. _notation-lexical-vs-syntactic:

Lexical and Syntactic definitions
---------------------------------

There is some difference between *lexical* and *syntactic* analysis:
the :term:`lexical analyzer` operates on the individual characters of the
input source, while the *parser* (syntactic analyzer) operates on the stream
of :term:`tokens <token>` generated by the lexical analysis.
However, in some cases the exact boundary between the two phases is a
CPython implementation detail.

The practical difference between the two is that in *lexical* definitions,
all whitespace is significant.
The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
*Syntactic* definitions then use these tokens, rather than source characters.

This documentation uses the same grammar notation for both styles of definitions.
All grammar definitions in the next chapter (:ref:`lexical`) are lexical
definitions; those in subsequent chapters are syntactic definitions.

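The token stream that syntactic definitions work with can be inspected with
the standard :mod:`tokenize` module. The following is a small sketch; the
variable names and the sample source are chosen only for illustration, and the
exact tokens produced are specific to the implementation in use.

```python
import io
import tokenize

# Two logical lines; the second is indented by four spaces.
source = "if x:\n    y = 1\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok_name maps the numeric token type to a name such as "NAME" or "OP".
    print(tokenize.tok_name[tok.type], repr(tok.string))

names = [tokenize.tok_name[tok.type] for tok in tokens]
```

In the printed stream, the leading indentation and the line ends show up as
``INDENT``, ``DEDENT`` and ``NEWLINE`` tokens, while the spaces around ``=``
produce no tokens of their own.
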