Patch #534304: Implement phase 1 of PEP 263.

This commit is contained in:
Martin v. Löwis 2002-08-04 17:29:52 +00:00
parent a729daf2e4
commit 00f1e3f5a5
13 changed files with 656 additions and 31 deletions

View file

@ -7,11 +7,14 @@ chapter describes how the lexical analyzer breaks a file into tokens.
\index{parser}
\index{token}
Python uses the 7-bit \ASCII{} character set for program text and string
literals. 8-bit characters may be used in string literals and comments
but their interpretation is platform dependent; the proper way to
insert 8-bit characters in string literals is by using octal or
hexadecimal escape sequences.
Python uses the 7-bit \ASCII{} character set for program text.
\versionadded[An encoding declaration can be used to indicate that
string literals and comments use an encoding different from ASCII.]{2.3}
For compatibility with older versions, Python only warns if it finds
8-bit characters; those warnings should be corrected by either declaring
an explicit encoding, or using escape sequences if those bytes are binary
data, instead of characters.
The run-time character set depends on the I/O devices connected to the
program but is generally a superset of \ASCII.
@ -69,6 +72,37 @@ Comments are ignored by the syntax; they are not tokens.
\index{hash character}
\subsection{Encoding declarations\label{encodings}}
If a comment in the first or second line of the Python script matches
the regular expression "coding[=:]\s*([\w-_.]+)", this comment is
processed as an encoding declaration; the first group of this
expression names the encoding of the source code file. The recommended
forms of this expression are
\begin{verbatim}
# -*- coding: <encoding-name> -*-
\end{verbatim}
which is recognized also by GNU Emacs, and
\begin{verbatim}
# vim:fileencoding=<encoding-name>
\end{verbatim}
which is recognized by Bram Moolenar's VIM. In addition, if the first
bytes of the file are the UTF-8 signature ($'\xef\xbb\xbf'$), the
declared file encoding is UTF-8 (this is supported, among others, by
Microsoft's notepad.exe).
If an encoding is declared, the encoding name must be recognized by
Python. % XXX there should be a list of supported encodings.
The encoding is used for all lexical analysis, in particular to find
the end of a string, and to interpret the contents of Unicode literals.
String literals are converted to Unicode for syntactical analysis,
then converted back to their original encoding before interpretation
starts.
\subsection{Explicit line joining\label{explicit-joining}}
Two or more physical lines may be joined into logical lines using