Tutorial update for 3.0 by Paul Dubois.

I had to fix a few markup issues in controlflow.rst and modules.rst.

There's a unicode issue on line 448 in introduction.rst that someone else needs to fix.
Guido van Rossum 2007-08-31 03:25:11 +00:00
parent 8b2af27dae
commit 0616b792ba
12 changed files with 379 additions and 353 deletions


@ -59,11 +59,30 @@ operators ``+``, ``-``, ``*`` and ``/`` work just like in most other languages
>>> 2+2 # and a comment on the same line as code
4
>>> (50-5*6)/4
5.0
>>> 8/5 # Fractions aren't lost when dividing integers
1.6000000000000001
Note: You might not see exactly the same result; floating point results can
differ from one machine to another. We will say more later about controlling
the appearance of floating point output; what we see here is the most
informative display but not as easy to read as we would get with::
>>> print(8/5)
1.6
For clarity, this tutorial shows the simpler floating point output unless we
are specifically discussing output formatting; we explain later why these two
ways of displaying floating point data differ. See :ref:`tut-fp-issues` for a
full discussion.
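As a small illustration of the two displays, here is a sketch assuming Python 3.1 or later. In those versions :func:`repr` already picks the shortest string that reads back as the same float, so the prompt echoes ``1.6`` rather than the longer form shown above; asking :func:`format` for 17 significant digits brings the stored binary approximation back into view:

```python
x = 8 / 5

# str() is what print() uses; repr() is what the interactive prompt echoes.
print(str(x))             # 1.6
print(repr(x))            # 1.6 on Python 3.1+, which uses the shortest round-tripping form

# 17 significant digits reveal the binary approximation stored for 1.6.
print(format(x, '.17g'))  # 1.6000000000000001

# Both spellings convert back to exactly the same float value.
assert float('1.6') == float('1.6000000000000001') == x
```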
To do integer division and get an integer result,
discarding any fractional result, there is another operator, ``//``::
>>> # Integer division returns the floor:
... 7//3
2
>>> 7//-3
-3
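A short sketch contrasting the two division operators. It also uses the built-in :func:`divmod`, which the text does not introduce, to show the remainder that goes with a floored quotient:

```python
# True division (/) always returns a float, even for two ints.
print(7 / 2)    # 3.5
print(6 / 3)    # 2.0

# Floor division (//) rounds toward negative infinity, not toward zero.
print(7 // 3)   # 2
print(7 // -3)  # -3  (the exact quotient -2.33... floors to -3)

# divmod() returns the floored quotient and the matching remainder,
# so quotient * divisor + remainder gives back the original number.
q, r = divmod(7, -3)
print(q, r)     # -3 -2
assert q * -3 + r == 7
```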
The equal sign (``'='``) is used to assign a value to a variable. Afterwards, no
@ -176,6 +195,13 @@ several ways. They can be enclosed in single quotes or double quotes::
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
The interpreter prints the result of string operations in the same way as they
are typed for input: inside quotes, and with quotes and other funny characters
escaped by backslashes, to show the precise value. The string is enclosed in
double quotes if the string contains a single quote and no double quotes, else
it's enclosed in single quotes. Once again, the :func:`print` function
produces the more readable output.
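A minimal sketch of that difference, using two made-up strings: :func:`repr` gives the quoted form the interpreter echoes, while :func:`print` writes the characters themselves:

```python
s = 'doesn\'t'
t = 'It\'s "quoted"'

# repr() shows the string the way the interpreter echoes it.
print(repr(s))  # "doesn't"  (a single quote inside, so double quotes outside)
print(repr(t))  # 'It\'s "quoted"'  (both kinds inside: single quotes, escaped)

# print() writes the characters themselves, without quotes or escapes.
print(s)        # doesn't
print(t)        # It's "quoted"
```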
String literals can span multiple lines in several ways. Continuation lines can
be used, with a backslash as the last character on the line indicating that the
next line is a logical continuation of the line::
@ -185,7 +211,7 @@ next line is a logical continuation of the line::
Note that whitespace at the beginning of the line is\
significant."
print(hello)
Note that newlines still need to be embedded in the string using ``\n``; the
newline following the trailing backslash is discarded. This example would print
@ -203,7 +229,7 @@ the example::
hello = r"This is a rather long string containing\n\
several lines of text much as you would do in C."
print(hello)
would print::
@ -214,11 +240,11 @@ Or, strings can be surrounded in a pair of matching triple-quotes: ``"""`` or
``'''``. Line endings do not need to be escaped when using triple-quotes, but
they will be included in the string. ::
print("""
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
""")
produces the following output::
@ -226,12 +252,6 @@ produces the following output::
-h Display this usage message
-H hostname Hostname to connect to
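The following sketch checks the three literal forms described above with assertions; the strings are illustrative, not taken from the text. It also relies on one detail not mentioned above: a backslash right after the opening triple quotes keeps the string from starting with a newline.

```python
# Backslash-newline inside an ordinary string literal joins the two lines:
# both the backslash and the newline are discarded, while \n still counts.
joined = "first line\n\
second line"
assert joined == "first line\nsecond line"

# In a raw string the backslashes stay: \n is two characters, and the
# backslash-newline pair is kept as well.
raw = r"first line\n\
second line"
assert raw == "first line\\n\\\nsecond line"

# Triple-quoted strings keep their line breaks; the backslash after the
# opening quotes suppresses the leading newline.
usage = """\
Usage: thingy [OPTIONS]
  -h    Display this usage message
"""
assert usage.count("\n") == 2
print(usage, end="")
```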
Strings can be concatenated (glued together) with the ``+`` operator, and
repeated with ``*``::
@ -258,7 +278,7 @@ with two literals, not with arbitrary string expressions::
Strings can be subscripted (indexed); as in C, the first character of a string
has subscript (index) 0. There is no separate character type; a character is
simply a string of size one. As in Icon, substrings can be specified with the
*slice notation*: two indices separated by a colon. ::
>>> word[4]
@ -282,11 +302,11 @@ position in the string results in an error::
>>> word[0] = 'x'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object doesn't support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object doesn't support slice assignment
However, creating a new string with the combined content is easy and efficient::
@ -371,31 +391,28 @@ The built-in function :func:`len` returns the length of a string::
.. seealso::
:ref:`typesseq`
Strings are examples of *sequence types*, and support the common
operations supported by such types.
:ref:`string-methods`
Strings support a large number of methods for basic transformations and
searching.
:ref:`string-formatting`
The formatting operations invoked when strings are the left operand of the
``%`` operator are described in more detail here.
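A brief sketch of a few of those operations. It assumes ``word`` holds ``'HelpA'`` (an assumption; its earlier definition is not part of this excerpt), and the methods shown (:meth:`~str.upper`, :meth:`~str.find`, :meth:`~str.replace`) are only a sample of what the linked pages describe:

```python
word = 'Help' + 'A'  # assumed value, see the note above

# Sequence operations: indexing, slicing and len().
assert word[0] == 'H'
assert word[1:3] == 'el'
assert word[-1] == 'A'
assert len(word) == 5

# Strings are immutable, so item assignment raises TypeError ...
try:
    word[0] = 'x'
except TypeError as exc:
    print('TypeError:', exc)

# ... and a new string is built instead.
new_word = 'x' + word[1:]
assert new_word == 'xelpA'

# A few methods for transformation and searching.
assert word.upper() == 'HELPA'
assert word.find('lp') == 2
assert word.replace('A', '!') == 'Help!'

# With a string as the left operand, % performs formatting.
assert 'The value of %s is %d' % ('i', 65536) == 'The value of i is 65536'
```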
.. _tut-unicodestrings:
About Unicode
-------------
.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
Starting with Python 3.0, all strings support Unicode (see
http://www.unicode.org/).
Unicode has the advantage of providing one ordinal for every character in every
script used in modern and ancient texts. Previously, there were only 256
@ -405,19 +422,12 @@ confusion especially with respect to internationalization (usually written as
``i18n`` --- ``'i'`` + 18 characters + ``'n'``) of software. Unicode solves
these problems by defining one code page for all scripts.
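A quick sketch of the "one ordinal per character" idea, using the built-ins :func:`ord` and :func:`chr`; the sample characters are arbitrary:

```python
# Every character has a single ordinal (code point), whatever its script.
for ch in ['A', 'ä', 'λ', '€', '中']:
    print(ch, ord(ch), hex(ord(ch)))

# chr() maps an ordinal back to its character.
assert chr(0x20AC) == '€'
assert ord('ä') == 228

# len() counts characters (code points), not bytes.
assert len('äöü') == 3
```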
If you want to include special characters in a string,
you can do so by using the Python *Unicode-Escape* encoding. The following
example shows how::
>>> 'Hello\u0020World !'
'Hello World !'
The escape sequence ``\u0020`` indicates to insert the Unicode character with
the ordinal value 0x0020 (the space character) at the given position.
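The sketch below checks a few escapes with assertions. Besides ``\uXXXX`` it uses three related forms that the text does not cover: ``\xXX``, ``\UXXXXXXXX`` and ``\N{name}``:

```python
# \uXXXX takes exactly four hex digits.
assert 'Hello\u0020World !' == 'Hello World !'
assert '\u00e4' == 'ä'

# \xXX covers code points below 256.
assert '\xe4' == 'ä'

# \UXXXXXXXX takes eight hex digits, for code points beyond U+FFFF.
assert '\U0001F600' == chr(0x1F600)

# \N{...} looks a character up by its Unicode name.
assert '\N{EURO SIGN}' == '€'
```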
@ -428,59 +438,17 @@ Latin-1 encoding that is used in many Western countries, you will find it
convenient that the lower 256 characters of Unicode are the same as the 256
characters of Latin-1.
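Because of that overlap, encoding with ``'latin-1'`` maps each of the first 256 code points to the byte with the same value. A small sketch that checks this exhaustively, and shows what happens for a character outside that range:

```python
# Each of the first 256 code points encodes to the byte with the same value.
for i in range(256):
    assert chr(i).encode('latin-1') == bytes([i])

# Decoding reverses the mapping for all 256 byte values.
all_bytes = bytes(range(256))
assert all_bytes.decode('latin-1') == ''.join(chr(i) for i in range(256))

# Code points above 255 have no Latin-1 byte.
try:
    '€'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```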
To convert a string into a sequence of bytes using a specific encoding,
string objects provide an :meth:`~str.encode` method that takes the name of
the encoding as its argument. Lowercase names for encodings are preferred. ::
>>> "äöü".encode('utf-8')
b'\xc3\xa4\xc3\xb6\xc3\xbc'
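A slightly longer sketch of the round trip, using UTF-8 and UTF-16 as example encodings: :meth:`bytes.decode` reverses :meth:`~str.encode`, and an encoding that cannot represent a character raises :exc:`UnicodeEncodeError`:

```python
text = 'äöü'

# str -> bytes with an explicit encoding.
data = text.encode('utf-8')
assert data == b'\xc3\xa4\xc3\xb6\xc3\xbc'
assert len(text) == 3 and len(data) == 6  # 3 characters, 6 bytes

# bytes -> str needs the same encoding to get the text back.
assert data.decode('utf-8') == text

# UTF-16 with an explicit byte order uses two bytes per character here.
assert text.encode('utf-16-le') == b'\xe4\x00\xf6\x00\xfc\x00'

# ASCII cannot represent these characters.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)
```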
.. _tut-lists:
@ -561,7 +529,10 @@ example::
[2, 3]
>>> p[1][0]
2
You can add something to the end of the list::
>>> p[1].append('xtra')
>>> p
[1, [2, 3, 'xtra'], 4]
>>> q
@ -584,7 +555,7 @@ series as follows::
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
... print(b)
... a, b = b, a+b
...
1
@ -620,26 +591,29 @@ This example introduces several new features.
completion (since the parser cannot guess when you have typed the last line).
Note that each line within a basic block must be indented by the same amount.
* The :func:`print` function writes the value of the argument(s) it is
given. It differs from just writing the expression you want to write (as we did
earlier in the calculator examples) in the way it handles multiple arguments,
floating point quantities, and strings. Strings are printed without quotes, and
a space is inserted between items, so you can format things nicely, like this::
>>> i = 256*256
>>> print('The value of i is', i)
The value of i is 65536
The keyword argument *end* can be used to avoid the newline after the output,
or to end the output with a different string::
>>> a, b = 0, 1
>>> while b < 1000:
... print(b, end=' ')
... a, b = b, a+b
...
>>> print()
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Note that the output line was not terminated until we printed a newline
with ``print()``.
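To tie these together, here is a sketch of the :func:`print` keywords discussed above, plus two that the text does not mention: *sep*, which replaces the space between items, and *file*, used here to capture the Fibonacci output in an :class:`io.StringIO` buffer so it can be checked. The helper name ``fib_line`` is made up for this example:

```python
import io

# Several arguments are written separated by single spaces.
i = 256 * 256
print('The value of i is', i)  # The value of i is 65536

# sep replaces the separator; end replaces the trailing newline.
print(1, 2, 3, sep=', ')       # 1, 2, 3


def fib_line(limit):
    """Return the Fibonacci numbers below limit as one printed line.

    Hypothetical helper: output goes to an in-memory buffer through
    print's file argument, so the result can be inspected.
    """
    buf = io.StringIO()
    a, b = 0, 1
    while b < limit:
        print(b, end=' ', file=buf)  # a space instead of a newline
        a, b = b, a + b
    print(file=buf)                  # finish the line
    return buf.getvalue()


print(fib_line(1000), end='')
```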