Move the 3k reST doc tree in place.

2025-07-24 19:54:21 +00:00 · 2007-08-15 14:28:22 +00:00 · 2007-08-15 14:28:22 +00:00 · 116aa62bf5
commit 116aa62bf5
parent 739c01d47b
423 changed files with 131199 additions and 0 deletions
--- a/Doc/howto/advocacy.rst
+++ b/Doc/howto/advocacy.rst
@ -0,0 +1,356 @@
+*************************
+  Python Advocacy HOWTO
+*************************
+
+:Author: A.M. Kuchling
+:Release: 0.03
+
+
+.. topic:: Abstract
+
+   It's usually difficult to get your management to accept open source software,
+   and Python is no exception to this rule.  This document discusses reasons to use
+   Python, strategies for winning acceptance, facts and arguments you can use, and
+   cases where you *shouldn't* try to use Python.
+
+
+Reasons to Use Python
+=====================
+
+There are several reasons to incorporate a scripting language into your
+development process, and this section will discuss them, and why Python has some
+properties that make it a particularly good choice.
+
+
+Programmability
+---------------
+
+Programs are often organized in a modular fashion.  Lower-level operations are
+grouped together, and called by higher-level functions, which may in turn be
+used as basic operations by still further upper levels.
+
+For example, the lowest level might define a very low-level set of functions for
+accessing a hash table.  The next level might use hash tables to store the
+headers of a mail message, mapping a header name like ``Date`` to a value such
+as ``Tue, 13 May 1997 20:00:54 -0400``.  A yet higher level may operate on
+message objects, without knowing or caring that message headers are stored in a
+hash table, and so forth.
+
+Often, the lowest levels do very simple things; they implement a data structure
+such as a binary tree or hash table, or they perform some simple computation,
+such as converting a date string to a number.  The higher levels then contain
+logic connecting these primitive operations.  Using the approach, the primitives
+can be seen as basic building blocks which are then glued together to produce
+the complete product.
+
+Why is this design approach relevant to Python?  Because Python is well suited
+to functioning as such a glue language.  A common approach is to write a Python
+module that implements the lower level operations; for the sake of speed, the
+implementation might be in C, Java, or even Fortran.  Once the primitives are
+available to Python programs, the logic underlying higher level operations is
+written in the form of Python code.  The high-level logic is then more
+understandable, and easier to modify.
+
+John Ousterhout wrote a paper that explains this idea at greater length,
+entitled "Scripting: Higher Level Programming for the 21st Century".  I
+recommend that you read this paper; see the references for the URL.  Ousterhout
+is the inventor of the Tcl language, and therefore argues that Tcl should be
+used for this purpose; he only briefly refers to other languages such as Python,
+Perl, and Lisp/Scheme, but in reality, Ousterhout's argument applies to
+scripting languages in general, since you could equally write extensions for any
+of the languages mentioned above.
+
+
+Prototyping
+-----------
+
+In *The Mythical Man-Month*, Fredrick Brooks suggests the following rule when
+planning software projects: "Plan to throw one away; you will anyway."  Brooks
+is saying that the first attempt at a software design often turns out to be
+wrong; unless the problem is very simple or you're an extremely good designer,
+you'll find that new requirements and features become apparent once development
+has actually started.  If these new requirements can't be cleanly incorporated
+into the program's structure, you're presented with two unpleasant choices:
+hammer the new features into the program somehow, or scrap everything and write
+a new version of the program, taking the new features into account from the
+beginning.
+
+Python provides you with a good environment for quickly developing an initial
+prototype.  That lets you get the overall program structure and logic right, and
+you can fine-tune small details in the fast development cycle that Python
+provides.  Once you're satisfied with the GUI interface or program output, you
+can translate the Python code into C++, Fortran, Java, or some other compiled
+language.
+
+Prototyping means you have to be careful not to use too many Python features
+that are hard to implement in your other language.  Using ``eval()``, or regular
+expressions, or the :mod:`pickle` module, means that you're going to need C or
+Java libraries for formula evaluation, regular expressions, and serialization,
+for example.  But it's not hard to avoid such tricky code, and in the end the
+translation usually isn't very difficult.  The resulting code can be rapidly
+debugged, because any serious logical errors will have been removed from the
+prototype, leaving only more minor slip-ups in the translation to track down.
+
+This strategy builds on the earlier discussion of programmability. Using Python
+as glue to connect lower-level components has obvious relevance for constructing
+prototype systems.  In this way Python can help you with development, even if
+end users never come in contact with Python code at all.  If the performance of
+the Python version is adequate and corporate politics allow it, you may not need
+to do a translation into C or Java, but it can still be faster to develop a
+prototype and then translate it, instead of attempting to produce the final
+version immediately.
+
+One example of this development strategy is Microsoft Merchant Server. Version
+1.0 was written in pure Python, by a company that subsequently was purchased by
+Microsoft.  Version 2.0 began to translate the code into C++, shipping with some
+C++code and some Python code.  Version 3.0 didn't contain any Python at all; all
+the code had been translated into C++.  Even though the product doesn't contain
+a Python interpreter, the Python language has still served a useful purpose by
+speeding up development.
+
+This is a very common use for Python.  Past conference papers have also
+described this approach for developing high-level numerical algorithms; see
+David M. Beazley and Peter S. Lomdahl's paper "Feeding a Large-scale Physics
+Application to Python" in the references for a good example.  If an algorithm's
+basic operations are things like "Take the inverse of this 4000x4000 matrix",
+and are implemented in some lower-level language, then Python has almost no
+additional performance cost; the extra time required for Python to evaluate an
+expression like ``m.invert()`` is dwarfed by the cost of the actual computation.
+It's particularly good for applications where seemingly endless tweaking is
+required to get things right. GUI interfaces and Web sites are prime examples.
+
+The Python code is also shorter and faster to write (once you're familiar with
+Python), so it's easier to throw it away if you decide your approach was wrong;
+if you'd spent two weeks working on it instead of just two hours, you might
+waste time trying to patch up what you've got out of a natural reluctance to
+admit that those two weeks were wasted.  Truthfully, those two weeks haven't
+been wasted, since you've learnt something about the problem and the technology
+you're using to solve it, but it's human nature to view this as a failure of
+some sort.
+
+
+Simplicity and Ease of Understanding
+------------------------------------
+
+Python is definitely *not* a toy language that's only usable for small tasks.
+The language features are general and powerful enough to enable it to be used
+for many different purposes.  It's useful at the small end, for 10- or 20-line
+scripts, but it also scales up to larger systems that contain thousands of lines
+of code.
+
+However, this expressiveness doesn't come at the cost of an obscure or tricky
+syntax.  While Python has some dark corners that can lead to obscure code, there
+are relatively few such corners, and proper design can isolate their use to only
+a few classes or modules.  It's certainly possible to write confusing code by
+using too many features with too little concern for clarity, but most Python
+code can look a lot like a slightly-formalized version of human-understandable
+pseudocode.
+
+In *The New Hacker's Dictionary*, Eric S. Raymond gives the following definition
+for "compact":
+
+.. epigraph::
+
+   Compact *adj.*  Of a design, describes the valuable property that it can all be
+   apprehended at once in one's head. This generally means the thing created from
+   the design can be used with greater facility and fewer errors than an equivalent
+   tool that is not compact. Compactness does not imply triviality or lack of
+   power; for example, C is compact and FORTRAN is not, but C is more powerful than
+   FORTRAN. Designs become non-compact through accreting features and cruft that
+   don't merge cleanly into the overall design scheme (thus, some fans of Classic C
+   maintain that ANSI C is no longer compact).
+
+   (From http://www.catb.org/ esr/jargon/html/C/compact.html)
+
+In this sense of the word, Python is quite compact, because the language has
+just a few ideas, which are used in lots of places.  Take namespaces, for
+example.  Import a module with ``import math``, and you create a new namespace
+called ``math``.  Classes are also namespaces that share many of the properties
+of modules, and have a few of their own; for example, you can create instances
+of a class. Instances?  They're yet another namespace.  Namespaces are currently
+implemented as Python dictionaries, so they have the same methods as the
+standard dictionary data type: .keys() returns all the keys, and so forth.
+
+This simplicity arises from Python's development history.  The language syntax
+derives from different sources; ABC, a relatively obscure teaching language, is
+one primary influence, and Modula-3 is another.  (For more information about ABC
+and Modula-3, consult their respective Web sites at http://www.cwi.nl/
+steven/abc/ and http://www.m3.org.)  Other features have come from C, Icon,
+Algol-68, and even Perl.  Python hasn't really innovated very much, but instead
+has tried to keep the language small and easy to learn, building on ideas that
+have been tried in other languages and found useful.
+
+Simplicity is a virtue that should not be underestimated.  It lets you learn the
+language more quickly, and then rapidly write code, code that often works the
+first time you run it.
+
+
+Java Integration
+----------------
+
+If you're working with Java, Jython (http://www.jython.org/) is definitely worth
+your attention.  Jython is a re-implementation of Python in Java that compiles
+Python code into Java bytecodes.  The resulting environment has very tight,
+almost seamless, integration with Java.  It's trivial to access Java classes
+from Python, and you can write Python classes that subclass Java classes.
+Jython can be used for prototyping Java applications in much the same way
+CPython is used, and it can also be used for test suites for Java code, or
+embedded in a Java application to add scripting capabilities.
+
+
+Arguments and Rebuttals
+=======================
+
+Let's say that you've decided upon Python as the best choice for your
+application.  How can you convince your management, or your fellow developers,
+to use Python?  This section lists some common arguments against using Python,
+and provides some possible rebuttals.
+
+**Python is freely available software that doesn't cost anything. How good can
+it be?**
+
+Very good, indeed.  These days Linux and Apache, two other pieces of open source
+software, are becoming more respected as alternatives to commercial software,
+but Python hasn't had all the publicity.
+
+Python has been around for several years, with many users and developers.
+Accordingly, the interpreter has been used by many people, and has gotten most
+of the bugs shaken out of it.  While bugs are still discovered at intervals,
+they're usually either quite obscure (they'd have to be, for no one to have run
+into them before) or they involve interfaces to external libraries.  The
+internals of the language itself are quite stable.
+
+Having the source code should be viewed as making the software available for
+peer review; people can examine the code, suggest (and implement) improvements,
+and track down bugs.  To find out more about the idea of open source code, along
+with arguments and case studies supporting it, go to http://www.opensource.org.
+
+**Who's going to support it?**
+
+Python has a sizable community of developers, and the number is still growing.
+The Internet community surrounding the language is an active one, and is worth
+being considered another one of Python's advantages. Most questions posted to
+the comp.lang.python newsgroup are quickly answered by someone.
+
+Should you need to dig into the source code, you'll find it's clear and
+well-organized, so it's not very difficult to write extensions and track down
+bugs yourself.  If you'd prefer to pay for support, there are companies and
+individuals who offer commercial support for Python.
+
+**Who uses Python for serious work?**
+
+Lots of people; one interesting thing about Python is the surprising diversity
+of applications that it's been used for.  People are using Python to:
+
+* Run Web sites
+
+* Write GUI interfaces
+
+* Control number-crunching code on supercomputers
+
+* Make a commercial application scriptable by embedding the Python interpreter
+  inside it
+
+* Process large XML data sets
+
+* Build test suites for C or Java code
+
+Whatever your application domain is, there's probably someone who's used Python
+for something similar.  Yet, despite being useable for such high-end
+applications, Python's still simple enough to use for little jobs.
+
+See http://wiki.python.org/moin/OrganizationsUsingPython for a list of some of
+the  organizations that use Python.
+
+**What are the restrictions on Python's use?**
+
+They're practically nonexistent.  Consult the :file:`Misc/COPYRIGHT` file in the
+source distribution, or http://www.python.org/doc/Copyright.html for the full
+language, but it boils down to three conditions.
+
+* You have to leave the copyright notice on the software; if you don't include
+  the source code in a product, you have to put the copyright notice in the
+  supporting documentation.
+
+* Don't claim that the institutions that have developed Python endorse your
+  product in any way.
+
+* If something goes wrong, you can't sue for damages.  Practically all software
+  licences contain this condition.
+
+Notice that you don't have to provide source code for anything that contains
+Python or is built with it.  Also, the Python interpreter and accompanying
+documentation can be modified and redistributed in any way you like, and you
+don't have to pay anyone any licensing fees at all.
+
+**Why should we use an obscure language like Python instead of well-known
+language X?**
+
+I hope this HOWTO, and the documents listed in the final section, will help
+convince you that Python isn't obscure, and has a healthily growing user base.
+One word of advice: always present Python's positive advantages, instead of
+concentrating on language X's failings.  People want to know why a solution is
+good, rather than why all the other solutions are bad.  So instead of attacking
+a competing solution on various grounds, simply show how Python's virtues can
+help.
+
+
+Useful Resources
+================
+
+http://www.pythonology.com/success
+   The Python Success Stories are a collection of stories from successful users of
+   Python, with the emphasis on business and corporate users.
+
+.. % \term{\url{http://www.fsbassociates.com/books/pythonchpt1.htm}}
+.. % The first chapter of \emph{Internet Programming with Python} also
+.. % examines some of the reasons for using Python.  The book is well worth
+.. % buying, but the publishers have made the first chapter available on
+.. % the Web.
+
+http://home.pacbell.net/ouster/scripting.html
+   John Ousterhout's white paper on scripting is a good argument for the utility of
+   scripting languages, though naturally enough, he emphasizes Tcl, the language he
+   developed.  Most of the arguments would apply to any scripting language.
+
+http://www.python.org/workshops/1997-10/proceedings/beazley.html
+   The authors, David M. Beazley and Peter S. Lomdahl,  describe their use of
+   Python at Los Alamos National Laboratory. It's another good example of how
+   Python can help get real work done. This quotation from the paper has been
+   echoed by many people:
+
+   .. epigraph::
+
+      Originally developed as a large monolithic application for massively parallel
+      processing systems, we have used Python to transform our application into a
+      flexible, highly modular, and extremely powerful system for performing
+      simulation, data analysis, and visualization. In addition, we describe how
+      Python has solved a number of important problems related to the development,
+      debugging, deployment, and maintenance of scientific software.
+
+http://pythonjournal.cognizor.com/pyj1/Everitt-Feit_interview98-V1.html
+   This interview with Andy Feit, discussing Infoseek's use of Python, can be used
+   to show that choosing Python didn't introduce any difficulties into a company's
+   development process, and provided some substantial benefits.
+
+.. % \term{\url{http://www.python.org/psa/Commercial.html}}
+.. % Robin Friedrich wrote this document on how to support Python's use in
+.. % commercial projects.
+
+http://www.python.org/workshops/1997-10/proceedings/stein.ps
+   For the 6th Python conference, Greg Stein presented a paper that traced Python's
+   adoption and usage at a startup called eShop, and later at Microsoft.
+
+http://www.opensource.org
+   Management may be doubtful of the reliability and usefulness of software that
+   wasn't written commercially.  This site presents arguments that show how open
+   source software can have considerable advantages over closed-source software.
+
+http://sunsite.unc.edu/LDP/HOWTO/mini/Advocacy.html
+   The Linux Advocacy mini-HOWTO was the inspiration for this document, and is also
+   well worth reading for general suggestions on winning acceptance for a new
+   technology, such as Linux or Python.  In general, you won't make much progress
+   by simply attacking existing systems and complaining about their inadequacies;
+   this often ends up looking like unfocused whining.  It's much better to point
+   out some of the many areas where Python is an improvement over other systems.
+
--- a/Doc/howto/curses.rst
+++ b/Doc/howto/curses.rst
@ -0,0 +1,434 @@
+**********************************
+  Curses Programming with Python
+**********************************
+
+:Author: A.M. Kuchling, Eric S. Raymond
+:Release: 2.02
+
+
+.. topic:: Abstract
+
+   This document describes how to write text-mode programs with Python 2.x, using
+   the :mod:`curses` extension module to control the display.
+
+
+What is curses?
+===============
+
+The curses library supplies a terminal-independent screen-painting and
+keyboard-handling facility for text-based terminals; such terminals include
+VT100s, the Linux console, and the simulated terminal provided by X11 programs
+such as xterm and rxvt.  Display terminals support various control codes to
+perform common operations such as moving the cursor, scrolling the screen, and
+erasing areas.  Different terminals use widely differing codes, and often have
+their own minor quirks.
+
+In a world of X displays, one might ask "why bother"?  It's true that
+character-cell display terminals are an obsolete technology, but there are
+niches in which being able to do fancy things with them are still valuable.  One
+is on small-footprint or embedded Unixes that don't carry an X server.  Another
+is for tools like OS installers and kernel configurators that may have to run
+before X is available.
+
+The curses library hides all the details of different terminals, and provides
+the programmer with an abstraction of a display, containing multiple
+non-overlapping windows.  The contents of a window can be changed in various
+ways-- adding text, erasing it, changing its appearance--and the curses library
+will automagically figure out what control codes need to be sent to the terminal
+to produce the right output.
+
+The curses library was originally written for BSD Unix; the later System V
+versions of Unix from AT&T added many enhancements and new functions. BSD curses
+is no longer maintained, having been replaced by ncurses, which is an
+open-source implementation of the AT&T interface.  If you're using an
+open-source Unix such as Linux or FreeBSD, your system almost certainly uses
+ncurses.  Since most current commercial Unix versions are based on System V
+code, all the functions described here will probably be available.  The older
+versions of curses carried by some proprietary Unixes may not support
+everything, though.
+
+No one has made a Windows port of the curses module.  On a Windows platform, try
+the Console module written by Fredrik Lundh.  The Console module provides
+cursor-addressable text output, plus full support for mouse and keyboard input,
+and is available from http://effbot.org/efflib/console.
+
+
+The Python curses module
+------------------------
+
+Thy Python module is a fairly simple wrapper over the C functions provided by
+curses; if you're already familiar with curses programming in C, it's really
+easy to transfer that knowledge to Python.  The biggest difference is that the
+Python interface makes things simpler, by merging different C functions such as
+:func:`addstr`, :func:`mvaddstr`, :func:`mvwaddstr`, into a single
+:meth:`addstr` method.  You'll see this covered in more detail later.
+
+This HOWTO is simply an introduction to writing text-mode programs with curses
+and Python. It doesn't attempt to be a complete guide to the curses API; for
+that, see the Python library guide's section on ncurses, and the C manual pages
+for ncurses.  It will, however, give you the basic ideas.
+
+
+Starting and ending a curses application
+========================================
+
+Before doing anything, curses must be initialized.  This is done by calling the
+:func:`initscr` function, which will determine the terminal type, send any
+required setup codes to the terminal, and create various internal data
+structures.  If successful, :func:`initscr` returns a window object representing
+the entire screen; this is usually called ``stdscr``, after the name of the
+corresponding C variable. ::
+
+   import curses
+   stdscr = curses.initscr()
+
+Usually curses applications turn off automatic echoing of keys to the screen, in
+order to be able to read keys and only display them under certain circumstances.
+This requires calling the :func:`noecho` function. ::
+
+   curses.noecho()
+
+Applications will also commonly need to react to keys instantly, without
+requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
+the usual buffered input mode. ::
+
+   curses.cbreak()
+
+Terminals usually return special keys, such as the cursor keys or navigation
+keys such as Page Up and Home, as a multibyte escape sequence.  While you could
+write your application to expect such sequences and process them accordingly,
+curses can do it for you, returning a special value such as
+:const:`curses.KEY_LEFT`.  To get curses to do the job, you'll have to enable
+keypad mode. ::
+
+   stdscr.keypad(1)
+
+Terminating a curses application is much easier than starting one. You'll need
+to call  ::
+
+   curses.nocbreak(); stdscr.keypad(0); curses.echo()
+
+to reverse the curses-friendly terminal settings. Then call the :func:`endwin`
+function to restore the terminal to its original operating mode. ::
+
+   curses.endwin()
+
+A common problem when debugging a curses application is to get your terminal
+messed up when the application dies without restoring the terminal to its
+previous state.  In Python this commonly happens when your code is buggy and
+raises an uncaught exception.  Keys are no longer be echoed to the screen when
+you type them, for example, which makes using the shell difficult.
+
+In Python you can avoid these complications and make debugging much easier by
+importing the module :mod:`curses.wrapper`.  It supplies a :func:`wrapper`
+function that takes a callable.  It does the initializations described above,
+and also initializes colors if color support is present.  It then runs your
+provided callable and finally deinitializes appropriately.  The callable is
+called inside a try-catch clause which catches exceptions, performs curses
+deinitialization, and then passes the exception upwards.  Thus, your terminal
+won't be left in a funny state on exception.
+
+
+Windows and Pads
+================
+
+Windows are the basic abstraction in curses.  A window object represents a
+rectangular area of the screen, and supports various methods to display text,
+erase it, allow the user to input strings, and so forth.
+
+The ``stdscr`` object returned by the :func:`initscr` function is a window
+object that covers the entire screen.  Many programs may need only this single
+window, but you might wish to divide the screen into smaller windows, in order
+to redraw or clear them separately. The :func:`newwin` function creates a new
+window of a given size, returning the new window object. ::
+
+   begin_x = 20 ; begin_y = 7
+   height = 5 ; width = 40
+   win = curses.newwin(height, width, begin_y, begin_x)
+
+A word about the coordinate system used in curses: coordinates are always passed
+in the order *y,x*, and the top-left corner of a window is coordinate (0,0).
+This breaks a common convention for handling coordinates, where the *x*
+coordinate usually comes first.  This is an unfortunate difference from most
+other computer applications, but it's been part of curses since it was first
+written, and it's too late to change things now.
+
+When you call a method to display or erase text, the effect doesn't immediately
+show up on the display.  This is because curses was originally written with slow
+300-baud terminal connections in mind; with these terminals, minimizing the time
+required to redraw the screen is very important.  This lets curses accumulate
+changes to the screen, and display them in the most efficient manner.  For
+example, if your program displays some characters in a window, and then clears
+the window, there's no need to send the original characters because they'd never
+be visible.
+
+Accordingly, curses requires that you explicitly tell it to redraw windows,
+using the :func:`refresh` method of window objects.  In practice, this doesn't
+really complicate programming with curses much. Most programs go into a flurry
+of activity, and then pause waiting for a keypress or some other action on the
+part of the user.  All you have to do is to be sure that the screen has been
+redrawn before pausing to wait for user input, by simply calling
+``stdscr.refresh()`` or the :func:`refresh` method of some other relevant
+window.
+
+A pad is a special case of a window; it can be larger than the actual display
+screen, and only a portion of it displayed at a time. Creating a pad simply
+requires the pad's height and width, while refreshing a pad requires giving the
+coordinates of the on-screen area where a subsection of the pad will be
+displayed.   ::
+
+   pad = curses.newpad(100, 100)
+   #  These loops fill the pad with letters; this is
+   # explained in the next section
+   for y in range(0, 100):
+       for x in range(0, 100):
+           try: pad.addch(y,x, ord('a') + (x*x+y*y) % 26 )
+           except curses.error: pass
+
+   #  Displays a section of the pad in the middle of the screen
+   pad.refresh( 0,0, 5,5, 20,75)
+
+The :func:`refresh` call displays a section of the pad in the rectangle
+extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
+left corner of the displayed section is coordinate (0,0) on the pad.  Beyond
+that difference, pads are exactly like ordinary windows and support the same
+methods.
+
+If you have multiple windows and pads on screen there is a more efficient way to
+go, which will prevent annoying screen flicker at refresh time.  Use the
+:meth:`noutrefresh` method of each window to update the data structure
+representing the desired state of the screen; then change the physical screen to
+match the desired state in one go with the function :func:`doupdate`.  The
+normal :meth:`refresh` method calls :func:`doupdate` as its last act.
+
+
+Displaying Text
+===============
+
+From a C programmer's point of view, curses may sometimes look like a twisty
+maze of functions, all subtly different.  For example, :func:`addstr` displays a
+string at the current cursor location in the ``stdscr`` window, while
+:func:`mvaddstr` moves to a given y,x coordinate first before displaying the
+string. :func:`waddstr` is just like :func:`addstr`, but allows specifying a
+window to use, instead of using ``stdscr`` by default. :func:`mvwaddstr` follows
+similarly.
+
+Fortunately the Python interface hides all these details; ``stdscr`` is a window
+object like any other, and methods like :func:`addstr` accept multiple argument
+forms.  Usually there are four different forms.
+
+---------------------------------+-----------------------------------------------+
+| Form                            | Description                                   |
+=================================+===============================================+
+| *str* or *ch*                   | Display the string *str* or character *ch* at |
+|                                 | the current position                          |
+---------------------------------+-----------------------------------------------+
+| *str* or *ch*, *attr*           | Display the string *str* or character *ch*,   |
+|                                 | using attribute *attr* at the current         |
+|                                 | position                                      |
+---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*         | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*                         |
+---------------------------------+-----------------------------------------------+
+| *y*, *x*, *str* or *ch*, *attr* | Move to position *y,x* within the window, and |
+|                                 | display *str* or *ch*, using attribute *attr* |
+---------------------------------+-----------------------------------------------+
+
+Attributes allow displaying text in highlighted forms, such as in boldface,
+underline, reverse code, or in color.  They'll be explained in more detail in
+the next subsection.
+
+The :func:`addstr` function takes a Python string as the value to be displayed,
+while the :func:`addch` functions take a character, which can be either a Python
+string of length 1 or an integer.  If it's a string, you're limited to
+displaying characters between 0 and 255.  SVr4 curses provides constants for
+extension characters; these constants are integers greater than 255.  For
+example, :const:`ACS_PLMINUS` is a +/- symbol, and :const:`ACS_ULCORNER` is the
+upper left corner of a box (handy for drawing borders).
+
+Windows remember where the cursor was left after the last operation, so if you
+leave out the *y,x* coordinates, the string or character will be displayed
+wherever the last operation left off.  You can also move the cursor with the
+``move(y,x)`` method.  Because some terminals always display a flashing cursor,
+you may want to ensure that the cursor is positioned in some location where it
+won't be distracting; it can be confusing to have the cursor blinking at some
+apparently random location.
+
+If your application doesn't need a blinking cursor at all, you can call
+``curs_set(0)`` to make it invisible.  Equivalently, and for compatibility with
+older curses versions, there's a ``leaveok(bool)`` function.  When *bool* is
+true, the curses library will attempt to suppress the flashing cursor, and you
+won't need to worry about leaving it in odd locations.
+
+
+Attributes and Color
+--------------------
+
+Characters can be displayed in different ways.  Status lines in a text-based
+application are commonly shown in reverse video; a text viewer may need to
+highlight certain words.  curses supports this by allowing you to specify an
+attribute for each cell on the screen.
+
+An attribute is a integer, each bit representing a different attribute.  You can
+try to display text with multiple attribute bits set, but curses doesn't
+guarantee that all the possible combinations are available, or that they're all
+visually distinct.  That depends on the ability of the terminal being used, so
+it's safest to stick to the most commonly available attributes, listed here.
+
+----------------------+--------------------------------------+
+| Attribute            | Description                          |
+======================+======================================+
+| :const:`A_BLINK`     | Blinking text                        |
+----------------------+--------------------------------------+
+| :const:`A_BOLD`      | Extra bright or bold text            |
+----------------------+--------------------------------------+
+| :const:`A_DIM`       | Half bright text                     |
+----------------------+--------------------------------------+
+| :const:`A_REVERSE`   | Reverse-video text                   |
+----------------------+--------------------------------------+
+| :const:`A_STANDOUT`  | The best highlighting mode available |
+----------------------+--------------------------------------+
+| :const:`A_UNDERLINE` | Underlined text                      |
+----------------------+--------------------------------------+
+
+So, to display a reverse-video status line on the top line of the screen, you
+could code::
+
+   stdscr.addstr(0, 0, "Current mode: Typing mode",
+   	      curses.A_REVERSE)
+   stdscr.refresh()
+
+The curses library also supports color on those terminals that provide it, The
+most common such terminal is probably the Linux console, followed by color
+xterms.
+
+To use color, you must call the :func:`start_color` function soon after calling
+:func:`initscr`, to initialize the default color set (the
+:func:`curses.wrapper.wrapper` function does this automatically).  Once that's
+done, the :func:`has_colors` function returns TRUE if the terminal in use can
+actually display color.  (Note: curses uses the American spelling 'color',
+instead of the Canadian/British spelling 'colour'.  If you're used to the
+British spelling, you'll have to resign yourself to misspelling it for the sake
+of these functions.)
+
+The curses library maintains a finite number of color pairs, containing a
+foreground (or text) color and a background color.  You can get the attribute
+value corresponding to a color pair with the :func:`color_pair` function; this
+can be bitwise-OR'ed with other attributes such as :const:`A_REVERSE`, but
+again, such combinations are not guaranteed to work on all terminals.
+
+An example, which displays a line of text using color pair 1::
+
+   stdscr.addstr( "Pretty text", curses.color_pair(1) )
+   stdscr.refresh()
+
+As I said before, a color pair consists of a foreground and background color.
+:func:`start_color` initializes 8 basic colors when it activates color mode.
+They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
+7:white.  The curses module defines named constants for each of these colors:
+:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
+
+The ``init_pair(n, f, b)`` function changes the definition of color pair *n*, to
+foreground color f and background color b.  Color pair 0 is hard-wired to white
+on black, and cannot be changed.
+
+Let's put all this together. To change color 1 to red text on a white
+background, you would call::
+
+   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
+
+When you change a color pair, any text already displayed using that color pair
+will change to the new colors.  You can also display new text in this color
+with::
+
+   stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1) )
+
+Very fancy terminals can change the definitions of the actual colors to a given
+RGB value.  This lets you change color 1, which is usually red, to purple or
+blue or any other color you like.  Unfortunately, the Linux console doesn't
+support this, so I'm unable to try it out, and can't provide any examples.  You
+can check if your terminal can do this by calling :func:`can_change_color`,
+which returns TRUE if the capability is there.  If you're lucky enough to have
+such a talented terminal, consult your system's man pages for more information.
+
+
+User Input
+==========
+
+The curses library itself offers only very simple input mechanisms. Python's
+support adds a text-input widget that makes up some of the lack.
+
+The most common way to get input to a window is to use its :meth:`getch` method.
+:meth:`getch` pauses and waits for the user to hit a key, displaying it if
+:func:`echo` has been called earlier.  You can optionally specify a coordinate
+to which the cursor should be moved before pausing.
+
+It's possible to change this behavior with the method :meth:`nodelay`. After
+``nodelay(1)``, :meth:`getch` for the window becomes non-blocking and returns
+``curses.ERR`` (a value of -1) when no input is ready.  There's also a
+:func:`halfdelay` function, which can be used to (in effect) set a timer on each
+:meth:`getch`; if no input becomes available within the number of milliseconds
+specified as the argument to :func:`halfdelay`, curses raises an exception.
+
+The :meth:`getch` method returns an integer; if it's between 0 and 255, it
+represents the ASCII code of the key pressed.  Values greater than 255 are
+special keys such as Page Up, Home, or the cursor keys. You can compare the
+value returned to constants such as :const:`curses.KEY_PPAGE`,
+:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`.  Usually the main loop of
+your program will look something like this::
+
+   while 1:
+       c = stdscr.getch()
+       if c == ord('p'): PrintDocument()
+       elif c == ord('q'): break  # Exit the while()
+       elif c == curses.KEY_HOME: x = y = 0
+
+The :mod:`curses.ascii` module supplies ASCII class membership functions that
+take either integer or 1-character-string arguments; these may be useful in
+writing more readable tests for your command interpreters.  It also supplies
+conversion functions  that take either integer or 1-character-string arguments
+and return the same type.  For example, :func:`curses.ascii.ctrl` returns the
+control character corresponding to its argument.
+
+There's also a method to retrieve an entire string, :const:`getstr()`.  It isn't
+used very often, because its functionality is quite limited; the only editing
+keys available are the backspace key and the Enter key, which terminates the
+string.  It can optionally be limited to a fixed number of characters. ::
+
+   curses.echo()            # Enable echoing of characters
+
+   # Get a 15-character string, with the cursor on the top line 
+   s = stdscr.getstr(0,0, 15)  
+
+The Python :mod:`curses.textpad` module supplies something better. With it, you
+can turn a window into a text box that supports an Emacs-like set of
+keybindings.  Various methods of :class:`Textbox` class support editing with
+input validation and gathering the edit results either with or without trailing
+spaces.   See the library documentation on :mod:`curses.textpad` for the
+details.
+
+
+For More Information
+====================
+
+This HOWTO didn't cover some advanced topics, such as screen-scraping or
+capturing mouse events from an xterm instance.  But the Python library page for
+the curses modules is now pretty complete.  You should browse it next.
+
+If you're in doubt about the detailed behavior of any of the ncurses entry
+points, consult the manual pages for your curses implementation, whether it's
+ncurses or a proprietary Unix vendor's.  The manual pages will document any
+quirks, and provide complete lists of all the functions, attributes, and
+:const:`ACS_\*` characters available to you.
+
+Because the curses API is so large, some functions aren't supported in the
+Python interface, not because they're difficult to implement, but because no one
+has needed them yet.  Feel free to add them and then submit a patch.  Also, we
+don't yet have support for the menus or panels libraries associated with
+ncurses; feel free to add that.
+
+If you write an interesting little program, feel free to contribute it as
+another demo.  We can always use more of them!
+
+The ncurses FAQ: http://dickey.his.com/ncurses/ncurses.faq.html
+
--- a/Doc/howto/doanddont.rst
+++ b/Doc/howto/doanddont.rst
@ -0,0 +1,308 @@
+************************************
+  Idioms and Anti-Idioms in Python  
+************************************
+
+:Author: Moshe Zadka
+
+This document is placed in the public doman.
+
+
+.. topic:: Abstract
+
+   This document can be considered a companion to the tutorial. It shows how to use
+   Python, and even more importantly, how *not* to use Python.
+
+
+Language Constructs You Should Not Use
+======================================
+
+While Python has relatively few gotchas compared to other languages, it still
+has some constructs which are only useful in corner cases, or are plain
+dangerous.
+
+
+from module import \*
+---------------------
+
+
+Inside Function Definitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``from module import *`` is *invalid* inside function definitions. While many
+versions of Python do not check for the invalidity, it does not make it more
+valid, no more then having a smart lawyer makes a man innocent. Do not use it
+like that ever. Even in versions where it was accepted, it made the function
+execution slower, because the compiler could not be certain which names are
+local and which are global. In Python 2.1 this construct causes warnings, and
+sometimes even errors.
+
+
+At Module Level
+^^^^^^^^^^^^^^^
+
+While it is valid to use ``from module import *`` at module level it is usually
+a bad idea. For one, this loses an important property Python otherwise has ---
+you can know where each toplevel name is defined by a simple "search" function
+in your favourite editor. You also open yourself to trouble in the future, if
+some module grows additional functions or classes.
+
+One of the most awful question asked on the newsgroup is why this code::
+
+   f = open("www")
+   f.read()
+
+does not work. Of course, it works just fine (assuming you have a file called
+"www".) But it does not work if somewhere in the module, the statement ``from os
+import *`` is present. The :mod:`os` module has a function called :func:`open`
+which returns an integer. While it is very useful, shadowing builtins is one of
+its least useful properties.
+
+Remember, you can never know for sure what names a module exports, so either
+take what you need --- ``from module import name1, name2``, or keep them in the
+module and access on a per-need basis ---  ``import module;print module.name``.
+
+
+When It Is Just Fine
+^^^^^^^^^^^^^^^^^^^^
+
+There are situations in which ``from module import *`` is just fine:
+
+* The interactive prompt. For example, ``from math import *`` makes Python an
+  amazing scientific calculator.
+
+* When extending a module in C with a module in Python.
+
+* When the module advertises itself as ``from import *`` safe.
+
+
+Unadorned :keyword:`exec` and friends
+-------------------------------------
+
+The word "unadorned" refers to the use without an explicit dictionary, in which
+case those constructs evaluate code in the *current* environment. This is
+dangerous for the same reasons ``from import *`` is dangerous --- it might step
+over variables you are counting on and mess up things for the rest of your code.
+Simply do not do that.
+
+Bad examples::
+
+   >>> for name in sys.argv[1:]:
+   >>>     exec "%s=1" % name
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         exec "s.%s=val" % var  # invalid!
+   >>> exec(open("handler.py").read())
+   >>> handle()
+
+Good examples::
+
+   >>> d = {}
+   >>> for name in sys.argv[1:]:
+   >>>     d[name] = 1
+   >>> def func(s, **kw):
+   >>>     for var, val in kw.items():
+   >>>         setattr(s, var, val)
+   >>> d={}
+   >>> exec(open("handle.py").read(), d, d)
+   >>> handle = d['handle']
+   >>> handle()
+
+
+from module import name1, name2
+-------------------------------
+
+This is a "don't" which is much weaker then the previous "don't"s but is still
+something you should not do if you don't have good reasons to do that. The
+reason it is usually bad idea is because you suddenly have an object which lives
+in two seperate namespaces. When the binding in one namespace changes, the
+binding in the other will not, so there will be a discrepancy between them. This
+happens when, for example, one module is reloaded, or changes the definition of
+a function at runtime.
+
+Bad example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   from foo import a
+   if something():
+       a = 2 # danger: foo.a != a 
+
+Good example::
+
+   # foo.py
+   a = 1
+
+   # bar.py
+   import foo
+   if something():
+       foo.a = 2
+
+
+except:
+-------
+
+Python has the ``except:`` clause, which catches all exceptions. Since *every*
+error in Python raises an exception, this makes many programming errors look
+like runtime problems, and hinders the debugging process.
+
+The following code shows a great example::
+
+   try:
+       foo = opne("file") # misspelled "open"
+   except:
+       sys.exit("could not open file!")
+
+The second line triggers a :exc:`NameError` which is caught by the except
+clause. The program will exit, and you will have no idea that this has nothing
+to do with the readability of ``"file"``.
+
+The example above is better written ::
+
+   try:
+       foo = opne("file") # will be changed to "open" as soon as we run it
+   except IOError:
+       sys.exit("could not open file")
+
+There are some situations in which the ``except:`` clause is useful: for
+example, in a framework when running callbacks, it is good not to let any
+callback disturb the framework.
+
+
+Exceptions
+==========
+
+Exceptions are a useful feature of Python. You should learn to raise them
+whenever something unexpected occurs, and catch them only where you can do
+something about them.
+
+The following is a very popular anti-idiom ::
+
+   def get_status(file):
+       if not os.path.exists(file):
+           print "file not found"
+           sys.exit(1)
+       return open(file).readline()
+
+Consider the case the file gets deleted between the time the call to
+:func:`os.path.exists` is made and the time :func:`open` is called. That means
+the last line will throw an :exc:`IOError`. The same would happen if *file*
+exists but has no read permission. Since testing this on a normal machine on
+existing and non-existing files make it seem bugless, that means in testing the
+results will seem fine, and the code will get shipped. Then an unhandled
+:exc:`IOError` escapes to the user, who has to watch the ugly traceback.
+
+Here is a better way to do it. ::
+
+   def get_status(file):
+       try:
+           return open(file).readline()
+       except (IOError, OSError):
+           print "file not found"
+           sys.exit(1)
+
+In this version, \*either\* the file gets opened and the line is read (so it
+works even on flaky NFS or SMB connections), or the message is printed and the
+application aborted.
+
+Still, :func:`get_status` makes too many assumptions --- that it will only be
+used in a short running script, and not, say, in a long running server. Sure,
+the caller could do something like ::
+
+   try:
+       status = get_status(log)
+   except SystemExit:
+       status = None
+
+So, try to make as few ``except`` clauses in your code --- those will usually be
+a catch-all in the :func:`main`, or inside calls which should always succeed.
+
+So, the best version is probably ::
+
+   def get_status(file):
+       return open(file).readline()
+
+The caller can deal with the exception if it wants (for example, if it  tries
+several files in a loop), or just let the exception filter upwards to *its*
+caller.
+
+The last version is not very good either --- due to implementation details, the
+file would not be closed when an exception is raised until the handler finishes,
+and perhaps not at all in non-C implementations (e.g., Jython). ::
+
+   def get_status(file):
+       fp = open(file)
+       try:
+           return fp.readline()
+       finally:
+           fp.close()
+
+
+Using the Batteries
+===================
+
+Every so often, people seem to be writing stuff in the Python library again,
+usually poorly. While the occasional module has a poor interface, it is usually
+much better to use the rich standard library and data types that come with
+Python then inventing your own.
+
+A useful module very few people know about is :mod:`os.path`. It  always has the
+correct path arithmetic for your operating system, and will usually be much
+better then whatever you come up with yourself.
+
+Compare::
+
+   # ugh!
+   return dir+"/"+file
+   # better
+   return os.path.join(dir, file)
+
+More useful functions in :mod:`os.path`: :func:`basename`,  :func:`dirname` and
+:func:`splitext`.
+
+There are also many useful builtin functions people seem not to be aware of for
+some reason: :func:`min` and :func:`max` can find the minimum/maximum of any
+sequence with comparable semantics, for example, yet many people write their own
+:func:`max`/:func:`min`. Another highly useful function is :func:`reduce`. A
+classical use of :func:`reduce` is something like ::
+
+   import sys, operator
+   nums = map(float, sys.argv[1:])
+   print reduce(operator.add, nums)/len(nums)
+
+This cute little script prints the average of all numbers given on the command
+line. The :func:`reduce` adds up all the numbers, and the rest is just some
+pre- and postprocessing.
+
+On the same note, note that :func:`float`, :func:`int` and :func:`long` all
+accept arguments of type string, and so are suited to parsing --- assuming you
+are ready to deal with the :exc:`ValueError` they raise.
+
+
+Using Backslash to Continue Statements
+======================================
+
+Since Python treats a newline as a statement terminator, and since statements
+are often more then is comfortable to put in one line, many people do::
+
+   if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
+      calculate_number(10, 20) != forbulate(500, 360):
+         pass
+
+You should realize that this is dangerous: a stray space after the ``XXX`` would
+make this line wrong, and stray spaces are notoriously hard to see in editors.
+In this case, at least it would be a syntax error, but if the code was::
+
+   value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+           + calculate_number(10, 20)*forbulate(500, 360)
+
+then it would just be subtly wrong.
+
+It is usually much better to use the implicit continuation inside parenthesis:
+
+This version is bulletproof::
+
+   value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9] 
+           + calculate_number(10, 20)*forbulate(500, 360))
+
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@ -0,0 +1,25 @@
+***************
+ Python HOWTOs
+***************
+
+Python HOWTOs are documents that cover a single, specific topic,
+and attempt to cover it fairly completely. Modelled on the Linux
+Documentation Project's HOWTO collection, this collection is an
+effort to foster documentation that's more detailed than the
+Python Library Reference.
+
+Currently, the HOWTOs are:
+
+.. toctree::
+   :maxdepth: 1
+
+   advocacy.rst
+   pythonmac.rst
+   curses.rst
+   doanddont.rst
+   functional.rst
+   regex.rst
+   sockets.rst
+   unicode.rst
+   urllib2.rst
+
--- a/Doc/howto/pythonmac.rst
+++ b/Doc/howto/pythonmac.rst
@ -0,0 +1,202 @@
+
+.. _using-on-mac:
+
+***************************
+Using Python on a Macintosh
+***************************
+
+:Author: Bob Savage <bobsavage@mac.com>
+
+
+Python on a Macintosh running Mac OS X is in principle very similar to Python on
+any other Unix platform, but there are a number of additional features such as
+the IDE and the Package Manager that are worth pointing out.
+
+The Mac-specific modules are documented in :ref:`mac-specific-services`.
+
+Python on Mac OS 9 or earlier can be quite different from Python on Unix or
+Windows, but is beyond the scope of this manual, as that platform is no longer
+supported, starting with Python 2.4. See http://www.cwi.nl/~jack/macpython for
+installers for the latest 2.3 release for Mac OS 9 and related documentation.
+
+
+.. _getting-osx:
+
+Getting and Installing MacPython
+================================
+
+Mac OS X 10.4 comes with Python 2.3 pre-installed by Apple. However, you are
+encouraged to install the most recent version of Python from the Python website
+(http://www.python.org). A "universal binary" build of Python 2.5, which runs
+natively on the Mac's new Intel and legacy PPC CPU's, is available there.
+
+What you get after installing is a number of things:
+
+* A :file:`MacPython 2.5` folder in your :file:`Applications` folder. In here
+  you find IDLE, the development environment that is a standard part of official
+  Python distributions; PythonLauncher, which handles double-clicking Python
+  scripts from the Finder; and the "Build Applet" tool, which allows you to
+  package Python scripts as standalone applications on your system.
+
+* A framework :file:`/Library/Frameworks/Python.framework`, which includes the
+  Python executable and libraries. The installer adds this location to your shell
+  path. To uninstall MacPython, you can simply remove these three things. A
+  symlink to the Python executable is placed in /usr/local/bin/.
+
+The Apple-provided build of Python is installed in
+:file:`/System/Library/Frameworks/Python.framework` and :file:`/usr/bin/python`,
+respectively. You should never modify or delete these, as they are
+Apple-controlled and are used by Apple- or third-party software.
+
+IDLE includes a help menu that allows you to access Python documentation. If you
+are completely new to Python you should start reading the tutorial introduction
+in that document.
+
+If you are familiar with Python on other Unix platforms you should read the
+section on running Python scripts from the Unix shell.
+
+
+How to run a Python script
+--------------------------
+
+Your best way to get started with Python on Mac OS X is through the IDLE
+integrated development environment, see section :ref:`ide` and use the Help menu
+when the IDE is running.
+
+If you want to run Python scripts from the Terminal window command line or from
+the Finder you first need an editor to create your script. Mac OS X comes with a
+number of standard Unix command line editors, :program:`vim` and
+:program:`emacs` among them. If you want a more Mac-like editor,
+:program:`BBEdit` or :program:`TextWrangler` from Bare Bones Software (see
+http://www.barebones.com/products/bbedit/index.shtml) are good choices, as is
+:program:`TextMate` (see http://macromates.com/). Other editors include
+:program:`Gvim` (http://macvim.org) and :program:`Aquamacs`
+(http://aquamacs.org).
+
+To run your script from the Terminal window you must make sure that
+:file:`/usr/local/bin` is in your shell search path.
+
+To run your script from the Finder you have two options:
+
+* Drag it to :program:`PythonLauncher`
+
+* Select :program:`PythonLauncher` as the default application to open your
+  script (or any .py script) through the finder Info window and double-click it.
+  :program:`PythonLauncher` has various preferences to control how your script is
+  launched. Option-dragging allows you to change these for one invocation, or use
+  its Preferences menu to change things globally.
+
+
+.. _osx-gui-scripts:
+
+Running scripts with a GUI
+--------------------------
+
+With older versions of Python, there is one Mac OS X quirk that you need to be
+aware of: programs that talk to the Aqua window manager (in other words,
+anything that has a GUI) need to be run in a special way. Use :program:`pythonw`
+instead of :program:`python` to start such scripts.
+
+With Python 2.5, you can use either :program:`python` or :program:`pythonw`.
+
+
+Configuration
+-------------
+
+Python on OS X honors all standard Unix environment variables such as
+:envvar:`PYTHONPATH`, but setting these variables for programs started from the
+Finder is non-standard as the Finder does not read your :file:`.profile` or
+:file:`.cshrc` at startup. You need to create a file :file:`~
+/.MacOSX/environment.plist`. See Apple's Technical Document QA1067 for details.
+
+For more information on installation Python packages in MacPython, see section
+:ref:`mac-package-manager`.
+
+
+.. _ide:
+
+The IDE
+=======
+
+MacPython ships with the standard IDLE development environment. A good
+introduction to using IDLE can be found at http://hkn.eecs.berkeley.edu/
+dyoo/python/idle_intro/index.html.
+
+
+.. _mac-package-manager:
+
+Installing Additional Python Packages
+=====================================
+
+There are several methods to install additional Python packages:
+
+* http://pythonmac.org/packages/ contains selected compiled packages for Python
+  2.5, 2.4, and 2.3.
+
+* Packages can be installed via the standard Python distutils mode (``python
+  setup.py install``).
+
+* Many packages can also be installed via the :program:`setuptools` extension.
+
+
+GUI Programming on the Mac
+==========================
+
+There are several options for building GUI applications on the Mac with Python.
+
+*PyObjC* is a Python binding to Apple's Objective-C/Cocoa framework, which is
+the foundation of most modern Mac development. Information on PyObjC is
+available from http://pyobjc.sourceforge.net.
+
+The standard Python GUI toolkit is :mod:`Tkinter`, based on the cross-platform
+Tk toolkit (http://www.tcl.tk). An Aqua-native version of Tk is bundled with OS
+X by Apple, and the latest version can be downloaded and installed from
+http://www.activestate.com; it can also be built from source.
+
+*wxPython* is another popular cross-platform GUI toolkit that runs natively on
+Mac OS X. Packages and documentation are available from http://www.wxpython.org.
+
+*PyQt* is another popular cross-platform GUI toolkit that runs natively on Mac
+OS X. More information can be found at
+http://www.riverbankcomputing.co.uk/pyqt/.
+
+
+Distributing Python Applications on the Mac
+===========================================
+
+The "Build Applet" tool that is placed in the MacPython 2.5 folder is fine for
+packaging small Python scripts on your own machine to run as a standard Mac
+application. This tool, however, is not robust enough to distribute Python
+applications to other users.
+
+The standard tool for deploying standalone Python applications on the Mac is
+:program:`py2app`. More information on installing and using py2app can be found
+at http://undefined.org/python/#py2app.
+
+
+Application Scripting
+=====================
+
+Python can also be used to script other Mac applications via Apple's Open
+Scripting Architecture (OSA); see http://appscript.sourceforge.net. Appscript is
+a high-level, user-friendly Apple event bridge that allows you to control
+scriptable Mac OS X applications using ordinary Python scripts. Appscript makes
+Python a serious alternative to Apple's own *AppleScript* language for
+automating your Mac. A related package, *PyOSA*, is an OSA language component
+for the Python scripting language, allowing Python code to be executed by any
+OSA-enabled application (Script Editor, Mail, iTunes, etc.). PyOSA makes Python
+a full peer to AppleScript.
+
+
+Other Resources
+===============
+
+The MacPython mailing list is an excellent support resource for Python users and
+developers on the Mac:
+
+http://www.python.org/community/sigs/current/pythonmac-sig/
+
+Another useful resource is the MacPython wiki:
+
+http://wiki.python.org/moin/MacPython
+
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
--- a/Doc/howto/sockets.rst
+++ b/Doc/howto/sockets.rst
@ -0,0 +1,421 @@
+****************************
+  Socket Programming HOWTO  
+****************************
+
+:Author: Gordon McMillan
+
+
+.. topic:: Abstract
+
+   Sockets are used nearly everywhere, but are one of the most severely
+   misunderstood technologies around. This is a 10,000 foot overview of sockets.
+   It's not really a tutorial - you'll still have work to do in getting things
+   operational. It doesn't cover the fine points (and there are a lot of them), but
+   I hope it will give you enough background to begin using them decently.
+
+
+Sockets
+=======
+
+Sockets are used nearly everywhere, but are one of the most severely
+misunderstood technologies around. This is a 10,000 foot overview of sockets.
+It's not really a tutorial - you'll still have work to do in getting things
+working. It doesn't cover the fine points (and there are a lot of them), but I
+hope it will give you enough background to begin using them decently.
+
+I'm only going to talk about INET sockets, but they account for at least 99% of
+the sockets in use. And I'll only talk about STREAM sockets - unless you really
+know what you're doing (in which case this HOWTO isn't for you!), you'll get
+better behavior and performance from a STREAM socket than anything else. I will
+try to clear up the mystery of what a socket is, as well as some hints on how to
+work with blocking and non-blocking sockets. But I'll start by talking about
+blocking sockets. You'll need to know how they work before dealing with
+non-blocking sockets.
+
+Part of the trouble with understanding these things is that "socket" can mean a
+number of subtly different things, depending on context. So first, let's make a
+distinction between a "client" socket - an endpoint of a conversation, and a
+"server" socket, which is more like a switchboard operator. The client
+application (your browser, for example) uses "client" sockets exclusively; the
+web server it's talking to uses both "server" sockets and "client" sockets.
+
+
+History
+-------
+
+Of the various forms of IPC (*Inter Process Communication*), sockets are by far
+the most popular.  On any given platform, there are likely to be other forms of
+IPC that are faster, but for cross-platform communication, sockets are about the
+only game in town.
+
+They were invented in Berkeley as part of the BSD flavor of Unix. They spread
+like wildfire with the Internet. With good reason --- the combination of sockets
+with INET makes talking to arbitrary machines around the world unbelievably easy
+(at least compared to other schemes).
+
+
+Creating a Socket
+=================
+
+Roughly speaking, when you clicked on the link that brought you to this page,
+your browser did something like the following::
+
+   #create an INET, STREAMing socket
+   s = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #now connect to the web server on port 80 
+   # - the normal http port
+   s.connect(("www.mcmillan-inc.com", 80))
+
+When the ``connect`` completes, the socket ``s`` can now be used to send in a
+request for the text of this page. The same socket will read the reply, and then
+be destroyed. That's right - destroyed. Client sockets are normally only used
+for one exchange (or a small set of sequential exchanges).
+
+What happens in the web server is a bit more complex. First, the web server
+creates a "server socket". ::
+
+   #create an INET, STREAMing socket
+   serversocket = socket.socket(
+       socket.AF_INET, socket.SOCK_STREAM)
+   #bind the socket to a public host, 
+   # and a well-known port
+   serversocket.bind((socket.gethostname(), 80))
+   #become a server socket
+   serversocket.listen(5)
+
+A couple things to notice: we used ``socket.gethostname()`` so that the socket
+would be visible to the outside world. If we had used ``s.bind(('', 80))`` or
+``s.bind(('localhost', 80))`` or ``s.bind(('127.0.0.1', 80))`` we would still
+have a "server" socket, but one that was only visible within the same machine.
+
+A second thing to note: low number ports are usually reserved for "well known"
+services (HTTP, SNMP etc). If you're playing around, use a nice high number (4
+digits).
+
+Finally, the argument to ``listen`` tells the socket library that we want it to
+queue up as many as 5 connect requests (the normal max) before refusing outside
+connections. If the rest of the code is written properly, that should be plenty.
+
+OK, now we have a "server" socket, listening on port 80. Now we enter the
+mainloop of the web server::
+
+   while 1:
+       #accept connections from outside
+       (clientsocket, address) = serversocket.accept()
+       #now do something with the clientsocket
+       #in this case, we'll pretend this is a threaded server
+       ct = client_thread(clientsocket)
+       ct.run()
+
+There's actually 3 general ways in which this loop could work - dispatching a
+thread to handle ``clientsocket``, create a new process to handle
+``clientsocket``, or restructure this app to use non-blocking sockets, and
+mulitplex between our "server" socket and any active ``clientsocket``\ s using
+``select``. More about that later. The important thing to understand now is
+this: this is *all* a "server" socket does. It doesn't send any data. It doesn't
+receive any data. It just produces "client" sockets. Each ``clientsocket`` is
+created in response to some *other* "client" socket doing a ``connect()`` to the
+host and port we're bound to. As soon as we've created that ``clientsocket``, we
+go back to listening for more connections. The two "clients" are free to chat it
+up - they are using some dynamically allocated port which will be recycled when
+the conversation ends.
+
+
+IPC
+---
+
+If you need fast IPC between two processes on one machine, you should look into
+whatever form of shared memory the platform offers. A simple protocol based
+around shared memory and locks or semaphores is by far the fastest technique.
+
+If you do decide to use sockets, bind the "server" socket to ``'localhost'``. On
+most platforms, this will take a shortcut around a couple of layers of network
+code and be quite a bit faster.
+
+
+Using a Socket
+==============
+
+The first thing to note, is that the web browser's "client" socket and the web
+server's "client" socket are identical beasts. That is, this is a "peer to peer"
+conversation. Or to put it another way, *as the designer, you will have to
+decide what the rules of etiquette are for a conversation*. Normally, the
+``connect``\ ing socket starts the conversation, by sending in a request, or
+perhaps a signon. But that's a design decision - it's not a rule of sockets.
+
+Now there are two sets of verbs to use for communication. You can use ``send``
+and ``recv``, or you can transform your client socket into a file-like beast and
+use ``read`` and ``write``. The latter is the way Java presents their sockets.
+I'm not going to talk about it here, except to warn you that you need to use
+``flush`` on sockets. These are buffered "files", and a common mistake is to
+``write`` something, and then ``read`` for a reply. Without a ``flush`` in
+there, you may wait forever for the reply, because the request may still be in
+your output buffer.
+
+Now we come the major stumbling block of sockets - ``send`` and ``recv`` operate
+on the network buffers. They do not necessarily handle all the bytes you hand
+them (or expect from them), because their major focus is handling the network
+buffers. In general, they return when the associated network buffers have been
+filled (``send``) or emptied (``recv``). They then tell you how many bytes they
+handled. It is *your* responsibility to call them again until your message has
+been completely dealt with.
+
+When a ``recv`` returns 0 bytes, it means the other side has closed (or is in
+the process of closing) the connection.  You will not receive any more data on
+this connection. Ever.  You may be able to send data successfully; I'll talk
+about that some on the next page.
+
+A protocol like HTTP uses a socket for only one transfer. The client sends a
+request, the reads a reply.  That's it. The socket is discarded. This means that
+a client can detect the end of the reply by receiving 0 bytes.
+
+But if you plan to reuse your socket for further transfers, you need to realize
+that *there is no "EOT" (End of Transfer) on a socket.* I repeat: if a socket
+``send`` or ``recv`` returns after handling 0 bytes, the connection has been
+broken.  If the connection has *not* been broken, you may wait on a ``recv``
+forever, because the socket will *not* tell you that there's nothing more to
+read (for now).  Now if you think about that a bit, you'll come to realize a
+fundamental truth of sockets: *messages must either be fixed length* (yuck), *or
+be delimited* (shrug), *or indicate how long they are* (much better), *or end by
+shutting down the connection*. The choice is entirely yours, (but some ways are
+righter than others).
+
+Assuming you don't want to end the connection, the simplest solution is a fixed
+length message::
+
+   class mysocket:
+       '''demonstration class only 
+         - coded for clarity, not efficiency
+       '''
+
+       def __init__(self, sock=None):
+   	if sock is None:
+   	    self.sock = socket.socket(
+   		socket.AF_INET, socket.SOCK_STREAM)
+   	else:
+   	    self.sock = sock
+
+       def connect(self, host, port):
+   	self.sock.connect((host, port))
+
+       def mysend(self, msg):
+   	totalsent = 0
+   	while totalsent < MSGLEN:
+   	    sent = self.sock.send(msg[totalsent:])
+   	    if sent == 0:
+   		raise RuntimeError, \
+   		    "socket connection broken"
+   	    totalsent = totalsent + sent
+
+       def myreceive(self):
+   	msg = ''
+   	while len(msg) < MSGLEN:
+   	    chunk = self.sock.recv(MSGLEN-len(msg))
+   	    if chunk == '':
+   		raise RuntimeError, \
+   		    "socket connection broken"
+   	    msg = msg + chunk
+   	return msg
+
+The sending code here is usable for almost any messaging scheme - in Python you
+send strings, and you can use ``len()`` to determine its length (even if it has
+embedded ``\0`` characters). It's mostly the receiving code that gets more
+complex. (And in C, it's not much worse, except you can't use ``strlen`` if the
+message has embedded ``\0``\ s.)
+
+The easiest enhancement is to make the first character of the message an
+indicator of message type, and have the type determine the length. Now you have
+two ``recv``\ s - the first to get (at least) that first character so you can
+look up the length, and the second in a loop to get the rest. If you decide to
+go the delimited route, you'll be receiving in some arbitrary chunk size, (4096
+or 8192 is frequently a good match for network buffer sizes), and scanning what
+you've received for a delimiter.
+
+One complication to be aware of: if your conversational protocol allows multiple
+messages to be sent back to back (without some kind of reply), and you pass
+``recv`` an arbitrary chunk size, you may end up reading the start of a
+following message. You'll need to put that aside and hold onto it, until it's
+needed.
+
+Prefixing the message with it's length (say, as 5 numeric characters) gets more
+complex, because (believe it or not), you may not get all 5 characters in one
+``recv``. In playing around, you'll get away with it; but in high network loads,
+your code will very quickly break unless you use two ``recv`` loops - the first
+to determine the length, the second to get the data part of the message. Nasty.
+This is also when you'll discover that ``send`` does not always manage to get
+rid of everything in one pass. And despite having read this, you will eventually
+get bit by it!
+
+In the interests of space, building your character, (and preserving my
+competitive position), these enhancements are left as an exercise for the
+reader. Lets move on to cleaning up.
+
+
+Binary Data
+-----------
+
+It is perfectly possible to send binary data over a socket. The major problem is
+that not all machines use the same formats for binary data. For example, a
+Motorola chip will represent a 16 bit integer with the value 1 as the two hex
+bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
+Socket libraries have calls for converting 16 and 32 bit integers - ``ntohl,
+htonl, ntohs, htons`` where "n" means *network* and "h" means *host*, "s" means
+*short* and "l" means *long*. Where network order is host order, these do
+nothing, but where the machine is byte-reversed, these swap the bytes around
+appropriately.
+
+In these days of 32 bit machines, the ascii representation of binary data is
+frequently smaller than the binary representation. That's because a surprising
+amount of the time, all those longs have the value 0, or maybe 1. The string "0"
+would be two bytes, while binary is four. Of course, this doesn't fit well with
+fixed-length messages. Decisions, decisions.
+
+
+Disconnecting
+=============
+
+Strictly speaking, you're supposed to use ``shutdown`` on a socket before you
+``close`` it.  The ``shutdown`` is an advisory to the socket at the other end.
+Depending on the argument you pass it, it can mean "I'm not going to send
+anymore, but I'll still listen", or "I'm not listening, good riddance!".  Most
+socket libraries, however, are so used to programmers neglecting to use this
+piece of etiquette that normally a ``close`` is the same as ``shutdown();
+close()``.  So in most situations, an explicit ``shutdown`` is not needed.
+
+One way to use ``shutdown`` effectively is in an HTTP-like exchange. The client
+sends a request and then does a ``shutdown(1)``. This tells the server "This
+client is done sending, but can still receive."  The server can detect "EOF" by
+a receive of 0 bytes. It can assume it has the complete request.  The server
+sends a reply. If the ``send`` completes successfully then, indeed, the client
+was still receiving.
+
+Python takes the automatic shutdown a step further, and says that when a socket
+is garbage collected, it will automatically do a ``close`` if it's needed. But
+relying on this is a very bad habit. If your socket just disappears without
+doing a ``close``, the socket at the other end may hang indefinitely, thinking
+you're just being slow. *Please* ``close`` your sockets when you're done.
+
+
+When Sockets Die
+----------------
+
+Probably the worst thing about using blocking sockets is what happens when the
+other side comes down hard (without doing a ``close``). Your socket is likely to
+hang. SOCKSTREAM is a reliable protocol, and it will wait a long, long time
+before giving up on a connection. If you're using threads, the entire thread is
+essentially dead. There's not much you can do about it. As long as you aren't
+doing something dumb, like holding a lock while doing a blocking read, the
+thread isn't really consuming much in the way of resources. Do *not* try to kill
+the thread - part of the reason that threads are more efficient than processes
+is that they avoid the overhead associated with the automatic recycling of
+resources. In other words, if you do manage to kill the thread, your whole
+process is likely to be screwed up.
+
+
+Non-blocking Sockets
+====================
+
+If you've understood the preceeding, you already know most of what you need to
+know about the mechanics of using sockets. You'll still use the same calls, in
+much the same ways. It's just that, if you do it right, your app will be almost
+inside-out.
+
+In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
+more complex, (for one thing, you'll need to choose between the BSD flavor
+``O_NONBLOCK`` and the almost indistinguishable Posix flavor ``O_NDELAY``, which
+is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
+do this after creating the socket, but before using it. (Actually, if you're
+nuts, you can switch back and forth.)
+
+The major mechanical difference is that ``send``, ``recv``, ``connect`` and
+``accept`` can return without having done anything. You have (of course) a
+number of choices. You can check return code and error codes and generally drive
+yourself crazy. If you don't believe me, try it sometime. Your app will grow
+large, buggy and suck CPU. So let's skip the brain-dead solutions and do it
+right.
+
+Use ``select``.
+
+In C, coding ``select`` is fairly complex. In Python, it's a piece of cake, but
+it's close enough to the C version that if you understand ``select`` in Python,
+you'll have little trouble with it in C. ::
+
+   ready_to_read, ready_to_write, in_error = \
+                  select.select(
+                     potential_readers, 
+                     potential_writers, 
+                     potential_errs, 
+                     timeout)
+
+You pass ``select`` three lists: the first contains all sockets that you might
+want to try reading; the second all the sockets you might want to try writing
+to, and the last (normally left empty) those that you want to check for errors.
+You should note that a socket can go into more than one list. The ``select``
+call is blocking, but you can give it a timeout. This is generally a sensible
+thing to do - give it a nice long timeout (say a minute) unless you have good
+reason to do otherwise.
+
+In return, you will get three lists. They have the sockets that are actually
+readable, writable and in error. Each of these lists is a subset (possbily
+empty) of the corresponding list you passed in. And if you put a socket in more
+than one input list, it will only be (at most) in one output list.
+
+If a socket is in the output readable list, you can be
+as-close-to-certain-as-we-ever-get-in-this-business that a ``recv`` on that
+socket will return *something*. Same idea for the writable list. You'll be able
+to send *something*. Maybe not all you want to, but *something* is better than
+nothing.  (Actually, any reasonably healthy socket will return as writable - it
+just means outbound network buffer space is available.)
+
+If you have a "server" socket, put it in the potential_readers list. If it comes
+out in the readable list, your ``accept`` will (almost certainly) work. If you
+have created a new socket to ``connect`` to someone else, put it in the
+ptoential_writers list. If it shows up in the writable list, you have a decent
+chance that it has connected.
+
+One very nasty problem with ``select``: if somewhere in those input lists of
+sockets is one which has died a nasty death, the ``select`` will fail. You then
+need to loop through every single damn socket in all those lists and do a
+``select([sock],[],[],0)`` until you find the bad one. That timeout of 0 means
+it won't take long, but it's ugly.
+
+Actually, ``select`` can be handy even with blocking sockets. It's one way of
+determining whether you will block - the socket returns as readable when there's
+something in the buffers.  However, this still doesn't help with the problem of
+determining whether the other end is done, or just busy with something else.
+
+**Portability alert**: On Unix, ``select`` works both with the sockets and
+files. Don't try this on Windows. On Windows, ``select`` works with sockets
+only. Also note that in C, many of the more advanced socket options are done
+differently on Windows. In fact, on Windows I usually use threads (which work
+very, very well) with my sockets. Face it, if you want any kind of performance,
+your code will look very different on Windows than on Unix. (I haven't the
+foggiest how you do this stuff on a Mac.)
+
+
+Performance
+-----------
+
+There's no question that the fastest sockets code uses non-blocking sockets and
+select to multiplex them. You can put together something that will saturate a
+LAN connection without putting any strain on the CPU. The trouble is that an app
+written this way can't do much of anything else - it needs to be ready to
+shuffle bytes around at all times.
+
+Assuming that your app is actually supposed to do something more than that,
+threading is the optimal solution, (and using non-blocking sockets will be
+faster than using blocking sockets). Unfortunately, threading support in Unixes
+varies both in API and quality. So the normal Unix solution is to fork a
+subprocess to deal with each connection. The overhead for this is significant
+(and don't do this on Windows - the overhead of process creation is enormous
+there). It also means that unless each subprocess is completely independent,
+you'll need to use another form of IPC, say a pipe, or shared memory and
+semaphores, to communicate between the parent and child processes.
+
+Finally, remember that even though blocking sockets are somewhat slower than
+non-blocking, in many cases they are the "right" solution. After all, if your
+app is driven by the data it receives over a socket, there's not much sense in
+complicating the logic just so your app can wait on ``select`` instead of
+``recv``.
+
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@ -0,0 +1,732 @@
+*****************
+  Unicode HOWTO
+*****************
+
+:Release: 1.02
+
+This HOWTO discusses Python's support for Unicode, and explains various problems
+that people commonly encounter when trying to work with Unicode.
+
+Introduction to Unicode
+=======================
+
+History of Character Codes
+--------------------------
+
+In 1968, the American Standard Code for Information Interchange, better known by
+its acronym ASCII, was standardized.  ASCII defined numeric codes for various
+characters, with the numeric values running from 0 to
+127.  For example, the lowercase letter 'a' is assigned 97 as its code
+value.
+
+ASCII was an American-developed standard, so it only defined unaccented
+characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
+which required accented characters couldn't be faithfully represented in ASCII.
+(Actually the missing accents matter for English, too, which contains words such
+as 'naïve' and 'café', and some publications have house styles which require
+spellings such as 'coöperate'.)
+
+For a while people just wrote programs that didn't display accents.  I remember
+looking at Apple ][ BASIC programs, published in French-language publications in
+the mid-1980s, that had lines like these::
+
+	PRINT "FICHER EST COMPLETE."
+	PRINT "CARACTERE NON ACCEPTE."
+
+Those messages should contain accents, and they just look wrong to someone who
+can read French.
+
+In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
+hold values ranging from 0 to 255.  ASCII codes only went up to 127, so some
+machines assigned values between 128 and 255 to accented characters.  Different
+machines had different codes, however, which led to problems exchanging files.
+Eventually various commonly used sets of values for the 128-255 range emerged.
+Some were true standards, defined by the International Standards Organization,
+and some were **de facto** conventions that were invented by one company or
+another and managed to catch on.
+
+255 characters aren't very many.  For example, you can't fit both the accented
+characters used in Western Europe and the Cyrillic alphabet used for Russian
+into the 128-255 range because there are more than 127 such characters.
+
+You could write files using different codes (all your Russian files in a coding
+system called KOI8, all your French files in a different coding system called
+Latin1), but what if you wanted to write a French document that quotes some
+Russian text?  In the 1980s people began to want to solve this problem, and the
+Unicode standardization effort began.
+
+Unicode started out using 16-bit characters instead of 8-bit characters.  16
+bits means you have 2^16 = 65,536 distinct values available, making it possible
+to represent many different characters from many different alphabets; an initial
+goal was to have Unicode contain the alphabets for every single human language.
+It turns out that even 16 bits isn't enough to meet that goal, and the modern
+Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
+base-16).
+
+There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
+originally separate efforts, but the specifications were merged with the 1.1
+revision of Unicode.
+
+(This discussion of Unicode's history is highly simplified.  I don't think the
+average Python programmer needs to worry about the historical details; consult
+the Unicode consortium site listed in the References for more information.)
+
+
+Definitions
+-----------
+
+A **character** is the smallest possible component of a text.  'A', 'B', 'C',
+etc., are all different characters.  So are 'È' and 'Í'.  Characters are
+abstractions, and vary depending on the language or context you're talking
+about.  For example, the symbol for ohms (Ω) is usually drawn much like the
+capital letter omega (Ω) in the Greek alphabet (they may even be the same in
+some fonts), but these are two different characters that have different
+meanings.
+
+The Unicode standard describes how characters are represented by **code
+points**.  A code point is an integer value, usually denoted in base 16.  In the
+standard, a code point is written using the notation U+12ca to mean the
+character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
+of tables listing characters and their corresponding code points::
+
+	0061    'a'; LATIN SMALL LETTER A
+	0062    'b'; LATIN SMALL LETTER B
+	0063    'c'; LATIN SMALL LETTER C
+        ...
+	007B	'{'; LEFT CURLY BRACKET
+
+Strictly, these definitions imply that it's meaningless to say 'this is
+character U+12ca'.  U+12ca is a code point, which represents some particular
+character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.  In
+informal contexts, this distinction between code points and characters will
+sometimes be forgotten.
+
+A character is represented on a screen or on paper by a set of graphical
+elements that's called a **glyph**.  The glyph for an uppercase A, for example,
+is two diagonal strokes and a horizontal stroke, though the exact details will
+depend on the font being used.  Most Python code doesn't need to worry about
+glyphs; figuring out the correct glyph to display is generally the job of a GUI
+toolkit or a terminal's font renderer.
+
+
+Encodings
+---------
+
+To summarize the previous section: a Unicode string is a sequence of code
+points, which are numbers from 0 to 0x10ffff.  This sequence needs to be
+represented as a set of bytes (meaning, values from 0-255) in memory.  The rules
+for translating a Unicode string into a sequence of bytes are called an
+**encoding**.
+
+The first encoding you might think of is an array of 32-bit integers.  In this
+representation, the string "Python" would look like this::
+
+       P           y           t           h           o           n
+    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00 
+       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
+
+This representation is straightforward but using it presents a number of
+problems.
+
+1. It's not portable; different processors order the bytes differently.
+
+2. It's very wasteful of space.  In most texts, the majority of the code points
+   are less than 127, or less than 255, so a lot of space is occupied by zero
+   bytes.  The above string takes 24 bytes compared to the 6 bytes needed for an
+   ASCII representation.  Increased RAM usage doesn't matter too much (desktop
+   computers have megabytes of RAM, and strings aren't usually that large), but
+   expanding our usage of disk and network bandwidth by a factor of 4 is
+   intolerable.
+
+3. It's not compatible with existing C functions such as ``strlen()``, so a new
+   family of wide string functions would need to be used.
+
+4. Many Internet standards are defined in terms of textual data, and can't
+   handle content with embedded zero bytes.
+
+Generally people don't use this encoding, instead choosing other encodings that
+are more efficient and convenient.
+
+Encodings don't have to handle every possible Unicode character, and most
+encodings don't.  For example, Python's default encoding is the 'ascii'
+encoding.  The rules for converting a Unicode string into the ASCII encoding are
+simple; for each code point:
+
+1. If the code point is < 128, each byte is the same as the value of the code
+   point.
+
+2. If the code point is 128 or greater, the Unicode string can't be represented
+   in this encoding.  (Python raises a :exc:`UnicodeEncodeError` exception in this
+   case.)
+
+Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code points
+0-255 are identical to the Latin-1 values, so converting to this encoding simply
+requires converting code points to byte values; if a code point larger than 255
+is encountered, the string can't be encoded into Latin-1.
+
+Encodings don't have to be simple one-to-one mappings like Latin-1.  Consider
+IBM's EBCDIC, which was used on IBM mainframes.  Letter values weren't in one
+block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
+through 153.  If you wanted to use EBCDIC as an encoding, you'd probably use
+some sort of lookup table to perform the conversion, but this is largely an
+internal detail.
+
+UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
+Transformation Format", and the '8' means that 8-bit numbers are used in the
+encoding.  (There's also a UTF-16 encoding, but it's less frequently used than
+UTF-8.)  UTF-8 uses the following rules:
+
+1. If the code point is <128, it's represented by the corresponding byte value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+   between 128 and 255.
+3. Code points >0x7ff are turned into three- or four-byte sequences, where each
+   byte of the sequence is between 128 and 255.
+    
+UTF-8 has several convenient properties:
+
+1. It can handle any Unicode code point.
+2. A Unicode string is turned into a string of bytes containing no embedded zero
+   bytes.  This avoids byte-ordering issues, and means UTF-8 strings can be
+   processed by C functions such as ``strcpy()`` and sent through protocols that
+   can't handle zero bytes.
+3. A string of ASCII text is also valid UTF-8 text.
+4. UTF-8 is fairly compact; the majority of code points are turned into two
+   bytes, and values less than 128 occupy only a single byte.
+5. If bytes are corrupted or lost, it's possible to determine the start of the
+   next UTF-8-encoded code point and resynchronize.  It's also unlikely that
+   random 8-bit data will look like valid UTF-8.
+
+
+
+References
+----------
+
+The Unicode Consortium site at <http://www.unicode.org> has character charts, a
+glossary, and PDF versions of the Unicode specification.  Be prepared for some
+difficult reading.  <http://www.unicode.org/history/> is a chronology of the
+origin and development of Unicode.
+
+To help understand the standard, Jukka Korpela has written an introductory guide
+to reading the Unicode character tables, available at
+<http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+Roman Czyborra wrote another explanation of Unicode's basic principles; it's at
+<http://czyborra.com/unicode/characters.html>.  Czyborra has written a number of
+other Unicode-related documentation, available from <http://www.cyzborra.com>.
+
+Two other good introductory articles were written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html> and Jason Orendorff
+<http://www.jorendorff.com/articles/unicode/>.  If this introduction didn't make
+things clear to you, you should try reading one of these alternate articles
+before continuing.
+
+Wikipedia entries are often helpful; see the entries for "character encoding"
+<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>, for example.
+
+
+Python's Unicode Support
+========================
+
+Now that you've learned the rudiments of Unicode, we can look at Python's
+Unicode features.
+
+
+The Unicode Type
+----------------
+
+Unicode strings are expressed as instances of the :class:`unicode` type, one of
+Python's repertoire of built-in types.  It derives from an abstract type called
+:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
+therefore check if a value is a string type with ``isinstance(value,
+basestring)``.  Under the hood, Python represents Unicode strings as either 16-
+or 32-bit integers, depending on how the Python interpreter was compiled.
+
+The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
+errors])``.  All of its arguments should be 8-bit strings.  The first argument
+is converted to Unicode using the specified encoding; if you leave off the
+``encoding`` argument, the ASCII encoding is used for the conversion, so
+characters greater than 127 will be treated as errors::
+
+    >>> unicode('abcdef')
+    u'abcdef'
+    >>> s = unicode('abcdef')
+    >>> type(s)
+    <type 'unicode'>
+    >>> unicode('abcdef' + chr(255))
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: 
+                        ordinal not in range(128)
+
+The ``errors`` argument specifies the response when the input string can't be
+converted according to the encoding's rules.  Legal values for this argument are
+'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
+Unicode result).  The following examples show the differences::
+
+    >>> unicode('\x80abc', errors='strict')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: 
+                        ordinal not in range(128)
+    >>> unicode('\x80abc', errors='replace')
+    u'\ufffdabc'
+    >>> unicode('\x80abc', errors='ignore')
+    u'abc'
+
+Encodings are specified as strings containing the encoding's name.  Python 2.4
+comes with roughly 100 different encodings; see the Python Library Reference at
+<http://docs.python.org/lib/standard-encodings.html> for a list.  Some encodings
+have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
+synonyms for the same encoding.
+
+One-character Unicode strings can also be created with the :func:`unichr`
+built-in function, which takes integers and returns a Unicode string of length 1
+that contains the corresponding code point.  The reverse operation is the
+built-in :func:`ord` function that takes a one-character Unicode string and
+returns the code point value::
+
+    >>> unichr(40960)
+    u'\ua000'
+    >>> ord(u'\ua000')
+    40960
+
+Instances of the :class:`unicode` type have many of the same methods as the
+8-bit string type for operations such as searching and formatting::
+
+    >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
+    >>> s.count('e')
+    5
+    >>> s.find('feather')
+    9
+    >>> s.find('bird')
+    -1
+    >>> s.replace('feather', 'sand')
+    u'Was ever sand so lightly blown to and fro as this multitude?'
+    >>> s.upper()
+    u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
+
+Note that the arguments to these methods can be Unicode strings or 8-bit
+strings.  8-bit strings will be converted to Unicode before carrying out the
+operation; Python's default ASCII encoding will be used, so characters greater
+than 127 will cause an exception::
+
+    >>> s.find('Was\x9f')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+    >>> s.find(u'Was\x9f')
+    -1
+
+Much Python code that operates on strings will therefore work with Unicode
+strings without requiring any changes to the code.  (Input and output code needs
+more updating for Unicode; more on this later.)
+
+Another important method is ``.encode([encoding], [errors='strict'])``, which
+returns an 8-bit string version of the Unicode string, encoded in the requested
+encoding.  The ``errors`` parameter is the same as the parameter of the
+``unicode()`` constructor, with one additional possibility; as well as 'strict',
+'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
+character references.  The following example shows the different results::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)
+    >>> u.encode('utf-8')
+    '\xea\x80\x80abcd\xde\xb4'
+    >>> u.encode('ascii')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
+    >>> u.encode('ascii', 'ignore')
+    'abcd'
+    >>> u.encode('ascii', 'replace')
+    '?abcd?'
+    >>> u.encode('ascii', 'xmlcharrefreplace')
+    '&#40960;abcd&#1972;'
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
+interprets the string using the given encoding::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
+    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
+    >>> type(utf8_version), utf8_version
+    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
+    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
+    >>> u == u2                                      # The two strings match
+    True
+ 
+The low-level routines for registering and accessing the available encodings are
+found in the :mod:`codecs` module.  However, the encoding and decoding functions
+returned by this module are usually more low-level than is comfortable, so I'm
+not going to describe the :mod:`codecs` module here.  If you need to implement a
+completely new encoding, you'll need to learn about the :mod:`codecs` module
+interfaces, but implementing encodings is a specialized task that also won't be
+covered here.  Consult the Python documentation to learn more about this module.
+
+The most commonly used part of the :mod:`codecs` module is the
+:func:`codecs.open` function which will be discussed in the section on input and
+output.
+            
+            
+Unicode Literals in Python Source Code
+--------------------------------------
+
+In Python source code, Unicode literals are written as strings prefixed with the
+'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
+using the ``\u`` escape sequence, which is followed by four hex digits giving
+the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
+digits, not 4.
+
+Unicode literals can also use the same escape sequences as 8-bit strings,
+including ``\x``, but ``\x`` only takes two hex digits so it can't express an
+arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
+
+::
+
+    >>> s = u"a\xac\u1234\u20ac\U00008000"
+               ^^^^ two-digit hex escape
+                   ^^^^^^ four-digit Unicode escape 
+                               ^^^^^^^^^^ eight-digit Unicode escape
+    >>> for c in s:  print ord(c),
+    ... 
+    97 172 4660 8364 32768
+
+Using escape sequences for code points greater than 127 is fine in small doses,
+but becomes an annoyance if you're using many accented characters, as you would
+in a program with messages in French or some other accent-using language.  You
+can also assemble strings using the :func:`unichr` built-in function, but this is
+even more tedious.
+
+Ideally, you'd want to be able to write literals in your language's natural
+encoding.  You could then edit Python source code with your favorite editor
+which would display the accented characters naturally, and have the right
+characters used at runtime.
+
+Python supports writing Unicode literals in any encoding, but you have to
+declare the encoding being used.  This is done by including a special comment as
+either the first or second line of the source file::
+
+    #!/usr/bin/env python
+    # -*- coding: latin-1 -*-
+    
+    u = u'abcdé'
+    print ord(u[-1])
+    
+The syntax is inspired by Emacs's notation for specifying variables local to a
+file.  Emacs supports many different variables, but Python only supports
+'coding'.  The ``-*-`` symbols indicate that the comment is special; within
+them, you must supply the name ``coding`` and the name of your chosen encoding,
+separated by ``':'``.
+
+If you don't include such a comment, the default encoding used will be ASCII.
+Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
+encoding for string literals; in Python 2.4, characters greater than 127 still
+work but result in a warning.  For example, the following program has no
+encoding declaration::
+
+    #!/usr/bin/env python
+    u = u'abcdé'
+    print ord(u[-1])
+
+When you run it with Python 2.4, it will output the following warning::
+
+    amk:~$ python p263.py
+    sys:1: DeprecationWarning: Non-ASCII character '\xe9' 
+         in file p263.py on line 2, but no encoding declared; 
+         see http://www.python.org/peps/pep-0263.html for details
+  
+
+Unicode Properties
+------------------
+
+The Unicode specification includes a database of information about code points.
+For each code point that's defined, the information includes the character's
+name, its category, the numeric value if applicable (Unicode has characters
+representing the Roman numerals and fractions such as one-third and
+four-fifths).  There are also properties related to the code point's use in
+bidirectional text and other display-related properties.
+
+The following program displays some information about several characters, and
+prints the numeric value of one particular character::
+
+    import unicodedata
+    
+    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
+    
+    for i, c in enumerate(u):
+        print i, '%04x' % ord(c), unicodedata.category(c),
+        print unicodedata.name(c)
+    
+    # Get numeric value of second character
+    print unicodedata.numeric(u[1])
+
+When run, this prints::
+
+    0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
+    1 0bf2 No TAMIL NUMBER ONE THOUSAND
+    2 0f84 Mn TIBETAN MARK HALANTA
+    3 1770 Lo TAGBANWA LETTER SA
+    4 33af So SQUARE RAD OVER S SQUARED
+    1000.0
+
+The category codes are abbreviations describing the nature of the character.
+These are grouped into categories such as "Letter", "Number", "Punctuation", or
+"Symbol", which in turn are broken up into subcategories.  To take the codes
+from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
+"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
+other".  See
+<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values> for a
+list of category codes.
+
+References
+----------
+
+The Unicode and 8-bit string types are described in the Python library reference
+at :ref:`typesseq`.
+
+The documentation for the :mod:`unicodedata` module.
+
+The documentation for the :mod:`codecs` module.
+
+Marc-André Lemburg gave a presentation at EuroPython 2002 titled "Python and
+Unicode".  A PDF version of his slides is available at
+<http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf>, and is an
+excellent overview of the design of Python's Unicode features.
+
+
+Reading and Writing Unicode Data
+================================
+
+Once you've written some code that works with Unicode data, the next problem is
+input/output.  How do you get Unicode strings into your program, and how do you
+convert Unicode into a form suitable for storage or transmission?
+
+It's possible that you may not need to do anything depending on your input
+sources and output destinations; you should check whether the libraries used in
+your application support Unicode natively.  XML parsers often return Unicode
+data, for example.  Many relational databases also support Unicode-valued
+columns and can return Unicode values from an SQL query.
+
+Unicode data is usually converted to a particular encoding before it gets
+written to disk or sent over a socket.  It's possible to do all the work
+yourself: open a file, read an 8-bit string from it, and convert the string with
+``unicode(str, encoding)``.  However, the manual approach is not recommended.
+
+One problem is the multi-byte nature of encodings; one Unicode character can be
+represented by several bytes.  If you want to read the file in arbitrary-sized
+chunks (say, 1K or 4K), you need to write error-handling code to catch the case
+where only part of the bytes encoding a single Unicode character are read at the
+end of a chunk.  One solution would be to read the entire file into memory and
+then perform the decoding, but that prevents you from working with files that
+are extremely large; if you need to read a 2Gb file, you need 2Gb of RAM.
+(More, really, since for at least a moment you'd need to have both the encoded
+string and its Unicode version in memory.)
+
+The solution would be to use the low-level decoding interface to catch the case
+of partial coding sequences.  The work of implementing this has already been
+done for you: the :mod:`codecs` module includes a version of the :func:`open`
+function that returns a file-like object that assumes the file's contents are in
+a specified encoding and accepts Unicode parameters for methods such as
+``.read()`` and ``.write()``.
+
+The function's parameters are ``open(filename, mode='rb', encoding=None,
+errors='strict', buffering=1)``.  ``mode`` can be ``'r'``, ``'w'``, or ``'a'``,
+just like the corresponding parameter to the regular built-in ``open()``
+function; add a ``'+'`` to update the file.  ``buffering`` is similarly parallel
+to the standard function's parameter.  ``encoding`` is a string giving the
+encoding to use; if it's left as ``None``, a regular Python file object that
+accepts 8-bit strings is returned.  Otherwise, a wrapper object is returned, and
+data written to or read from the wrapper object will be converted as needed.
+``errors`` specifies the action for encoding errors and can be one of the usual
+values of 'strict', 'ignore', and 'replace'.
+
+Reading Unicode from a file is therefore simple::
+
+    import codecs
+    f = codecs.open('unicode.rst', encoding='utf-8')
+    for line in f:
+        print repr(line)
+
+It's also possible to open files in update mode, allowing both reading and
+writing::
+
+    f = codecs.open('test', encoding='utf-8', mode='w+')
+    f.write(u'\u4500 blah blah blah\n')
+    f.seek(0)
+    print repr(f.readline()[:1])
+    f.close()
+
+Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
+written as the first character of a file in order to assist with autodetection
+of the file's byte ordering.  Some encodings, such as UTF-16, expect a BOM to be
+present at the start of a file; when such an encoding is used, the BOM will be
+automatically written as the first character and will be silently dropped when
+the file is read.  There are variants of these encodings, such as 'utf-16-le'
+and 'utf-16-be' for little-endian and big-endian encodings, that specify one
+particular byte ordering and don't skip the BOM.
+
+
+Unicode filenames
+-----------------
+
+Most of the operating systems in common use today support filenames that contain
+arbitrary Unicode characters.  Usually this is implemented by converting the
+Unicode string into some encoding that varies depending on the system.  For
+example, MacOS X uses UTF-8 while Windows uses a configurable encoding; on
+Windows, Python uses the name "mbcs" to refer to whatever the currently
+configured encoding is.  On Unix systems, there will only be a filesystem
+encoding if you've set the ``LANG`` or ``LC_CTYPE`` environment variables; if
+you haven't, the default encoding is ASCII.
+
+The :func:`sys.getfilesystemencoding` function returns the encoding to use on
+your current system, in case you want to do the encoding manually, but there's
+not much reason to bother.  When opening a file for reading or writing, you can
+usually just provide the Unicode string as the filename, and it will be
+automatically converted to the right encoding for you::
+
+    filename = u'filename\u4500abc'
+    f = open(filename, 'w')
+    f.write('blah\n')
+    f.close()
+
+Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
+filenames.
+
+:func:`os.listdir`, which returns filenames, raises an issue: should it return
+the Unicode version of filenames, or should it return 8-bit strings containing
+the encoded versions?  :func:`os.listdir` will do both, depending on whether you
+provided the directory path as an 8-bit string or a Unicode string.  If you pass
+a Unicode string as the path, filenames will be decoded using the filesystem's
+encoding and a list of Unicode strings will be returned, while passing an 8-bit
+path will return the 8-bit versions of the filenames.  For example, assuming the
+default filesystem encoding is UTF-8, running the following program::
+
+	fn = u'filename\u4500abc'
+	f = open(fn, 'w')
+	f.close()
+
+	import os
+	print os.listdir('.')
+	print os.listdir(u'.')
+
+will produce the following output::
+
+	amk:~$ python t.py
+	['.svn', 'filename\xe4\x94\x80abc', ...]
+	[u'.svn', u'filename\u4500abc', ...]
+
+The first list contains UTF-8-encoded filenames, and the second list contains
+the Unicode versions.
+
+
+	
+Tips for Writing Unicode-aware Programs
+---------------------------------------
+
+This section provides some suggestions on writing software that deals with
+Unicode.
+
+The most important tip is:
+
+    Software should only work with Unicode strings internally, converting to a
+    particular encoding on output.
+
+If you attempt to write processing functions that accept both Unicode and 8-bit
+strings, you will find your program vulnerable to bugs wherever you combine the
+two different kinds of strings.  Python's default encoding is ASCII, so whenever
+a character with an ASCII value > 127 is in the input data, you'll get a
+:exc:`UnicodeDecodeError` because that character can't be handled by the ASCII
+encoding.
+
+It's easy to miss such problems if you only test your software with data that
+doesn't contain any accents; everything will seem to work, but there's actually
+a bug in your program waiting for the first user who attempts to use characters
+> 127.  A second tip, therefore, is:
+
+    Include characters > 127 and, even better, characters > 255 in your test
+    data.
+
+When using data coming from a web browser or some other untrusted source, a
+common technique is to check for illegal characters in a string before using the
+string in a generated command line or storing it in a database.  If you're doing
+this, be careful to check the string once it's in the form that will be used or
+stored; it's possible for encodings to be used to disguise characters.  This is
+especially true if the input data also specifies the encoding; many encodings
+leave the commonly checked-for characters alone, but Python includes some
+encodings such as ``'base64'`` that modify every single character.
+
+For example, let's say you have a content management system that takes a Unicode
+filename, and you want to disallow paths with a '/' character.  You might write
+this code::
+
+    def read_file (filename, encoding):
+        if '/' in filename:
+            raise ValueError("'/' not allowed in filenames")
+        unicode_name = filename.decode(encoding)
+        f = open(unicode_name, 'r')
+        # ... return contents of file ...
+        
+However, if an attacker could specify the ``'base64'`` encoding, they could pass
+``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
+``'/etc/passwd'``, to read a system file.  The above code looks for ``'/'``
+characters in the encoded form and misses the dangerous character in the
+resulting decoded form.
+
+References
+----------
+
+The PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
+Applications in Python" are available at
+<http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
+and discuss questions of character encodings as well as how to internationalize
+and localize an application.
+
+
+Revision History and Acknowledgements
+=====================================
+
+Thanks to the following people who have noted errors or offered suggestions on
+this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
+Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
+
+Version 1.0: posted August 5 2005.
+
+Version 1.01: posted August 7 2005.  Corrects factual and markup errors; adds
+several links.
+
+Version 1.02: posted August 16 2005.  Corrects factual errors.
+
+
+.. comment Additional topic: building Python w/ UCS2 or UCS4 support
+.. comment Describe obscure -U switch somewhere?
+.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
+
+.. comment 
+   Original outline:
+
+   - [ ] Unicode introduction
+       - [ ] ASCII
+       - [ ] Terms
+	   - [ ] Character
+	   - [ ] Code point
+	 - [ ] Encodings
+	    - [ ] Common encodings: ASCII, Latin-1, UTF-8
+       - [ ] Unicode Python type
+	   - [ ] Writing unicode literals
+	       - [ ] Obscurity: -U switch
+	   - [ ] Built-ins
+	       - [ ] unichr()
+	       - [ ] ord()
+	       - [ ] unicode() constructor
+	   - [ ] Unicode type
+	       - [ ] encode(), decode() methods
+       - [ ] Unicodedata module for character properties
+       - [ ] I/O
+	   - [ ] Reading/writing Unicode data into files
+	       - [ ] Byte-order marks
+	   - [ ] Unicode filenames
+       - [ ] Writing Unicode programs
+	   - [ ] Do everything in Unicode
+	   - [ ] Declaring source code encodings (PEP 263)
+       - [ ] Other issues
+	   - [ ] Building Python (UCS2, UCS4)
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@ -0,0 +1,578 @@
+************************************************
+  HOWTO Fetch Internet Resources Using urllib2
+************************************************
+
+:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
+
+.. note::
+
+    There is an French translation of an earlier revision of this
+    HOWTO, available at `urllib2 - Le Manuel manquant
+    <http://www.voidspace/python/articles/urllib2_francais.shtml>`_.
+
+ 
+
+Introduction
+============
+
+.. sidebar:: Related Articles
+
+    You may also find useful the following article on fetching web resources
+    with Python :
+    
+    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
+    
+        A tutorial on *Basic Authentication*, with examples in Python.
+
+**urllib2** is a `Python <http://www.python.org>`_ module for fetching URLs
+(Uniform Resource Locators). It offers a very simple interface, in the form of
+the *urlopen* function. This is capable of fetching URLs using a variety of
+different protocols. It also offers a slightly more complex interface for
+handling common situations - like basic authentication, cookies, proxies and so
+on. These are provided by objects called handlers and openers.
+
+urllib2 supports fetching URLs for many "URL schemes" (identified by the string
+before the ":" in URL - for example "ftp" is the URL scheme of
+"ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
+This tutorial focuses on the most common case, HTTP.
+
+For straightforward situations *urlopen* is very easy to use. But as soon as you
+encounter errors or non-trivial cases when opening HTTP URLs, you will need some
+understanding of the HyperText Transfer Protocol. The most comprehensive and
+authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
+with enough detail about HTTP to help you through. It is not intended to replace
+the :mod:`urllib2` docs, but is supplementary to them.
+
+
+Fetching URLs
+=============
+
+The simplest way to use urllib2 is as follows::
+
+    import urllib2
+    response = urllib2.urlopen('http://python.org/')
+    html = response.read()
+
+Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
+could have used an URL starting with 'ftp:', 'file:', etc.).  However, it's the
+purpose of this tutorial to explain the more complicated cases, concentrating on
+HTTP.
+
+HTTP is based on requests and responses - the client makes requests and servers
+send responses. urllib2 mirrors this with a ``Request`` object which represents
+the HTTP request you are making. In its simplest form you create a Request
+object that specifies the URL you want to fetch. Calling ``urlopen`` with this
+Request object returns a response object for the URL requested. This response is
+a file-like object, which means you can for example call ``.read()`` on the
+response::
+
+    import urllib2
+
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that urllib2 makes use of the same Request interface to handle all URL
+schemes.  For example, you can make an FTP request like so::
+
+    req = urllib2.Request('ftp://example.com/')
+
+In the case of HTTP, there are two extra things that Request objects allow you
+to do: First, you can pass data to be sent to the server.  Second, you can pass
+extra information ("metadata") *about* the data or the about request itself, to
+the server - this information is sent as HTTP "headers".  Let's look at each of
+these in turn.
+
+Data
+----
+
+Sometimes you want to send data to a URL (often the URL will refer to a CGI
+(Common Gateway Interface) script [#]_ or other web application). With HTTP,
+this is often done using what's known as a **POST** request. This is often what
+your browser does when you submit a HTML form that you filled in on the web. Not
+all POSTs have to come from forms: you can use a POST to transmit arbitrary data
+to your own application. In the common case of HTML forms, the data needs to be
+encoded in a standard way, and then passed to the Request object as the ``data``
+argument. The encoding is done using a function from the ``urllib`` library
+*not* from ``urllib2``. ::
+
+    import urllib
+    import urllib2  
+
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+Note that other encodings are sometimes required (e.g. for file upload from HTML
+forms - see `HTML Specification, Form Submission
+<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
+details).
+
+If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
+way in which GET and POST requests differ is that POST requests often have
+"side-effects": they change the state of the system in some way (for example by
+placing an order with the website for a hundredweight of tinned spam to be
+delivered to your door).  Though the HTTP standard makes it clear that POSTs are
+intended to *always* cause side-effects, and GET requests *never* to cause
+side-effects, nothing prevents a GET request from having side-effects, nor a
+POST requests from having no side-effects. Data can also be passed in an HTTP
+GET request by encoding it in the URL itself.
+
+This is done as follows::
+
+    >>> import urllib2
+    >>> import urllib
+    >>> data = {}
+    >>> data['name'] = 'Somebody Here'
+    >>> data['location'] = 'Northampton'
+    >>> data['language'] = 'Python'
+    >>> url_values = urllib.urlencode(data)
+    >>> print url_values
+    name=Somebody+Here&language=Python&location=Northampton
+    >>> url = 'http://www.example.com/example.cgi'
+    >>> full_url = url + '?' + url_values
+    >>> data = urllib2.open(full_url)
+
+Notice that the full URL is created by adding a ``?`` to the URL, followed by
+the encoded values.
+
+Headers
+-------
+
+We'll discuss here one particular HTTP header, to illustrate how to add headers
+to your HTTP request.
+
+Some websites [#]_ dislike being browsed by programs, or send different versions
+to different browsers [#]_ . By default urllib2 identifies itself as
+``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
+numbers of the Python release,
+e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
+not work. The way a browser identifies itself is through the
+``User-Agent`` header [#]_. When you create a Request object you can
+pass a dictionary of headers in. The following example makes the same
+request as above, but identifies itself as a version of Internet
+Explorer [#]_. ::
+
+    import urllib
+    import urllib2  
+    
+    url = 'http://www.someserver.com/cgi-bin/register.cgi'
+    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
+    values = {'name' : 'Michael Foord',
+              'location' : 'Northampton',
+              'language' : 'Python' }
+    headers = { 'User-Agent' : user_agent }
+    
+    data = urllib.urlencode(values)
+    req = urllib2.Request(url, data, headers)
+    response = urllib2.urlopen(req)
+    the_page = response.read()
+
+The response also has two useful methods. See the section on `info and geturl`_
+which comes after we have a look at what happens when things go wrong.
+
+
+Handling Exceptions
+===================
+
+*urlopen* raises ``URLError`` when it cannot handle a response (though as usual
+with Python APIs, builtin exceptions such as ValueError, TypeError etc. may also
+be raised).
+
+``HTTPError`` is the subclass of ``URLError`` raised in the specific case of
+HTTP URLs.
+
+URLError
+--------
+
+Often, URLError is raised because there is no network connection (no route to
+the specified server), or the specified server doesn't exist.  In this case, the
+exception raised will have a 'reason' attribute, which is a tuple containing an
+error code and a text error message.
+
+e.g. ::
+
+    >>> req = urllib2.Request('http://www.pretend_server.org')
+    >>> try: urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>    print e.reason
+    >>>
+    (4, 'getaddrinfo failed')
+
+
+HTTPError
+---------
+
+Every HTTP response from the server contains a numeric "status code". Sometimes
+the status code indicates that the server is unable to fulfil the request. The
+default handlers will handle some of these responses for you (for example, if
+the response is a "redirection" that requests the client fetch the document from
+a different URL, urllib2 will handle that for you). For those it can't handle,
+urlopen will raise an ``HTTPError``. Typical errors include '404' (page not
+found), '403' (request forbidden), and '401' (authentication required).
+
+See section 10 of RFC 2616 for a reference on all the HTTP error codes.
+
+The ``HTTPError`` instance raised will have an integer 'code' attribute, which
+corresponds to the error sent by the server.
+
+Error Codes
+~~~~~~~~~~~
+
+Because the default handlers handle redirects (codes in the 300 range), and
+codes in the 100-299 range indicate success, you will usually only see error
+codes in the 400-599 range.
+
+``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful dictionary of
+response codes in that shows all the response codes used by RFC 2616. The
+dictionary is reproduced here for convenience ::
+
+    # Table mapping response codes to messages; entries have the
+    # form {code: (shortmessage, longmessage)}.
+    responses = {
+        100: ('Continue', 'Request received, please continue'),
+        101: ('Switching Protocols',
+              'Switching to new protocol; obey Upgrade header'),
+
+        200: ('OK', 'Request fulfilled, document follows'),
+        201: ('Created', 'Document created, URL follows'),
+        202: ('Accepted',
+              'Request accepted, processing continues off-line'),
+        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
+        204: ('No Content', 'Request fulfilled, nothing follows'),
+        205: ('Reset Content', 'Clear input form for further input.'),
+        206: ('Partial Content', 'Partial content follows.'),
+
+        300: ('Multiple Choices',
+              'Object has several resources -- see URI list'),
+        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
+        302: ('Found', 'Object moved temporarily -- see URI list'),
+        303: ('See Other', 'Object moved -- see Method and URL list'),
+        304: ('Not Modified',
+              'Document has not changed since given time'),
+        305: ('Use Proxy',
+              'You must use proxy specified in Location to access this '
+              'resource.'),
+        307: ('Temporary Redirect',
+              'Object moved temporarily -- see URI list'),
+
+        400: ('Bad Request',
+              'Bad request syntax or unsupported method'),
+        401: ('Unauthorized',
+              'No permission -- see authorization schemes'),
+        402: ('Payment Required',
+              'No payment -- see charging schemes'),
+        403: ('Forbidden',
+              'Request forbidden -- authorization will not help'),
+        404: ('Not Found', 'Nothing matches the given URI'),
+        405: ('Method Not Allowed',
+              'Specified method is invalid for this server.'),
+        406: ('Not Acceptable', 'URI not available in preferred format.'),
+        407: ('Proxy Authentication Required', 'You must authenticate with '
+              'this proxy before proceeding.'),
+        408: ('Request Timeout', 'Request timed out; try again later.'),
+        409: ('Conflict', 'Request conflict.'),
+        410: ('Gone',
+              'URI no longer exists and has been permanently removed.'),
+        411: ('Length Required', 'Client must specify Content-Length.'),
+        412: ('Precondition Failed', 'Precondition in headers is false.'),
+        413: ('Request Entity Too Large', 'Entity is too large.'),
+        414: ('Request-URI Too Long', 'URI is too long.'),
+        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
+        416: ('Requested Range Not Satisfiable',
+              'Cannot satisfy request range.'),
+        417: ('Expectation Failed',
+              'Expect condition could not be satisfied.'),
+
+        500: ('Internal Server Error', 'Server got itself in trouble'),
+        501: ('Not Implemented',
+              'Server does not support this operation'),
+        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
+        503: ('Service Unavailable',
+              'The server cannot process the request due to a high load'),
+        504: ('Gateway Timeout',
+              'The gateway server did not receive a timely response'),
+        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
+        }
+
+When an error is raised the server responds by returning an HTTP error code
+*and* an error page. You can use the ``HTTPError`` instance as a response on the
+page returned. This means that as well as the code attribute, it also has read,
+geturl, and info, methods. ::
+
+    >>> req = urllib2.Request('http://www.python.org/fish.html')
+    >>> try: 
+    >>>     urllib2.urlopen(req)
+    >>> except URLError, e:
+    >>>     print e.code
+    >>>     print e.read()
+    >>> 
+    404
+    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
+        "http://www.w3.org/TR/html4/loose.dtd">
+    <?xml-stylesheet href="./css/ht2html.css" 
+        type="text/css"?>
+    <html><head><title>Error 404: File Not Found</title> 
+    ...... etc...
+
+Wrapping it Up
+--------------
+
+So if you want to be prepared for ``HTTPError`` *or* ``URLError`` there are two
+basic approaches. I prefer the second approach.
+
+Number 1
+~~~~~~~~
+
+::
+
+
+    from urllib2 import Request, urlopen, URLError, HTTPError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except HTTPError, e:
+        print 'The server couldn\'t fulfill the request.'
+        print 'Error code: ', e.code
+    except URLError, e:
+        print 'We failed to reach a server.'
+        print 'Reason: ', e.reason
+    else:
+        # everything is fine
+
+
+.. note::
+
+    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
+    will *also* catch an ``HTTPError``.
+
+Number 2
+~~~~~~~~
+
+::
+
+    from urllib2 import Request, urlopen, URLError
+    req = Request(someurl)
+    try:
+        response = urlopen(req)
+    except URLError, e:
+        if hasattr(e, 'reason'):
+            print 'We failed to reach a server.'
+            print 'Reason: ', e.reason
+        elif hasattr(e, 'code'):
+            print 'The server couldn\'t fulfill the request.'
+            print 'Error code: ', e.code
+    else:
+        # everything is fine
+        
+
+info and geturl
+===============
+
+The response returned by urlopen (or the ``HTTPError`` instance) has two useful
+methods ``info`` and ``geturl``.
+
+**geturl** - this returns the real URL of the page fetched. This is useful
+because ``urlopen`` (or the opener object used) may have followed a
+redirect. The URL of the page fetched may not be the same as the URL requested.
+
+**info** - this returns a dictionary-like object that describes the page
+fetched, particularly the headers sent by the server. It is currently an
+``httplib.HTTPMessage`` instance.
+
+Typical headers include 'Content-length', 'Content-type', and so on. See the
+`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
+for a useful listing of HTTP headers with brief explanations of their meaning
+and use.
+
+
+Openers and Handlers
+====================
+
+When you fetch a URL you use an opener (an instance of the perhaps
+confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
+the default opener - via ``urlopen`` - but you can create custom
+openers. Openers use handlers. All the "heavy lifting" is done by the
+handlers. Each handler knows how to open URLs for a particular URL scheme (http,
+ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
+redirections or HTTP cookies.
+
+You will want to create openers if you want to fetch URLs with specific handlers
+installed, for example to get an opener that handles cookies, or to get an
+opener that does not handle redirections.
+
+To create an opener, instantiate an ``OpenerDirector``, and then call
+``.add_handler(some_handler_instance)`` repeatedly.
+
+Alternatively, you can use ``build_opener``, which is a convenience function for
+creating opener objects with a single function call.  ``build_opener`` adds
+several handlers by default, but provides a quick way to add more and/or
+override the default handlers.
+
+Other sorts of handlers you might want to can handle proxies, authentication,
+and other common but slightly specialised situations.
+
+``install_opener`` can be used to make an ``opener`` object the (global) default
+opener. This means that calls to ``urlopen`` will use the opener you have
+installed.
+
+Opener objects have an ``open`` method, which can be called directly to fetch
+urls in the same way as the ``urlopen`` function: there's no need to call
+``install_opener``, except as a convenience.
+
+
+Basic Authentication
+====================
+
+To illustrate creating and installing a handler we will use the
+``HTTPBasicAuthHandler``. For a more detailed discussion of this subject --
+including an explanation of how Basic Authentication works - see the `Basic
+Authentication Tutorial
+<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.
+
+When authentication is required, the server sends a header (as well as the 401
+error code) requesting authentication.  This specifies the authentication scheme
+and a 'realm'. The header looks like : ``Www-authenticate: SCHEME
+realm="REALM"``.
+
+e.g. :: 
+
+    Www-authenticate: Basic realm="cPanel Users"
+
+
+The client should then retry the request with the appropriate name and password
+for the realm included as a header in the request. This is 'basic
+authentication'. In order to simplify this process we can create an instance of
+``HTTPBasicAuthHandler`` and an opener to use this handler.
+
+The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
+the mapping of URLs and realms to passwords and usernames. If you know what the
+realm is (from the authentication header sent by the server), then you can use a
+``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
+case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
+you to specify a default username and password for a URL. This will be supplied
+in the absence of you providing an alternative combination for a specific
+realm. We indicate this by providing ``None`` as the realm argument to the
+``add_password`` method.
+
+The top-level URL is the first URL that requires authentication. URLs "deeper"
+than the URL you pass to .add_password() will also match. ::
+
+    # create a password manager
+    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()                        
+
+    # Add the username and password.
+    # If we knew the realm, we could use it instead of ``None``.
+    top_level_url = "http://example.com/foo/"
+    password_mgr.add_password(None, top_level_url, username, password)
+
+    handler = urllib2.HTTPBasicAuthHandler(password_mgr)                            
+
+    # create "opener" (OpenerDirector instance)
+    opener = urllib2.build_opener(handler)                       
+
+    # use the opener to fetch a URL
+    opener.open(a_url)      
+
+    # Install the opener.
+    # Now all calls to urllib2.urlopen use our opener.
+    urllib2.install_opener(opener)                               
+
+.. note::
+
+    In the above example we only supplied our ``HHTPBasicAuthHandler`` to
+    ``build_opener``. By default openers have the handlers for normal situations
+    -- ``ProxyHandler``, ``UnknownHandler``, ``HTTPHandler``,
+    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
+    ``FileHandler``, ``HTTPErrorProcessor``.
+
+``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
+component and the hostname and optionally the port number)
+e.g. "http://example.com/" *or* an "authority" (i.e. the hostname,
+optionally including the port number) e.g. "example.com" or "example.com:8080"
+(the latter example includes a port number).  The authority, if present, must
+NOT contain the "userinfo" component - for example "joe@password:example.com" is
+not correct.
+
+
+Proxies
+=======
+
+**urllib2** will auto-detect your proxy settings and use those. This is through
+the ``ProxyHandler`` which is part of the normal handler chain. Normally that's
+a good thing, but there are occasions when it may not be helpful [#]_. One way
+to do this is to setup our own ``ProxyHandler``, with no proxies defined. This
+is done using similar steps to setting up a `Basic Authentication`_ handler : ::
+
+    >>> proxy_support = urllib2.ProxyHandler({})
+    >>> opener = urllib2.build_opener(proxy_support)
+    >>> urllib2.install_opener(opener)
+
+.. note::
+
+    Currently ``urllib2`` *does not* support fetching of ``https`` locations
+    through a proxy.  However, this can be enabled by extending urllib2 as
+    shown in the recipe [#]_.
+
+
+Sockets and Layers
+==================
+
+The Python support for fetching resources from the web is layered. urllib2 uses
+the httplib library, which in turn uses the socket library.
+
+As of Python 2.3 you can specify how long a socket should wait for a response
+before timing out. This can be useful in applications which have to fetch web
+pages. By default the socket module has *no timeout* and can hang. Currently,
+the socket timeout is not exposed at the httplib or urllib2 levels.  However,
+you can set the default timeout globally for all sockets using ::
+
+    import socket
+    import urllib2
+
+    # timeout in seconds
+    timeout = 10
+    socket.setdefaulttimeout(timeout) 
+
+    # this call to urllib2.urlopen now uses the default timeout
+    # we have set in the socket module
+    req = urllib2.Request('http://www.voidspace.org.uk')
+    response = urllib2.urlopen(req)
+
+
+-------
+
+
+Footnotes
+=========
+
+This document was reviewed and revised by John Lee.
+
+.. [#] For an introduction to the CGI protocol see
+       `Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_. 
+.. [#] Like Google for example. The *proper* way to use google from a program
+       is to use `PyGoogle <http://pygoogle.sourceforge.net>`_ of course. See
+       `Voidspace Google <http://www.voidspace.org.uk/python/recipebook.shtml#google>`_
+       for some examples of using the Google API.
+.. [#] Browser sniffing is a very bad practise for website design - building
+       sites using web standards is much more sensible. Unfortunately a lot of
+       sites still send different versions to different browsers.
+.. [#] The user agent for MSIE 6 is
+       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
+.. [#] For details of more HTTP request headers, see
+       `Quick Reference to HTTP Headers`_.
+.. [#] In my case I have to use a proxy to access the internet at work. If you
+       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
+       is set to use the proxy, which urllib2 picks up on. In order to test
+       scripts with a localhost server, I have to prevent urllib2 from using
+       the proxy.
+.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe 
+       <http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195>`_.
+