Commit the howto source to the main Python repository, with Fred's approval

2005-08-30 01:25:05 +00:00
commit e8f44d683e
parent f1b2ba6aa1
9 changed files with 4340 additions and 0 deletions
--- a/Doc/howto/Makefile
+++ b/Doc/howto/Makefile
@@ -0,0 +1,88 @@
+
+MKHOWTO=../tools/mkhowto
+WEBDIR=.
+RSTARGS = --input-encoding=utf-8
+VPATH=.:dvi:pdf:ps:txt
+
+# List of HOWTOs that aren't to be processed
+
+REMOVE_HOWTO =
+
+# Determine list of files to be built
+
+HOWTO=$(filter-out $(REMOVE_HOWTO),$(wildcard *.tex))
+RST_SOURCES =	$(shell echo *.rst)
+DVI  =$(patsubst %.tex,%.dvi,$(HOWTO))
+PDF  =$(patsubst %.tex,%.pdf,$(HOWTO))
+PS   =$(patsubst %.tex,%.ps,$(HOWTO))
+TXT  =$(patsubst %.tex,%.txt,$(HOWTO))
+HTML =$(patsubst %.tex,%,$(HOWTO))
+
+# Rules for building various formats
+%.dvi : %.tex
+	$(MKHOWTO) --dvi $<
+	mv $@ dvi
+
+%.pdf : %.tex
+	$(MKHOWTO) --pdf $<
+	mv $@ pdf
+
+%.ps : %.tex
+	$(MKHOWTO) --ps $<
+	mv $@ ps
+
+%.txt : %.tex
+	$(MKHOWTO) --text $<
+	mv $@ txt
+
+% : %.tex
+	$(MKHOWTO) --html --iconserver="." $<
+	tar -zcvf html/$*.tgz $*
+	#zip -r html/$*.zip $*
+
+default:
+	@echo "'all'    -- build all files"
+	@echo "'dvi', 'pdf', 'ps', 'txt', 'html' -- build one format"
+
+all: $(HTML)
+
+.PHONY : dvi pdf ps txt html rst
+dvi: $(DVI)
+
+pdf: $(PDF)
+ps:  $(PS)
+txt: $(TXT)
+html:$(HTML)
+
+# Rule to build collected tar files
+dist: #all
+	for i in dvi pdf ps txt ; do \
+	    cd $$i ; \
+	    tar -zcf All.tgz *.$$i ;\
+	    cd .. ;\
+	done
+
+# Rule to copy files to the Web tree on AMK's machine
+web: dist
+	cp dvi/* $(WEBDIR)/dvi
+	cp ps/* $(WEBDIR)/ps
+	cp pdf/* $(WEBDIR)/pdf
+	cp txt/* $(WEBDIR)/txt
+	for dir in $(HTML) ; do cp -rp $$dir $(WEBDIR) ; done
+	for ltx in $(HOWTO) ; do cp -p $$ltx $(WEBDIR)/latex ; done
+
+rst: unicode.html
+
+%.html: %.rst
+	rst2html $(RSTARGS) $< >$@
+
+clean:
+	rm -f *~ *.log *.ind *.l2h *.aux *.toc *.how
+	rm -f *.dvi *.ps *.pdf *.bkm
+	rm -f unicode.html
+
+clobber:
+	rm dvi/* ps/* pdf/* txt/* html/*
+
+
+
--- a/Doc/howto/advocacy.tex
+++ b/Doc/howto/advocacy.tex
@@ -0,0 +1,405 @@
+
+\documentclass{howto}
+
+\title{Python Advocacy HOWTO}
+
+\release{0.03}
+
+\author{A.M. Kuchling}
+\authoraddress{\email{amk@amk.ca}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+\noindent
+It's usually difficult to get your management to accept open source
+software, and Python is no exception to this rule.  This document
+discusses reasons to use Python, strategies for winning acceptance,
+facts and arguments you can use, and cases where you \emph{shouldn't}
+try to use Python.
+
+This document is available from the Python HOWTO page at
+\url{http://www.python.org/doc/howto}.
+
+\end{abstract}
+
+\tableofcontents
+
+\section{Reasons to Use Python}
+
+There are several reasons to incorporate a scripting language into
+your development process, and this section will discuss them, and why
+Python has some properties that make it a particularly good choice.
+
+ \subsection{Programmability}
+
+Programs are often organized in a modular fashion.  Lower-level
+operations are grouped together, and called by higher-level functions,
+which may in turn be used as basic operations by still further upper
+levels.  
+
+For example, the lowest level might define a very low-level
+set of functions for accessing a hash table.  The next level might use
+hash tables to store the headers of a mail message, mapping a header
+name like \samp{Date} to a value such as \samp{Tue, 13 May 1997
+20:00:54 -0400}.  A yet higher level may operate on message objects,
+without knowing or caring that message headers are stored in a hash
+table, and so forth.  
+
+Often, the lowest levels do very simple things; they implement a data
+structure such as a binary tree or hash table, or they perform some
+simple computation, such as converting a date string to a number.  The
+higher levels then contain logic connecting these primitive
+operations.  Using this approach, the primitives can be seen as basic
+building blocks which are then glued together to produce the complete
+product.  
+
+Why is this design approach relevant to Python?  Because Python is
+well suited to functioning as such a glue language.  A common approach
+is to write a Python module that implements the lower level
+operations; for the sake of speed, the implementation might be in C,
+Java, or even Fortran.  Once the primitives are available to Python
+programs, the logic underlying higher level operations is written in
+the form of Python code.  The high-level logic is then more
+understandable, and easier to modify.
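The layering described above can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions: the standard library's email package stands in for the lower-level primitives, and the message text and the helper name message_year are hypothetical, not part of this HOWTO.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message text, reusing the Date header from the example above.
RAW = (
    "From: someone@example.com\n"
    "Date: Tue, 13 May 1997 20:00:54 -0400\n"
    "\n"
    "Body text.\n"
)


def message_year(raw_text):
    """High-level glue: combine two lower-level primitives."""
    msg = message_from_string(raw_text)        # headers behave like a mapping
    sent = parsedate_to_datetime(msg["Date"])  # primitive: date string -> datetime
    return sent.year


print(message_year(RAW))   # prints 1997
```

The caller of message_year() never needs to know how the headers are stored or how the date string is parsed, which is the point of the layered design.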
+
+John Ousterhout wrote a paper that explains this idea at greater
+length, entitled ``Scripting: Higher Level Programming for the 21st
+Century''.  I recommend that you read this paper; see the references
+for the URL.  Ousterhout is the inventor of the Tcl language, and
+therefore argues that Tcl should be used for this purpose; he only
+briefly refers to other languages such as Python, Perl, and
+Lisp/Scheme, but in reality, Ousterhout's argument applies to
+scripting languages in general, since you could equally write
+extensions for any of the languages mentioned above.
+
+ \subsection{Prototyping}
+
+In \emph{The Mythical Man-Month}, Frederick Brooks suggests the
+following rule when planning software projects: ``Plan to throw one
+away; you will anyway.''  Brooks is saying that the first attempt at a
+software design often turns out to be wrong; unless the problem is
+very simple or you're an extremely good designer, you'll find that new
+requirements and features become apparent once development has
+actually started.  If these new requirements can't be cleanly
+incorporated into the program's structure, you're presented with two
+unpleasant choices: hammer the new features into the program somehow,
+or scrap everything and write a new version of the program, taking the
+new features into account from the beginning.
+
+Python provides you with a good environment for quickly developing an
+initial prototype.  That lets you get the overall program structure
+and logic right, and you can fine-tune small details in the fast
+development cycle that Python provides.  Once you're satisfied with
+the GUI interface or program output, you can translate the Python code
+into C++, Fortran, Java, or some other compiled language.
+
+Prototyping means you have to be careful not to use too many Python
+features that are hard to implement in your other language.  Using
+\code{eval()}, or regular expressions, or the \module{pickle} module,
+means that you're going to need C or Java libraries for formula
+evaluation, regular expressions, and serialization, for example.  But
+it's not hard to avoid such tricky code, and in the end the
+translation usually isn't very difficult.  The resulting code can be
+rapidly debugged, because any serious logical errors will have been
+removed from the prototype, leaving only more minor slip-ups in the
+translation to track down.  
+
+This strategy builds on the earlier discussion of programmability.
+Using Python as glue to connect lower-level components has obvious
+relevance for constructing prototype systems.  In this way Python can
+help you with development, even if end users never come in contact
+with Python code at all.  If the performance of the Python version is
+adequate and corporate politics allow it, you may not need to do a
+translation into C or Java, but it can still be faster to develop a
+prototype and then translate it, instead of attempting to produce the
+final version immediately.
+
+One example of this development strategy is Microsoft Merchant Server.
+Version 1.0 was written in pure Python, by a company that subsequently
+was purchased by Microsoft.  Version 2.0 began to translate the code
+into \Cpp, shipping with some \Cpp code and some Python code.  Version
+3.0 didn't contain any Python at all; all the code had been translated
+into \Cpp.  Even though the product doesn't contain a Python
+interpreter, the Python language has still served a useful purpose by
+speeding up development.  
+
+This is a very common use for Python.  Past conference papers have
+also described this approach for developing high-level numerical
+algorithms; see David M. Beazley and Peter S. Lomdahl's paper
+``Feeding a Large-scale Physics Application to Python'' in the
+references for a good example.  If an algorithm's basic operations are
+things like ``Take the inverse of this 4000x4000 matrix'', and are
+implemented in some lower-level language, then Python has almost no
+additional performance cost; the extra time required for Python to
+evaluate an expression like \code{m.invert()} is dwarfed by the cost
+of the actual computation.  It's particularly good for applications
+where seemingly endless tweaking is required to get things right. GUI
+interfaces and Web sites are prime examples.
+
+The Python code is also shorter and faster to write (once you're
+familiar with Python), so it's easier to throw it away if you decide
+your approach was wrong; if you'd spent two weeks working on it
+instead of just two hours, you might waste time trying to patch up
+what you've got out of a natural reluctance to admit that those two
+weeks were wasted.  Truthfully, those two weeks haven't been wasted,
+since you've learnt something about the problem and the technology
+you're using to solve it, but it's human nature to view this as a
+failure of some sort.
+
+ \subsection{Simplicity and Ease of Understanding}
+
+Python is definitely \emph{not} a toy language that's only usable for
+small tasks.  The language features are general and powerful enough to
+enable it to be used for many different purposes.  It's useful at the
+small end, for 10- or 20-line scripts, but it also scales up to larger
+systems that contain thousands of lines of code.
+
+However, this expressiveness doesn't come at the cost of an obscure or
+tricky syntax.  While Python has some dark corners that can lead to
+obscure code, there are relatively few such corners, and proper design
+can isolate their use to only a few classes or modules.  It's
+certainly possible to write confusing code by using too many features
+with too little concern for clarity, but most Python code can look a
+lot like a slightly-formalized version of human-understandable
+pseudocode.
+
+In \emph{The New Hacker's Dictionary}, Eric S. Raymond gives the following
+definition for ``compact'':
+
+\begin{quotation}
+	Compact \emph{adj.}  Of a design, describes the valuable property
+	that it can all be apprehended at once in one's head. This
+	generally means the thing created from the design can be used
+	with greater facility and fewer errors than an equivalent tool
+	that is not compact. Compactness does not imply triviality or
+	lack of power; for example, C is compact and FORTRAN is not,
+	but C is more powerful than FORTRAN. Designs become
+	non-compact through accreting features and cruft that don't
+	merge cleanly into the overall design scheme (thus, some fans
+	of Classic C maintain that ANSI C is no longer compact).
+\end{quotation}
+
+(From \url{http://sagan.earthspace.net/jargon/jargon_18.html\#SEC25})
+
+In this sense of the word, Python is quite compact, because the
+language has just a few ideas, which are used in lots of places.  Take
+namespaces, for example.  Import a module with \code{import math}, and
+you create a new namespace called \samp{math}.  Classes are also
+namespaces that share many of the properties of modules, and have a
+few of their own; for example, you can create instances of a class.
+Instances?  They're yet another namespace.  Namespaces are currently
+implemented as Python dictionaries, so they have the same methods as
+the standard dictionary data type: .keys() returns all the keys, and
+so forth.
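A minimal sketch of the namespace idea, assuming Python 3. The class Point and the variable names are made up for illustration. Note that in Python 3 a class exposes a read-only view of its dictionary rather than the dictionary itself, but the view still supports keys().

```python
import math


class Point:
    dimensions = 2            # stored in the class namespace

    def __init__(self, x, y):
        self.x = x            # stored in the instance namespace
        self.y = y


p = Point(3, 4)

module_names = vars(math).keys()      # names defined by the math module
class_names = vars(Point).keys()      # names defined in the class body
instance_names = vars(p).keys()       # names set on this one instance

print("sqrt" in module_names)         # True
print("dimensions" in class_names)    # True
print(sorted(instance_names))         # ['x', 'y']
```

The same dictionary-style lookup works for all three kinds of namespace, which is what makes the design compact.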
+
+This simplicity arises from Python's development history.  The
+language syntax derives from different sources; ABC, a relatively
+obscure teaching language, is one primary influence, and Modula-3 is
+another.  (For more information about ABC and Modula-3, consult their
+respective Web sites at \url{http://www.cwi.nl/~steven/abc/} and
+\url{http://www.m3.org}.)  Other features have come from C, Icon,
+Algol-68, and even Perl.  Python hasn't really innovated very much,
+but instead has tried to keep the language small and easy to learn,
+building on ideas that have been tried in other languages and found
+useful.
+
+Simplicity is a virtue that should not be underestimated.  It lets you
+learn the language more quickly, and then rapidly write code, code
+that often works the first time you run it.
+
+ \subsection{Java Integration}
+
+If you're working with Java, Jython
+(\url{http://www.jython.org/}) is definitely worth your
+attention.  Jython is a re-implementation of Python in Java that
+compiles Python code into Java bytecodes.  The resulting environment
+has very tight, almost seamless, integration with Java.  It's trivial
+to access Java classes from Python, and you can write Python classes
+that subclass Java classes.  Jython can be used for prototyping Java
+applications in much the same way CPython is used, and it can also be
+used for test suites for Java code, or embedded in a Java application
+to add scripting capabilities.  
+
+\section{Arguments and Rebuttals}
+
+Let's say that you've decided upon Python as the best choice for your
+application.  How can you convince your management, or your fellow
+developers, to use Python?  This section lists some common arguments
+against using Python, and provides some possible rebuttals.
+
+\emph{Python is freely available software that doesn't cost anything.
+How good can it be?}
+
+Very good, indeed.  These days Linux and Apache, two other pieces of
+open source software, are becoming more respected as alternatives to
+commercial software, but Python hasn't had all the publicity.
+
+Python has been around for several years, with many users and
+developers.  Accordingly, the interpreter has been used by many
+people, and has gotten most of the bugs shaken out of it.  While bugs
+are still discovered at intervals, they're usually either quite
+obscure (they'd have to be, for no one to have run into them before)
+or they involve interfaces to external libraries.  The internals of
+the language itself are quite stable.
+
+Having the source code should be viewed as making the software
+available for peer review; people can examine the code, suggest (and
+implement) improvements, and track down bugs.  To find out more about
+the idea of open source code, along with arguments and case studies
+supporting it, go to \url{http://www.opensource.org}.
+
+\emph{Who's going to support it?}
+
+Python has a sizable community of developers, and the number is still
+growing.  The Internet community surrounding the language is an active
+one, and is worth being considered another one of Python's advantages.
+Most questions posted to the comp.lang.python newsgroup are quickly
+answered by someone.
+
+Should you need to dig into the source code, you'll find it's clear
+and well-organized, so it's not very difficult to write extensions and
+track down bugs yourself.  If you'd prefer to pay for support, there
+are companies and individuals who offer commercial support for Python.
+
+\emph{Who uses Python for serious work?}
+
+Lots of people; one interesting thing about Python is the surprising
+diversity of applications that it's been used for.  People are using
+Python to:
+
+\begin{itemize}
+\item Run Web sites
+\item Write GUI interfaces
+\item Control
+number-crunching code on supercomputers
+\item Make a commercial application scriptable by embedding the Python
+interpreter inside it
+\item Process large XML data sets
+\item Build test suites for C or Java code
+\end{itemize}
+
+Whatever your application domain is, there's probably someone who's
+used Python for something similar.  Yet, despite being useable for
+such high-end applications, Python's still simple enough to use for
+little jobs.
+
+See \url{http://www.python.org/psa/Users.html} for a list of some of the 
+organizations that use Python.
+
+\emph{What are the restrictions on Python's use?}
+
+They're practically nonexistent.  Consult the \file{Misc/COPYRIGHT}
+file in the source distribution, or
+\url{http://www.python.org/doc/Copyright.html} for the full language,
+but it boils down to three conditions.
+
+\begin{itemize}
+
+\item You have to leave the copyright notice on the software; if you
+don't include the source code in a product, you have to put the
+copyright notice in the supporting documentation.  
+
+\item Don't claim that the institutions that have developed Python
+endorse your product in any way.
+
+\item If something goes wrong, you can't sue for damages.  Practically
+all software licences contain this condition.
+
+\end{itemize}
+
+Notice that you don't have to provide source code for anything that
+contains Python or is built with it.  Also, the Python interpreter and
+accompanying documentation can be modified and redistributed in any
+way you like, and you don't have to pay anyone any licensing fees at
+all.
+
+\emph{Why should we use an obscure language like Python instead of
+well-known language X?}
+
+I hope this HOWTO, and the documents listed in the final section, will
+help convince you that Python isn't obscure, and has a healthily
+growing user base.  One word of advice: always present Python's
+positive advantages, instead of concentrating on language X's
+failings.  People want to know why a solution is good, rather than why
+all the other solutions are bad.  So instead of attacking a competing
+solution on various grounds, simply show how Python's virtues can
+help.
+
+
+\section{Useful Resources}
+
+\begin{definitions}
+
+\term{\url{http://www.fsbassociates.com/books/pythonchpt1.htm}}
+
+The first chapter of \emph{Internet Programming with Python} also
+examines some of the reasons for using Python.  The book is well worth
+buying, but the publishers have made the first chapter available on
+the Web.
+
+\term{\url{http://home.pacbell.net/ouster/scripting.html}}
+ 
+John Ousterhout's white paper on scripting is a good argument for the
+utility of scripting languages, though naturally enough, he emphasizes
+Tcl, the language he developed.  Most of the arguments would apply to
+any scripting language.
+
+\term{\url{http://www.python.org/workshops/1997-10/proceedings/beazley.html}}
+
+The authors, David M. Beazley and Peter S. Lomdahl, 
+describe their use of Python at Los Alamos National Laboratory.
+It's another good example of how Python can help get real work done.
+This quotation from the paper has been echoed by many people:
+
+\begin{quotation}
+       Originally developed as a large monolithic application for
+       massively parallel processing systems, we have used Python to
+       transform our application into a flexible, highly modular, and
+       extremely powerful system for performing simulation, data
+       analysis, and visualization. In addition, we describe how Python
+       has solved a number of important problems related to the
+       development, debugging, deployment, and maintenance of scientific
+       software.
+\end{quotation}
+
+%\term{\url{http://www.pythonjournal.com/volume1/art-interview/}}
+ 
+%This interview with Andy Feit, discussing Infoseek's use of Python, can be
+%used to show that choosing Python didn't introduce any difficulties
+%into a company's development process, and provided some substantial benefits.
+
+\term{\url{http://www.python.org/psa/Commercial.html}} 
+
+Robin Friedrich wrote this document on how to support Python's use in
+commercial projects.
+
+\term{\url{http://www.python.org/workshops/1997-10/proceedings/stein.ps}}
+
+For the 6th Python conference, Greg Stein presented a paper that
+traced Python's adoption and usage at a startup called eShop, and
+later at Microsoft.
+
+\term{\url{http://www.opensource.org}} 
+
+Management may be doubtful of the reliability and usefulness of
+software that wasn't written commercially.  This site presents
+arguments that show how open source software can have considerable
+advantages over closed-source software.
+
+\term{\url{http://sunsite.unc.edu/LDP/HOWTO/mini/Advocacy.html}}
+
+The Linux Advocacy mini-HOWTO was the inspiration for this document,
+and is also well worth reading for general suggestions on winning
+acceptance for a new technology, such as Linux or Python.  In general,
+you won't make much progress by simply attacking existing systems and
+complaining about their inadequacies; this often ends up looking like
+unfocused whining.  It's much better to point out some of the many
+areas where Python is an improvement over other systems.  
+
+\end{definitions}
+
+\end{document}
+
+
--- a/Doc/howto/curses.tex
+++ b/Doc/howto/curses.tex
@@ -0,0 +1,485 @@
+\documentclass{howto}
+
+\title{Curses Programming with Python}
+
+\release{2.01}
+
+\author{A.M. Kuchling, Eric S. Raymond}
+\authoraddress{\email{amk@amk.ca}, \email{esr@thyrsus.com}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+\noindent
+This document describes how to write text-mode programs with Python 2.x,
+using the \module{curses} extension module to control the display.   
+
+This document is available from the Python HOWTO page at
+\url{http://www.python.org/doc/howto}.
+\end{abstract}
+
+\tableofcontents
+
+\section{What is curses?}
+
+The curses library supplies a terminal-independent screen-painting and
+keyboard-handling facility for text-based terminals; such terminals
+include VT100s, the Linux console, and the simulated terminal provided
+by X11 programs such as xterm and rxvt.  Display terminals support
+various control codes to perform common operations such as moving the
+cursor, scrolling the screen, and erasing areas.  Different terminals
+use widely differing codes, and often have their own minor quirks.
+
+In a world of X displays, one might ask ``why bother''?  It's true
+that character-cell display terminals are an obsolete technology, but
+there are niches in which being able to do fancy things with them is
+still valuable.  One is on small-footprint or embedded Unixes that 
+don't carry an X server.  Another is for tools like OS installers
+and kernel configurators that may have to run before X is available.
+
+The curses library hides all the details of different terminals, and
+provides the programmer with an abstraction of a display, containing
+multiple non-overlapping windows.  The contents of a window can be
+changed in various ways--adding text, erasing it, changing its
+appearance--and the curses library will automagically figure out what
+control codes need to be sent to the terminal to produce the right
+output.
+
+The curses library was originally written for BSD Unix; the later System V
+versions of Unix from AT\&T added many enhancements and new functions.
+BSD curses is no longer maintained, having been replaced by ncurses,
+which is an open-source implementation of the AT\&T interface.  If you're
+using an open-source Unix such as Linux or FreeBSD, your system almost
+certainly uses ncurses.  Since most current commercial Unix versions
+are based on System V code, all the functions described here will
+probably be available.  The older versions of curses carried by some
+proprietary Unixes may not support everything, though.
+
+No one has made a Windows port of the curses module.  On a Windows
+platform, try the Console module written by Fredrik Lundh.  The
+Console module provides cursor-addressable text output, plus full
+support for mouse and keyboard input, and is available from
+\url{http://effbot.org/efflib/console}.
+
+\subsection{The Python curses module}
+
+The Python module is a fairly simple wrapper over the C functions
+provided by curses; if you're already familiar with curses programming
+in C, it's really easy to transfer that knowledge to Python.  The
+biggest difference is that the Python interface makes things simpler,
+by merging different C functions such as \function{addstr},
+\function{mvaddstr}, \function{mvwaddstr}, into a single
+\method{addstr()} method.  You'll see this covered in more detail
+later.
+
+This HOWTO is simply an introduction to writing text-mode programs
+with curses and Python. It doesn't attempt to be a complete guide to
+the curses API; for that, see the Python library guide's section on
+ncurses, and the C manual pages for ncurses.  It will, however, give
+you the basic ideas.
+
+\section{Starting and ending a curses application}
+
+Before doing anything, curses must be initialized.  This is done by
+calling the \function{initscr()} function, which will determine the
+terminal type, send any required setup codes to the terminal, and
+create various internal data structures.  If successful,
+\function{initscr()} returns a window object representing the entire
+screen; this is usually called \code{stdscr}, after the name of the
+corresponding C
+variable.
+
+\begin{verbatim}
+import curses
+stdscr = curses.initscr()
+\end{verbatim}
+
+Usually curses applications turn off automatic echoing of keys to the
+screen, in order to be able to read keys and only display them under
+certain circumstances.  This requires calling the \function{noecho()}
+function.
+
+\begin{verbatim}
+curses.noecho()
+\end{verbatim}
+
+Applications will also commonly need to react to keys instantly,
+without requiring the Enter key to be pressed; this is called cbreak
+mode, as opposed to the usual buffered input mode.
+
+\begin{verbatim}
+curses.cbreak()
+\end{verbatim}
+
+Terminals usually return special keys, such as the cursor keys or
+navigation keys such as Page Up and Home, as a multibyte escape
+sequence.  While you could write your application to expect such
+sequences and process them accordingly, curses can do it for you,
+returning a special value such as \constant{curses.KEY_LEFT}.  To get
+curses to do the job, you'll have to enable keypad mode.
+
+\begin{verbatim}
+stdscr.keypad(1)
+\end{verbatim}
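Once keypad mode is on, the value returned by getch() can be compared against the KEY_ constants. The sketch below assumes Python 3; the helper name describe_key and the labels it returns are hypothetical, chosen only for illustration.

```python
import curses


def describe_key(code):
    """Return a readable name for a value returned by getch() in keypad mode."""
    if code == curses.KEY_LEFT:
        return "left"
    if code == curses.KEY_RIGHT:
        return "right"
    if 32 <= code < 127:
        return chr(code)      # ordinary printable ASCII key
    return "other"
```

Inside a running application this would typically be called as describe_key(stdscr.getch()).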
+
+Terminating a curses application is much easier than starting one.
+You'll need to call 
+
+\begin{verbatim}
+curses.nocbreak(); stdscr.keypad(0); curses.echo()
+\end{verbatim}
+
+to reverse the curses-friendly terminal settings. Then call the
+\function{endwin()} function to restore the terminal to its original
+operating mode.
+
+\begin{verbatim}
+curses.endwin()
+\end{verbatim}
+
+A common problem when debugging a curses application is to get your
+terminal messed up when the application dies without restoring the
+terminal to its previous state.  In Python this commonly happens when
+your code is buggy and raises an uncaught exception.  Keys are no
+longer echoed to the screen when you type them, for example, which
+makes using the shell difficult.
+
+In Python you can avoid these complications and make debugging much
+easier by importing the module \module{curses.wrapper}.  It supplies a
+function \function{wrapper} that takes a hook argument.  It does the
+initializations described above, and also initializes colors if color
+support is present.  It then runs your hook, and then finally
+deinitializes appropriately.  The hook is called inside a try-except
+clause which catches exceptions, performs curses deinitialization, and
+then passes the exception upwards.  Thus, your terminal won't be left
+in a funny state on exception.
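A minimal sketch of a hook function, assuming a Python version in which curses.wrapper can be called directly as a function (true of current Python 3; older 2.x releases may need the module form described above). The function name main and the message text are illustrative.

```python
import curses


def main(stdscr):
    """Hook run by the wrapper; stdscr is the full-screen window."""
    stdscr.addstr(0, 0, "Press any key to quit")
    stdscr.refresh()
    stdscr.getch()            # wait for a keypress
    return "done"


# From a real terminal, start the program with:
#     result = curses.wrapper(main)
# The terminal is restored even if main() raises an exception.
```

The call is left in a comment because it needs a real terminal; run it from an interactive shell rather than from a pipe or an editor's output pane.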
+
+\section{Windows and Pads}
+
+Windows are the basic abstraction in curses.  A window object
+represents a rectangular area of the screen, and supports various
+methods to display text, erase it, allow the user to input strings,
+and so forth.
+
+The \code{stdscr} object returned by the \function{initscr()} function
+is a window object that covers the entire screen.  Many programs may
+need only this single window, but you might wish to divide the screen
+into smaller windows, in order to redraw or clear them separately.
+The \function{newwin()} function creates a new window of a given size,
+returning the new window object.
+
+\begin{verbatim}
+begin_x = 20 ; begin_y = 7
+height = 5 ; width = 40
+win = curses.newwin(height, width, begin_y, begin_x)
+\end{verbatim}
+
+A word about the coordinate system used in curses: coordinates are
+always passed in the order \emph{y,x}, and the top-left corner of a
+window is coordinate (0,0).  This breaks a common convention for
+handling coordinates, where the \emph{x} coordinate usually comes
+first.  This is an unfortunate difference from most other computer
+applications, but it's been part of curses since it was first written,
+and it's too late to change things now.
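Because the y-first order is easy to get wrong, it can help to keep coordinate arithmetic in one small helper. The sketch below is plain Python 3 with no curses calls; the function name centered_origin and the 24x80 screen size are assumptions made for illustration.

```python
def centered_origin(screen_height, screen_width, height, width):
    """Return (begin_y, begin_x) that centres a height x width window."""
    begin_y = (screen_height - height) // 2
    begin_x = (screen_width - width) // 2
    return begin_y, begin_x   # y first, matching the curses convention


# For an assumed 24x80 terminal and the 5x40 window used above:
print(centered_origin(24, 80, 5, 40))   # (9, 20)
```

In a real program the screen size would come from stdscr.getmaxyx(), which also returns its result in (y, x) order, and the result would be passed to curses.newwin(height, width, begin_y, begin_x).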
+
+When you call a method to display or erase text, the effect doesn't
+immediately show up on the display.  This is because curses was
+originally written with slow 300-baud terminal connections in mind;
+with these terminals, minimizing the time required to redraw the
+screen is very important.  Deferring the update lets curses accumulate changes to the
+screen, and display them in the most efficient manner.  For example,
+if your program displays some characters in a window, and then clears
+the window, there's no need to send the original characters because
+they'd never be visible.  
+
+Accordingly, curses requires that you explicitly tell it to redraw
+windows, using the \function{refresh()} method of window objects.  In
+practice, this doesn't really complicate programming with curses much.
+Most programs go into a flurry of activity, and then pause waiting for
+a keypress or some other action on the part of the user.  All you have
+to do is to be sure that the screen has been redrawn before pausing to
+wait for user input, by simply calling \code{stdscr.refresh()} or the
+\function{refresh()} method of some other relevant window.
+
+A pad is a special case of a window; it can be larger than the actual
+display screen, and only a portion of it displayed at a time.
+Creating a pad simply requires the pad's height and width, while
+refreshing a pad requires giving the coordinates of the on-screen
+area where a subsection of the pad will be displayed.  
+
+\begin{verbatim}
+pad = curses.newpad(100, 100)
+#  These loops fill the pad with letters; this is
+# explained in the next section
+for y in range(0, 100):
+    for x in range(0, 100):
+        try: pad.addch(y,x, ord('a') + (x*x+y*y) % 26 )
+        except curses.error: pass
+
+#  Displays a section of the pad in the middle of the screen
+pad.refresh( 0,0, 5,5, 20,75)
+\end{verbatim}
+
+The \function{refresh()} call displays a section of the pad in the
+rectangle extending from coordinate (5,5) to coordinate (20,75) on the
+screen; the upper left corner of the displayed section is coordinate
+(0,0) on the pad.  Beyond that difference, pads are exactly like
+ordinary windows and support the same methods.
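The last four arguments of a pad refresh give two corners of a screen rectangle. As a rough sketch, assuming both corners are inclusive (which is how the ncurses documentation describes the rectangle), the visible size can be worked out as follows. The helper name pad_view_size is hypothetical.

```python
def pad_view_size(sminrow, smincol, smaxrow, smaxcol):
    """Rows and columns covered by the on-screen rectangle of a pad refresh.

    Assumes both corners are inclusive.
    """
    return smaxrow - sminrow + 1, smaxcol - smincol + 1


# The call pad.refresh(0, 0, 5, 5, 20, 75) above shows this many rows, columns:
print(pad_view_size(5, 5, 20, 75))   # (16, 71)
```

So only a 16-row by 71-column slice of the 100x100 pad is visible at once; changing the first two arguments scrolls that slice around the pad.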
+
+If you have multiple windows and pads on screen there is a more
+efficient way to go, which will prevent annoying screen flicker at
+refresh time.  Use the \method{noutrefresh()} method
+of each window to update the data structure
+representing the desired state of the screen; then change the physical
+screen to match the desired state in one go with the function
+\function{doupdate()}.  The normal \method{refresh()} method calls
+\function{doupdate()} as its last act.
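The two-step update can be sketched as a small helper, assuming Python 3. The function name refresh_all is hypothetical, and the doupdate parameter is an assumption added here so the helper can be exercised without a terminal; by default it is the real curses.doupdate.

```python
import curses


def refresh_all(windows, doupdate=curses.doupdate):
    """Stage every window, then repaint the physical screen once."""
    for win in windows:
        win.noutrefresh()     # update the virtual screen only
    doupdate()                # one physical update for all windows
```

This sketch handles ordinary windows only. A pad's noutrefresh() takes the same six coordinate arguments as its refresh(), so pads would need to be staged separately before the final doupdate() call.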
+
+\section{Displaying Text}
+
+{}From a C programmer's point of view, curses may sometimes look like
+a twisty maze of functions, all subtly different.  For example,
+\function{addstr()} displays a string at the current cursor location
+in the \code{stdscr} window, while \function{mvaddstr()} moves to a
+given y,x coordinate first before displaying the string.
+\function{waddstr()} is just like \function{addstr()}, but allows
+specifying a window to use, instead of using \code{stdscr} by default.
+\function{mvwaddstr()} follows similarly.
+
+Fortunately the Python interface hides all these details;
+\code{stdscr} is a window object like any other, and methods like
+\function{addstr()} accept multiple argument forms.  Usually there are
+four different forms.
+
+\begin{tableii}{|c|l|}{textrm}{Form}{Description}
+\lineii{\var{str} or \var{ch}}{Display the string \var{str} or
+character \var{ch}}
+\lineii{\var{str} or \var{ch}, \var{attr}}{Display the string \var{str} or
+character \var{ch}, using attribute \var{attr}}
+\lineii{\var{y}, \var{x}, \var{str} or \var{ch}}
+{Move to position \var{y,x} within the window, and display \var{str}
+or \var{ch}}
+\lineii{\var{y}, \var{x}, \var{str} or \var{ch}, \var{attr}}
+{Move to position \var{y,x} within the window, and display \var{str}
+or \var{ch}, using attribute \var{attr}}
+\end{tableii}
+
+Attributes allow displaying text in highlighted forms, such as in
+boldface, underline, reverse video, or in color.  They'll be explained
+in more detail in the next subsection.
+
+The \function{addstr()} function takes a Python string as the value to
+be displayed, while the \function{addch()} functions take a character,
+which can be either a Python string of length 1, or an integer.  If
+it's a string, you're limited to displaying characters between 0 and
+255.  SVr4 curses provides constants for extension characters; these
+constants are integers greater than 255.  For example,
+\constant{ACS_PLMINUS} is a +/- symbol, and \constant{ACS_ULCORNER} is
+the upper left corner of a box (handy for drawing borders).
+
+Windows remember where the cursor was left after the last operation,
+so if you leave out the \var{y,x} coordinates, the string or character
+will be displayed wherever the last operation left off.  You can also
+move the cursor with the \function{move(\var{y,x})} method.  Because
+some terminals always display a flashing cursor, you may want to
+ensure that the cursor is positioned in some location where it won't
+be distracting; it can be confusing to have the cursor blinking at
+some apparently random location.  
+
+If your application doesn't need a blinking cursor at all, you can
+call \function{curs_set(0)} to make it invisible.  Equivalently, and
+for compatibility with older curses versions, there's a
+\function{leaveok(\var{bool})} function.  When \var{bool} is true, the
+curses library will attempt to suppress the flashing cursor, and you
+won't need to worry about leaving it in odd locations.
+
+\subsection{Attributes and Color}
+
+Characters can be displayed in different ways.  Status lines in a
+text-based application are commonly shown in reverse video; a text
+viewer may need to highlight certain words.  curses supports this by
+allowing you to specify an attribute for each cell on the screen.
+
+An attribute is an integer, each bit representing a different
+attribute.  You can try to display text with multiple attribute bits
+set, but curses doesn't guarantee that all the possible combinations
+are available, or that they're all visually distinct.  That depends on
+the ability of the terminal being used, so it's safest to stick to the
+most commonly available attributes, listed here.
+
+\begin{tableii}{|c|l|}{constant}{Attribute}{Description}
+\lineii{A_BLINK}{Blinking text}
+\lineii{A_BOLD}{Extra bright or bold text}
+\lineii{A_DIM}{Half bright text}
+\lineii{A_REVERSE}{Reverse-video text}
+\lineii{A_STANDOUT}{The best highlighting mode available}
+\lineii{A_UNDERLINE}{Underlined text}
+\end{tableii}
+
+So, to display a reverse-video status line on the top line of the
+screen, you could code:
+
+\begin{verbatim}
+stdscr.addstr(0, 0, "Current mode: Typing mode",
+              curses.A_REVERSE)
+stdscr.refresh()
+\end{verbatim}
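+
+Because each attribute is a single bit, several of them can be
+combined with a bitwise OR before being passed to
+\function{addstr()}.  The following is only a minimal sketch:
+\code{draw_status()} and \code{STATUS_ATTR} are hypothetical names,
+not part of the curses module, and whether a given terminal actually
+renders the combination is not guaranteed.
+

```python
import curses

# Combine two attribute bits into a single integer value.
STATUS_ATTR = curses.A_REVERSE | curses.A_BOLD

def draw_status(win, text):
    """Hypothetical helper: draw text on the top line of a window."""
    win.addstr(0, 0, text, STATUS_ATTR)
    win.refresh()
```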
+
+The curses library also supports color on those terminals that
+provide it.  The most common such terminal is probably the Linux
+console, followed by color xterms.
+
+To use color, you must call the \function{start_color()} function
+soon after calling \function{initscr()}, to initialize the default
+color set (the \function{curses.wrapper.wrapper()} function does this
+automatically).  Once that's done, the \function{has_colors()}
+function returns TRUE if the terminal in use can actually display
+color.  (Note from AMK:  curses uses the American spelling
+'color', instead of the Canadian/British spelling 'colour'.  If you're
+like me, you'll have to resign yourself to misspelling it for the sake
+of these functions.)
+
+The curses library maintains a finite number of color pairs,
+containing a foreground (or text) color and a background color.  You
+can get the attribute value corresponding to a color pair with the
+\function{color_pair()} function; this can be bitwise-OR'ed with other
+attributes such as \constant{A_REVERSE}, but again, such combinations
+are not guaranteed to work on all terminals.
+
+An example, which displays a line of text using color pair 1:
+
+\begin{verbatim}
+stdscr.addstr( "Pretty text", curses.color_pair(1) )
+stdscr.refresh()
+\end{verbatim}
+
+As I said before, a color pair consists of a foreground and
+background color.  \function{start_color()} initializes 8 basic
+colors when it activates color mode.  They are: 0:black, 1:red,
+2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white.  The curses
+module defines named constants for each of these colors:
+\constant{curses.COLOR_BLACK}, \constant{curses.COLOR_RED}, and so
+forth.
+
+The \function{init_pair(\var{n, f, b})} function changes the
+definition of color pair \var{n}, to foreground color \var{f} and
+background color \var{b}.  Color pair 0 is hard-wired to white on black,
+and cannot be changed.
+
+Let's put all this together. To change color pair 1 to red
+text on a white background, you would call:
+
+\begin{verbatim}
+curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
+\end{verbatim}
+
+When you change a color pair, any text already displayed using that
+color pair will change to the new colors.  You can also display new
+text in this color with:
+
+\begin{verbatim}
+stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1) )
+\end{verbatim}
+
+Very fancy terminals can change the definitions of the actual colors
+to a given RGB value.  This lets you change color 1, which is usually
+red, to purple or blue or any other color you like.  Unfortunately,
+the Linux console doesn't support this, so I'm unable to try it out,
+and can't provide any examples.  You can check if your terminal can do
+this by calling \function{can_change_color()}, which returns TRUE if
+the capability is there.  If you're lucky enough to have such a
+talented terminal, consult your system's man pages for more
+information.
+
+\section{User Input}
+
+The curses library itself offers only very simple input mechanisms.
+Python's support adds a text-input widget that makes up some of the
+lack.
+
+The most common way to get input to a window is to use its
+\method{getch()} method, which pauses and waits for the user to hit
+a key, displaying it if \function{echo()} has been called earlier.
+You can optionally specify a coordinate to which the cursor should be
+moved before pausing.
+
+It's possible to change this behavior with the method
+\method{nodelay()}. After \method{nodelay(1)}, \method{getch()} for
+the window becomes non-blocking and returns ERR (-1) when no input is
+ready.  There's also a \function{halfdelay()} function, which can be
+used to (in effect) set a timer on each \method{getch()}; if no input
+becomes available within the delay given as the argument to
+\function{halfdelay()} (measured in tenths of a second), curses
+raises an exception.
+
+The \method{getch()} method returns an integer; if it's between 0 and
+255, it represents the ASCII code of the key pressed.  Values greater
+than 255 are special keys such as Page Up, Home, or the cursor keys.
+You can compare the value returned to constants such as
+\constant{curses.KEY_PPAGE}, \constant{curses.KEY_HOME}, or
+\constant{curses.KEY_LEFT}.  Usually the main loop of your program
+will look something like this:
+
+\begin{verbatim}
+while 1:
+    c = stdscr.getch()
+    if c == ord('p'): PrintDocument()
+    elif c == ord('q'): break  # Exit the while()
+    elif c == curses.KEY_HOME: x = y = 0
+\end{verbatim}
+
+The \module{curses.ascii} module supplies ASCII class membership
+functions that take either integer or 1-character-string
+arguments; these may be useful in writing more readable tests for
+your command interpreters.  It also supplies conversion functions 
+that take either integer or 1-character-string arguments and return
+the same type.  For example, \function{curses.ascii.ctrl()} returns
+the control character corresponding to its argument.
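+
+As a small illustration of these helpers, the sketch below classifies
+a key code and shows that \function{curses.ascii.ctrl()} returns the
+same type it was given.  \code{describe_key()} is a hypothetical
+function written for this example; only the \module{curses.ascii}
+calls come from the library.
+

```python
import curses.ascii

def describe_key(c):
    """Classify a key code such as one returned by getch()."""
    if curses.ascii.isdigit(c):
        return "digit"
    if curses.ascii.isalpha(c):
        return "letter"
    if curses.ascii.iscntrl(c):
        return "control"
    return "other"

# ctrl() returns the same type it was given.
ctrl_a_str = curses.ascii.ctrl("a")       # a 1-character string
ctrl_a_int = curses.ascii.ctrl(ord("a"))  # an integer
```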
+
+There's also a method to retrieve an entire string,
+\method{getstr()}.  It isn't used very often, because its
+functionality is quite limited; the only editing keys available are
+the backspace key and the Enter key, which terminates the string.  It
+can optionally be limited to a fixed number of characters.
+
+\begin{verbatim}
+curses.echo()            # Enable echoing of characters
+
+# Get a 15-character string, with the cursor on the top line 
+s = stdscr.getstr(0,0, 15)  
+\end{verbatim}
+
+The Python \module{curses.textpad} module supplies something better.
+With it, you can turn a window into a text box that supports an
+Emacs-like set of keybindings.  Various methods of the \class{Textbox}
+class support editing with input validation and gathering the edit
+results either with or without trailing spaces.   See the library
+documentation on \module{curses.textpad} for the details.
+
+\section{For More Information}
+
+This HOWTO didn't cover some advanced topics, such as screen-scraping
+or capturing mouse events from an xterm instance.  But the Python
+library page for the curses modules is now pretty complete.  You
+should browse it next.
+
+If you're in doubt about the detailed behavior of any of the ncurses
+entry points, consult the manual pages for your curses implementation,
+whether it's ncurses or a proprietary Unix vendor's.  The manual pages
+will document any quirks, and provide complete lists of all the
+functions, attributes, and \constant{ACS_*} characters available to
+you.
+
+Because the curses API is so large, some functions aren't supported in
+the Python interface, not because they're difficult to implement, but
+because no one has needed them yet.  Feel free to add them and then
+submit a patch.  Also, we don't yet have support for the menus or
+panels libraries associated with ncurses; feel free to add that.
+
+If you write an interesting little program, feel free to contribute it
+as another demo.  We can always use more of them!
+
+The ncurses FAQ: \url{http://dickey.his.com/ncurses/ncurses.faq.html}
+
+\end{document}
--- a/Doc/howto/doanddont.tex
+++ b/Doc/howto/doanddont.tex
@ -0,0 +1,343 @@
+\documentclass{howto}
+
+\title{Idioms and Anti-Idioms in Python}
+
+\release{0.00}
+
+\author{Moshe Zadka}
+\authoraddress{howto@zadka.site.co.il}
+
+\begin{document}
+\maketitle
+
+This document is placed in the public domain.
+
+\begin{abstract}
+\noindent
+This document can be considered a companion to the tutorial. It
+shows how to use Python, and even more importantly, how {\em not}
+to use Python. 
+\end{abstract}
+
+\tableofcontents
+
+\section{Language Constructs You Should Not Use}
+
+While Python has relatively few gotchas compared to other languages, it
+still has some constructs which are only useful in corner cases, or are
+plain dangerous. 
+
+\subsection{from module import *}
+
+\subsubsection{Inside Function Definitions}
+
+\code{from module import *} is {\em invalid} inside function definitions.
+While many versions of Python do not check for the invalidity, that does not
+make it more valid, no more than having a smart lawyer makes a man innocent.
+Do not use it like that ever. Even in versions where it was accepted, it made
+the function execution slower, because the compiler could not be certain
+which names are local and which are global. In Python 2.1 this construct
+causes warnings, and sometimes even errors.
+
+\subsubsection{At Module Level}
+
+While it is valid to use \code{from module import *} at module level it
+is usually a bad idea. For one, this loses an important property Python
+otherwise has --- you can know where each toplevel name is defined by
+a simple "search" function in your favourite editor. You also open yourself
+to trouble in the future, if some module grows additional functions or
+classes. 
+
+One of the most awful questions asked on the newsgroup is why this code:
+
+\begin{verbatim}
+f = open("www")
+f.read()
+\end{verbatim}
+
+does not work. Of course, it works just fine (assuming you have a file
+called "www".) But it does not work if somewhere in the module, the
+statement \code{from os import *} is present. The \module{os} module
+has a function called \function{open()} which returns an integer. While
+it is very useful, shadowing builtins is one of its least useful properties.
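+
+The shadowing can be observed directly.  The sketch below runs the
+star-import in a scratch dictionary (the name \code{namespace} is
+just an illustration) so that the surrounding module is left
+untouched, and then checks which object the name \code{open} refers
+to.
+

```python
import os

# Run the star-import in a scratch namespace so this module is unaffected.
namespace = {}
exec("from os import *", namespace)

# In that namespace, "open" now refers to os.open(), not the builtin open().
shadowed = namespace["open"] is os.open
```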
+
+Remember, you can never know for sure what names a module exports, so either
+take what you need --- \code{from module import name1, name2}, or keep them in
+the module and access on a per-need basis --- 
+\code{import module;print module.name}.
+
+\subsubsection{When It Is Just Fine}
+
+There are situations in which \code{from module import *} is just fine:
+
+\begin{itemize}
+
+\item The interactive prompt. For example, \code{from math import *} makes
+      Python an amazing scientific calculator.
+
+\item When extending a module in C with a module in Python.
+
+\item When the module advertises itself as \code{from import *} safe.
+
+\end{itemize}
+
+\subsection{Unadorned \keyword{exec}, \function{execfile} and friends}
+
+The word ``unadorned'' refers to the use without an explicit dictionary,
+in which case those constructs evaluate code in the {\em current} environment.
+This is dangerous for the same reasons \code{from import *} is dangerous ---
+it might step over variables you are counting on and mess up things for
+the rest of your code. Simply do not do that.
+
+Bad examples:
+
+\begin{verbatim}
+>>> for name in sys.argv[1:]:
+>>>     exec "%s=1" % name
+>>> def func(s, **kw):
+>>>     for var, val in kw.items():
+>>>         exec "s.%s=val" % var  # invalid!
+>>> execfile("handler.py")
+>>> handle()
+\end{verbatim}
+
+Good examples:
+
+\begin{verbatim}
+>>> d = {}
+>>> for name in sys.argv[1:]:
+>>>     d[name] = 1
+>>> def func(s, **kw):
+>>>     for var, val in kw.items():
+>>>         setattr(s, var, val)
+>>> d={}
+>>> execfile("handler.py", d, d)
+>>> handle = d['handle']
+>>> handle()
+\end{verbatim}
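+
+The effect of the explicit dictionary can be checked with a
+self-contained sketch.  The names \code{source} and \code{scratch}
+are made up for this example, and the code string stands in for
+whatever would normally be loaded from a file; the parenthesized
+form of \keyword{exec} is used so that the snippet does not depend
+on the statement syntax.
+

```python
# Code loaded from elsewhere; a literal string is used here for illustration.
source = "def handle():\n    return 'handled'\n"

handle = None          # a name we are counting on in the current scope
scratch = {}
exec(source, scratch)  # definitions land in scratch, not in our namespace

result = scratch["handle"]()
```

Because the definitions end up in \code{scratch}, the existing
\code{handle} binding in the current namespace is left alone.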
+
+\subsection{from module import name1, name2}
+
+This is a ``don't'' which is much weaker than the previous ``don't''s
+but is still something you should not do if you don't have good reasons
+to do that. The reason it is usually a bad idea is that you suddenly
+have an object which lives in two separate namespaces. When the binding
+in one namespace changes, the binding in the other will not, so there
+will be a discrepancy between them. This happens when, for example,
+one module is reloaded, or changes the definition of a function at runtime. 
+
+Bad example:
+
+\begin{verbatim}
+# foo.py
+a = 1
+
+# bar.py
+from foo import a
+if something():
+    a = 2 # danger: foo.a != a 
+\end{verbatim}
+
+Good example:
+
+\begin{verbatim}
+# foo.py
+a = 1
+
+# bar.py
+import foo
+if something():
+    foo.a = 2
+\end{verbatim}
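+
+The discrepancy can be reproduced without any files on disk.  In the
+sketch below a module object created with \code{types.ModuleType}
+stands in for a real \file{foo.py} (an assumption made only to keep
+the example self-contained), and a plain assignment plays the role
+of \code{from foo import a}.
+

```python
import types

# Stand-in for a real module foo containing "a = 1".
foo = types.ModuleType("foo")
foo.a = 1

a = foo.a      # what "from foo import a" does: copy the current binding
foo.a = 2      # the module rebinds its name later

stale = a      # our copy did not follow the change
fresh = foo.a  # attribute access always sees the current binding
```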
+
+\subsection{except:}
+
+Python has the \code{except:} clause, which catches all exceptions.
+Since {\em every} error in Python raises an exception, this makes many
+programming errors look like runtime problems, and hinders
+the debugging process.
+
+The following code shows a great example:
+
+\begin{verbatim}
+try:
+    foo = opne("file") # misspelled "open"
+except:
+    sys.exit("could not open file!")
+\end{verbatim}
+
+The second line triggers a \exception{NameError} which is caught by the
+except clause. The program will exit, and you will have no idea that
+this has nothing to do with the readability of \code{"file"}.
+
+The example above is better written as:
+
+\begin{verbatim}
+try:
+    foo = opne("file") # will be changed to "open" as soon as we run it
+except IOError:
+    sys.exit("could not open file")
+\end{verbatim}
+
+There are some situations in which the \code{except:} clause is useful:
+for example, in a framework when running callbacks, it is good not to
+let any callback disturb the framework.
+
+\section{Exceptions}
+
+Exceptions are a useful feature of Python. You should learn to raise
+them whenever something unexpected occurs, and catch them only where
+you can do something about them.
+
+The following is a very popular anti-idiom:
+
+\begin{verbatim}
+def get_status(file):
+    if not os.path.exists(file):
+        print "file not found"
+        sys.exit(1)
+    return open(file).readline()
+\end{verbatim}
+
+Consider the case where the file gets deleted between the time the call to 
+\function{os.path.exists} is made and the time \function{open} is called.
+In that case the last line will raise an \exception{IOError}. The same would
+happen if \var{file} exists but has no read permission. Since testing this
+on a normal machine with existing and non-existing files makes it seem
+bugless, the test results will look fine, and the code will get
+shipped. Then an unhandled \exception{IOError} escapes to the user, who
+has to watch the ugly traceback.
+
+Here is a better way to do it.
+
+\begin{verbatim}
+def get_status(file):
+    try:
+        return open(file).readline()
+    except (IOError, OSError):
+        print "file not found"
+        sys.exit(1)
+\end{verbatim}
+
+In this version, {\em either} the file gets opened and the line is read
+(so it works even on flaky NFS or SMB connections), or the message
+is printed and the application aborted.
+
+Still, \function{get_status} makes too many assumptions --- that it
+will only be used in a short running script, and not, say, in a long
+running server. Sure, the caller could do something like
+
+\begin{verbatim}
+try:
+    status = get_status(log)
+except SystemExit:
+    status = None
+\end{verbatim}
+
+So, try to have as few \code{except} clauses in your code as possible ---
+those will usually be a catch-all in the \function{main}, or inside calls
+which should always succeed.
+
+So, the best version is probably
+
+\begin{verbatim}
+def get_status(file):
+    return open(file).readline()
+\end{verbatim}
+
+The caller can deal with the exception if it wants (for example, if it 
+tries several files in a loop), or just let the exception filter upwards
+to {\em its} caller.
+
+The last version is not very good either --- due to implementation details,
+the file would not be closed when an exception is raised until the handler
+finishes, and perhaps not at all in non-C implementations (e.g., Jython).
+Closing the file explicitly in a \keyword{finally} clause avoids this:
+
+\begin{verbatim}
+def get_status(file):
+    fp = open(file)
+    try:
+        return fp.readline()
+    finally:
+        fp.close()
+\end{verbatim}
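+
+To illustrate the point about letting the caller decide, here is a
+sketch of a caller that tries several files in a loop.
+\code{first_status()} and the file names are hypothetical, and the
+temporary directory exists only to make the example self-contained.
+

```python
import os
import shutil
import tempfile

def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()

def first_status(candidates):
    """Hypothetical caller: return the status from the first readable file."""
    for name in candidates:
        try:
            return get_status(name)
        except (IOError, OSError):
            continue   # this caller knows what to do: try the next file
    return None

# Demonstration with one missing file and one real file.
tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, "missing.log")
present = os.path.join(tmpdir, "present.log")
with open(present, "w") as f:
    f.write("OK\nsecond line\n")

status = first_status([missing, present])
nothing = first_status([missing])
shutil.rmtree(tmpdir)
```

Here \function{get_status} stays free of policy; only the caller
knows that a missing file simply means ``try the next one''.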
+
+\section{Using the Batteries}
+
+Every so often, people seem to be writing stuff in the Python library
+again, usually poorly. While the occasional module has a poor interface,
+it is usually much better to use the rich standard library and data
+types that come with Python than inventing your own.
+
+A useful module very few people know about is \module{os.path}. It 
+always has the correct path arithmetic for your operating system, and
+will usually be much better than whatever you come up with yourself.
+
+Compare:
+
+\begin{verbatim}
+# ugh!
+return dir+"/"+file
+# better
+return os.path.join(dir, file)
+\end{verbatim}
+
+More useful functions in \module{os.path}: \function{basename}, 
+\function{dirname} and \function{splitext}.
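+
+A short sketch of those three functions, using a made-up relative
+path; the separator in the results depends on the operating system,
+which is exactly why \module{os.path} is used instead of string
+concatenation.
+

```python
import os.path

path = os.path.join("logs", "2005", "report.tar.gz")

directory = os.path.dirname(path)       # everything before the last component
filename = os.path.basename(path)       # the last component only
root, ext = os.path.splitext(filename)  # splits off the final extension
```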
+
+There are also many useful builtin functions people seem not to be
+aware of for some reason: \function{min()} and \function{max()} can
+find the minimum/maximum of any sequence with comparable semantics,
+for example, yet many people write their own max/min. Another highly
+useful function is \function{reduce()}. A classic use of \function{reduce()}
+is something like
+
+\begin{verbatim}
+import sys, operator
+nums = map(float, sys.argv[1:])
+print reduce(operator.add, nums)/len(nums)
+\end{verbatim}
+
+This cute little script prints the average of all numbers given on the
+command line. The \function{reduce()} adds up all the numbers, and
+the rest is just some pre- and postprocessing.
+
+Similarly, note that \function{float()}, \function{int()} and
+\function{long()} all accept arguments of type string, and so are
+suited to parsing --- assuming you are ready to deal with the
+\exception{ValueError} they raise.
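+
+A minimal sketch of such parsing follows; \code{parse_numbers()} is
+a hypothetical helper that keeps the strings \function{float()}
+rejects instead of letting the \exception{ValueError} propagate.
+

```python
def parse_numbers(words):
    """Split words into parsed floats and the strings that failed to parse."""
    numbers, rejected = [], []
    for word in words:
        try:
            numbers.append(float(word))
        except ValueError:
            rejected.append(word)
    return numbers, rejected

numbers, rejected = parse_numbers(["1.5", "42", "abc", "1e3"])
```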
+
+\section{Using Backslash to Continue Statements}
+
+Since Python treats a newline as a statement terminator,
+and since statements are often longer than is comfortable to put
+on one line, many people do:
+
+\begin{verbatim}
+if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
+   calculate_number(10, 20) != forbulate(500, 360):
+      pass
+\end{verbatim}
+
+You should realize that this is dangerous: a stray space after the
+\code{\\} would make this line wrong, and stray spaces are notoriously
+hard to see in editors. In this case, at least it would be a syntax
+error, but if the code was:
+
+\begin{verbatim}
+value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+        + calculate_number(10, 20)*forbulate(500, 360)
+\end{verbatim}
+
+then it would just be subtly wrong.
+
+It is usually much better to use the implicit continuation inside parentheses.
+This version is bulletproof:
+
+\begin{verbatim}
+value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9] 
+        + calculate_number(10, 20)*forbulate(500, 360))
+\end{verbatim}
+
+\end{document}
--- a/Doc/howto/regex.tex
+++ b/Doc/howto/regex.tex
--- a/Doc/howto/rexec.tex
+++ b/Doc/howto/rexec.tex
@ -0,0 +1,61 @@
+\documentclass{howto}
+
+\title{Restricted Execution HOWTO}
+
+\release{2.1}
+
+\author{A.M. Kuchling}
+\authoraddress{\email{amk@amk.ca}}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+\noindent
+
+Python 2.2.2 and earlier provided a \module{rexec} module for running
+untrusted code.  However, it's never been exhaustively audited for
+security and it hasn't been updated to take into account recent
+changes to Python such as new-style classes. Therefore, the
+\module{rexec} module should not be trusted.  To discourage use of 
+\module{rexec}, this HOWTO has been withdrawn.
+
+The \module{rexec} and \module{Bastion} modules have been disabled in
+the Python CVS tree, both on the trunk (which will eventually become
+Python 2.3alpha2 and later 2.3final) and on the release22-maint branch
+(which will become Python 2.2.3, if someone ever volunteers to issue
+2.2.3).
+
+For discussion of the problems with \module{rexec}, see the python-dev
+threads starting at the following URLs:
+\url{http://mail.python.org/pipermail/python-dev/2002-December/031160.html},
+and
+\url{http://mail.python.org/pipermail/python-dev/2003-January/031848.html}.
+
+\end{abstract}
+
+
+\section{Version History}
+
+Sep. 12, 1998: Minor revisions and added the reference to the Janus
+project.
+
+Feb. 26, 1998: First version.  Suggestions are welcome.
+
+Mar. 16, 1998: Made some revisions suggested by Jeff Rush.  Some minor
+changes and clarifications, and a sizable section on exceptions added.
+
+Oct. 4, 2000: Checked with Python 2.0.  Minor rewrites and fixes made.
+Version number increased to 2.0.
+
+Dec. 17, 2002: Withdrawn.
+
+Jan. 8, 2003: Mention that \module{rexec} will be disabled in Python 2.3,
+and added links to relevant python-dev threads.
+
+\end{document}
+
+
+
+
--- a/Doc/howto/sockets.tex
+++ b/Doc/howto/sockets.tex
@ -0,0 +1,460 @@
+\documentclass{howto}
+
+\title{Socket Programming HOWTO}
+
+\release{0.00}
+
+\author{Gordon McMillan}
+\authoraddress{\email{gmcm@hypernet.com}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+\noindent
+Sockets are used nearly everywhere, but are one of the most severely
+misunderstood technologies around. This is a 10,000 foot overview of
+sockets. It's not really a tutorial - you'll still have work to do in
+getting things operational. It doesn't cover the fine points (and there
+are a lot of them), but I hope it will give you enough background to
+begin using them decently.
+
+This document is available from the Python HOWTO page at
+\url{http://www.python.org/doc/howto}.
+
+\end{abstract}
+
+\tableofcontents
+
+\section{Sockets}
+
+Sockets are used nearly everywhere, but are one of the most severely
+misunderstood technologies around. This is a 10,000 foot overview of
+sockets. It's not really a tutorial - you'll still have work to do in
+getting things working. It doesn't cover the fine points (and there
+are a lot of them), but I hope it will give you enough background to
+begin using them decently.
+
+I'm only going to talk about INET sockets, but they account for at
+least 99\% of the sockets in use. And I'll only talk about STREAM
+sockets - unless you really know what you're doing (in which case this
+HOWTO isn't for you!), you'll get better behavior and performance from
+a STREAM socket than anything else. I will try to clear up the mystery
+of what a socket is, as well as some hints on how to work with
+blocking and non-blocking sockets. But I'll start by talking about
+blocking sockets. You'll need to know how they work before dealing
+with non-blocking sockets.
+
+Part of the trouble with understanding these things is that "socket"
+can mean a number of subtly different things, depending on context. So
+first, let's make a distinction between a "client" socket - an
+endpoint of a conversation, and a "server" socket, which is more like
+a switchboard operator. The client application (your browser, for
+example) uses "client" sockets exclusively; the web server it's
+talking to uses both "server" sockets and "client" sockets.
+
+
+\subsection{History}
+
+Of the various forms of IPC (\emph{Inter Process Communication}),
+sockets are by far the most popular.  On any given platform, there are
+likely to be other forms of IPC that are faster, but for
+cross-platform communication, sockets are about the only game in town.
+
+They were invented in Berkeley as part of the BSD flavor of Unix. They
+spread like wildfire with the Internet. With good reason --- the
+combination of sockets with INET makes talking to arbitrary machines
+around the world unbelievably easy (at least compared to other
+schemes).  
+
+\section{Creating a Socket}
+
+Roughly speaking, when you clicked on the link that brought you to
+this page, your browser did something like the following:
+
+\begin{verbatim}
+    #create an INET, STREAMing socket
+    s = socket.socket(
+        socket.AF_INET, socket.SOCK_STREAM)
+    #now connect to the web server on port 80 
+    # - the normal http port
+    s.connect(("www.mcmillan-inc.com", 80))
+\end{verbatim}
+
+When the \code{connect} completes, the socket \code{s} can
+now be used to send in a request for the text of this page. The same
+socket will read the reply, and then be destroyed. That's right -
+destroyed. Client sockets are normally only used for one exchange (or
+a small set of sequential exchanges).
+
+What happens in the web server is a bit more complex. First, the web
+server creates a "server socket".
+
+\begin{verbatim}
+    #create an INET, STREAMing socket
+    serversocket = socket.socket(
+        socket.AF_INET, socket.SOCK_STREAM)
+    #bind the socket to a public host, 
+    # and a well-known port
+    serversocket.bind((socket.gethostname(), 80))
+    #become a server socket
+    serversocket.listen(5)
+\end{verbatim}
+
+A couple things to notice: we used \code{socket.gethostname()}
+so that the socket would be visible to the outside world. If we had
+used \code{s.bind(('localhost', 80))} or \code{s.bind(('127.0.0.1',
+80))} we would still have a "server" socket, but one that was only
+visible within the same machine. \code{s.bind(('', 80))} specifies
+that the socket is reachable by any address the machine happens to
+have.
+
+A second thing to note: low number ports are usually reserved for
+"well known" services (HTTP, SNMP etc). If you're playing around, use
+a nice high number (4 digits).
+
+Finally, the argument to \code{listen} tells the socket library that
+we want it to queue up as many as 5 connect requests (the normal max)
+before refusing outside connections. If the rest of the code is
+written properly, that should be plenty.
+
+OK, now we have a "server" socket, listening on port 80. Now we enter
+the mainloop of the web server:
+
+\begin{verbatim}
+    while 1:
+        #accept connections from outside
+        (clientsocket, address) = serversocket.accept()
+        #now do something with the clientsocket
+        #in this case, we'll pretend this is a threaded server
+        ct = client_thread(clientsocket)
+        ct.run()
+\end{verbatim}
+
+There are actually 3 general ways in which this loop could work -
+dispatching a thread to handle \code{clientsocket}, creating a new
+process to handle \code{clientsocket}, or restructuring this app
+to use non-blocking sockets, and multiplexing between our "server" socket
+and any active \code{clientsocket}s using
+\code{select}. More about that later. The important thing to
+understand now is this: this is \emph{all} a "server" socket
+does. It doesn't send any data. It doesn't receive any data. It just
+produces "client" sockets. Each \code{clientsocket} is created
+in response to some \emph{other} "client" socket doing a
+\code{connect()} to the host and port we're bound to. As soon as
+we've created that \code{clientsocket}, we go back to listening
+for more connections. The two "clients" are free to chat it up - they
+are using some dynamically allocated port which will be recycled when
+the conversation ends.
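+
+The thread-per-connection variant can be sketched end to end on a
+single machine.  Everything below is an assumption made for
+illustration: \code{handle_client()} and \code{serve()} are
+hypothetical names, the server binds to \code{127.0.0.1} on a port
+chosen by the operating system instead of port 80, and it stops after
+one connection so that the example terminates.
+

```python
import socket
import threading

def handle_client(clientsocket):
    """Per-connection work: read one small request, reply in upper case."""
    try:
        data = clientsocket.recv(1024)
        if data:
            clientsocket.sendall(data.upper())
    finally:
        clientsocket.close()

def serve(serversocket, max_clients):
    """Accept connections and hand each one to its own thread."""
    for _ in range(max_clients):
        clientsocket, address = serversocket.accept()
        t = threading.Thread(target=handle_client, args=(clientsocket,))
        t.start()

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
serversocket.listen(5)
port = serversocket.getsockname()[1]

server = threading.Thread(target=serve, args=(serversocket, 1))
server.start()

# The "client" side of the conversation.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = b""
while len(reply) < 5:
    chunk = client.recv(1024)
    if not chunk:
        break             # the server closed the connection
    reply += chunk
client.close()
server.join()
serversocket.close()
```

Note that the "server" socket itself never sends or receives data;
all of the traffic goes through the socket returned by
\code{accept()}.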
+
+\subsection{IPC}
+
+If you need fast IPC between two processes
+on one machine, you should look into whatever form of shared memory
+the platform offers. A simple protocol based around shared memory and
+locks or semaphores is by far the fastest technique.
+
+If you do decide to use sockets, bind the "server" socket to
+\code{'localhost'}. On most platforms, this will take a shortcut
+around a couple of layers of network code and be quite a bit faster.
+
+
+\section{Using a Socket}
+
+The first thing to note is that the web browser's "client" socket and
+the web server's "client" socket are identical beasts. That is, this
+is a "peer to peer" conversation. Or to put it another way, \emph{as the
+designer, you will have to decide what the rules of etiquette are for
+a conversation}. Normally, the \code{connect}ing socket
+starts the conversation, by sending in a request, or perhaps a
+signon. But that's a design decision - it's not a rule of sockets.
+
+Now there are two sets of verbs to use for communication. You can use
+\code{send} and \code{recv}, or you can transform your
+client socket into a file-like beast and use \code{read} and
+\code{write}. The latter is the way Java presents their
+sockets. I'm not going to talk about it here, except to warn you that
+you need to use \code{flush} on sockets. These are buffered
+"files", and a common mistake is to \code{write} something, and
+then \code{read} for a reply. Without a \code{flush} in
+there, you may wait forever for the reply, because the request may
+still be in your output buffer.
+
+Now we come to the major stumbling block of sockets - \code{send}
+and \code{recv} operate on the network buffers. They do not
+necessarily handle all the bytes you hand them (or expect from them),
+because their major focus is handling the network buffers. In general,
+they return when the associated network buffers have been filled
+(\code{send}) or emptied (\code{recv}). They then tell you
+how many bytes they handled. It is \emph{your} responsibility to call
+them again until your message has been completely dealt with.
+
+When a \code{recv} returns 0 bytes, it means the other side has
+closed (or is in the process of closing) the connection.  You will not
+receive any more data on this connection. Ever.  You may be able to
+send data successfully; I'll talk more about this later.
+
+A protocol like HTTP uses a socket for only one transfer. The client
+sends a request, then reads a reply.  That's it. The socket is
+discarded. This means that a client can detect the end of the reply by
+receiving 0 bytes.
+
+But if you plan to reuse your socket for further transfers, you need
+to realize that \emph{there is no "EOT" (End of Transfer) on a
+socket.} I repeat: if a socket \code{send} or
+\code{recv} returns after handling 0 bytes, the connection has
+been broken.  If the connection has \emph{not} been broken, you may
+wait on a \code{recv} forever, because the socket will
+\emph{not} tell you that there's nothing more to read (for now).  Now
+if you think about that a bit, you'll come to realize a fundamental
+truth of sockets: \emph{messages must either be fixed length} (yuck),
+\emph{or be delimited} (shrug), \emph{or indicate how long they are}
+(much better), \emph{or end by shutting down the connection}. The
+choice is entirely yours (but some ways are righter than others).
+
+Assuming you don't want to end the connection, the simplest solution
+is a fixed length message:
+
+\begin{verbatim}
+    import socket
+
+    MSGLEN = 1024   # arbitrary fixed message length for this example
+
+    class mysocket:
+        '''demonstration class only
+          - coded for clarity, not efficiency'''
+        def __init__(self, sock=None):
+            if sock is None:
+                self.sock = socket.socket(
+                    socket.AF_INET, socket.SOCK_STREAM)
+            else:
+                self.sock = sock
+        def connect(self, host, port):
+            self.sock.connect((host, port))
+        def mysend(self, msg):
+            # msg must be exactly MSGLEN characters long
+            totalsent = 0
+            while totalsent < MSGLEN:
+                sent = self.sock.send(msg[totalsent:])
+                if sent == 0:
+                    raise RuntimeError("socket connection broken")
+                totalsent = totalsent + sent
+        def myreceive(self):
+            msg = ''
+            while len(msg) < MSGLEN:
+                chunk = self.sock.recv(MSGLEN-len(msg))
+                if chunk == '':
+                    raise RuntimeError("socket connection broken")
+                msg = msg + chunk
+            return msg
+\end{verbatim}
+
+The sending code here is usable for almost any messaging scheme - in
+Python you send strings, and you can use \code{len()} to
+determine a string's length (even if it has embedded \code{\e 0}
+characters). It's mostly the receiving code that gets more
+complex. (And in C, it's not much worse, except you can't use
+\code{strlen} if the message has embedded \code{\e 0}s.)
+
+The easiest enhancement is to make the first character of the message
+an indicator of message type, and have the type determine the
+length. Now you have two \code{recv}s - the first to get (at
+least) that first character so you can look up the length, and the
+second in a loop to get the rest. If you decide to go the delimited
+route, you'll be receiving in some arbitrary chunk size (4096 or 8192
+is frequently a good match for network buffer sizes), and scanning
+what you've received for a delimiter.
+
+One complication to be aware of: if your conversational protocol
+allows multiple messages to be sent back to back (without some kind of
+reply), and you pass \code{recv} an arbitrary chunk size, you
+may end up reading the start of a following message. You'll need to
+put that aside and hold onto it, until it's needed.
+
+Prefixing the message with its length (say, as 5 numeric characters)
+gets more complex, because (believe it or not), you may not get all 5
+characters in one \code{recv}. In playing around, you'll get
+away with it; but in high network loads, your code will very quickly
+break unless you use two \code{recv} loops - the first to
+determine the length, the second to get the data part of the
+message. Nasty. This is also when you'll discover that
+\code{send} does not always manage to get rid of everything in
+one pass. And despite having read this, you will eventually get bitten by
+it!
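+
+The two \code{recv} loops described above can be sketched roughly as
+follows. This is only an illustration, written for Python 3 (where
+sockets carry \code{bytes}), not a finished implementation. The helper
+names \code{recv_exactly}, \code{send_message} and \code{recv_message}
+are made up for this example, and \code{socket.socketpair()} merely
+stands in for a real connection.
+
```python
import socket

HEADER_LEN = 5  # length prefix: 5 ASCII digits, as suggested in the text


def recv_exactly(sock, nbytes):
    """Keep calling recv until exactly nbytes have arrived."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send the 5-digit length, then the payload itself."""
    if len(payload) >= 10 ** HEADER_LEN:
        raise ValueError("payload too long for a 5-digit length prefix")
    header = ("%05d" % len(payload)).encode("ascii")
    # sendall loops over send until everything has been handed off.
    sock.sendall(header + payload)


def recv_message(sock):
    """First loop reads the length prefix, second loop reads the body."""
    header = recv_exactly(sock, HEADER_LEN)
    length = int(header.decode("ascii"))
    return recv_exactly(sock, length)


# A connected pair of sockets stands in for client and server.
left, right = socket.socketpair()
send_message(left, b"hello")
send_message(left, b"")          # a zero-length message is legal too
first = recv_message(right)
second = recv_message(right)
left.close()
right.close()
```
+
+Here \code{sendall} hides the \code{send} loop; the hand-written loop
+in \code{mysend} does the same job.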
+
+In the interests of space, building your character (and preserving my
+competitive position), these enhancements are left as an exercise for
+the reader. Let's move on to cleaning up.
+
+\subsection{Binary Data}
+
+It is perfectly possible to send binary data over a socket. The major
+problem is that not all machines use the same formats for binary
+data. For example, a Motorola chip will represent a 16 bit integer
+with the value 1 as the two hex bytes 00 01. Intel and DEC, however,
+are byte-reversed - that same 1 is 01 00. Socket libraries have calls
+for converting 16 and 32 bit integers - \code{ntohl, htonl, ntohs,
+htons} where "n" means \emph{network} and "h" means \emph{host},
+"s" means \emph{short} and "l" means \emph{long}. Where network order
+is host order, these do nothing, but where the machine is
+byte-reversed, these swap the bytes around appropriately.
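+
+In Python, the \code{struct} module gives you the same control over
+byte order. The sketch below (Python 3 syntax assumed; the variable
+names are only illustrative) packs the value 1 both ways and then
+round-trips it through \code{htons} and \code{ntohs}:
+
```python
import socket
import struct

value = 1

# "!" asks struct for network (big-endian) order, whatever the host uses.
network_bytes = struct.pack("!H", value)   # 16-bit unsigned integer
# "<" forces the byte-reversed (little-endian) layout instead.
reversed_bytes = struct.pack("<H", value)

# Unpacking with the same format recovers the original number.
(decoded,) = struct.unpack("!H", network_bytes)

# htons and ntohs convert between host and network order and undo each other.
round_trip = socket.ntohs(socket.htons(value))
```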
+
+In these days of 32 bit machines, the ASCII representation of binary
+data is frequently smaller than the binary representation. That's
+because a surprising amount of the time, all those longs have the
+value 0, or maybe 1. The string "0" would be two bytes, while binary
+is four. Of course, this doesn't fit well with fixed-length
+messages. Decisions, decisions.
+
+\section{Disconnecting}
+
+Strictly speaking, you're supposed to use \code{shutdown} on a
+socket before you \code{close} it.  The \code{shutdown} is
+an advisory to the socket at the other end.  Depending on the argument
+you pass it, it can mean "I'm not going to send any more, but I'll
+still listen", or "I'm not listening, good riddance!".  Most socket
+libraries, however, are so used to programmers neglecting to use this
+piece of etiquette that normally a \code{close} is the same as
+\code{shutdown(); close()}.  So in most situations, an explicit
+\code{shutdown} is not needed.
+
+One way to use \code{shutdown} effectively is in an HTTP-like
+exchange. The client sends a request and then does a
+\code{shutdown(1)}. This tells the server "This client is done
+sending, but can still receive."  The server can detect "EOF" by a
+receive of 0 bytes. It can assume it has the complete request.  The
+server sends a reply. If the \code{send} completes successfully
+then, indeed, the client was still receiving.
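+
+The exchange just described might look like the sketch below. It
+assumes Python 3, uses \code{socket.socketpair()} in place of a real
+client and server, and the helper \code{read_until_eof} is a made-up
+name for this example. \code{socket.SHUT_WR} is the named constant for
+\code{shutdown(1)}.
+
```python
import socket


def read_until_eof(sock):
    """Collect data until recv returns 0 bytes (the peer is done sending)."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":
            return b"".join(chunks)
        chunks.append(chunk)


client, server = socket.socketpair()

client.sendall(b"GET /spam")
client.shutdown(socket.SHUT_WR)     # done sending, but still able to receive

request = read_until_eof(server)    # the server sees EOF: request is complete
server.sendall(b"reply to " + request)
server.close()

reply = read_until_eof(client)      # the client was indeed still receiving
client.close()
```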
+
+Python takes the automatic shutdown a step further, and says that when
+a socket is garbage collected, it will automatically do a \code{close}
+if it's needed. But relying on this is a very bad habit. If your socket
+just disappears without doing a \code{close}, the socket at the other
+end may hang indefinitely, thinking you're just being slow.
+\emph{Please} \code{close} your sockets when you're done.
+
+
+\subsection{When Sockets Die}
+
+Probably the worst thing about using blocking sockets is what happens
+when the other side comes down hard (without doing a
+\code{close}). Your socket is likely to hang. TCP, which
+\code{SOCK_STREAM} sockets use, is a reliable protocol, and it will
+wait a long, long time before giving up
+on a connection. If you're using threads, the entire thread is
+essentially dead. There's not much you can do about it. As long as you
+aren't doing something dumb, like holding a lock while doing a
+blocking read, the thread isn't really consuming much in the way of
+resources. Do \emph{not} try to kill the thread - part of the reason
+that threads are more efficient than processes is that they avoid the
+overhead associated with the automatic recycling of resources. In
+other words, if you do manage to kill the thread, your whole process
+is likely to be screwed up.  
+
+\section{Non-blocking Sockets}
+
+If you've understood the preceding, you already know most of what you
+need to know about the mechanics of using sockets. You'll still use
+the same calls, in much the same ways. It's just that, if you do it
+right, your app will be almost inside-out.
+
+In Python, you use \code{socket.setblocking(0)} to make it
+non-blocking. In C, it's more complex (for one thing, you'll need to
+choose between the POSIX flavor \code{O_NONBLOCK} and the almost
+indistinguishable older flavor \code{O_NDELAY}, which is
+completely different from \code{TCP_NODELAY}), but it's the
+exact same idea. You do this after creating the socket, but before
+using it. (Actually, if you're nuts, you can switch back and forth.)
+
+The major mechanical difference is that \code{send},
+\code{recv}, \code{connect} and \code{accept} can
+return without having done anything. You have (of course) a number of
+choices. You can check return code and error codes and generally drive
+yourself crazy. If you don't believe me, try it sometime. Your app
+will grow large, buggy and suck CPU. So let's skip the brain-dead
+solutions and do it right.
+
+Use \code{select}.
+
+In C, coding \code{select} is fairly complex. In Python, it's a
+piece of cake, but it's close enough to the C version that if you
+understand \code{select} in Python, you'll have little trouble
+with it in C.
+
+\begin{verbatim}
+    ready_to_read, ready_to_write, in_error = \
+                   select.select(
+                      potential_readers,
+                      potential_writers,
+                      potential_errs,
+                      timeout)
+\end{verbatim}
+
+You pass \code{select} three lists: the first contains all
+sockets that you might want to try reading; the second all the sockets
+you might want to try writing to, and the last (normally left empty)
+those that you want to check for errors.  You should note that a
+socket can go into more than one list. The \code{select} call is
+blocking, but you can give it a timeout. This is generally a sensible
+thing to do - give it a nice long timeout (say a minute) unless you
+have good reason to do otherwise.
+
+In return, you will get three lists. They contain the sockets that are
+actually readable, writable and in error. Each of these lists is a
+subset (possibly empty) of the corresponding list you passed in. A
+socket that you put in more than one input list can show up in more
+than one output list, for instance when it is both readable and
+writable.
+
+If a socket is in the output readable list, you can be
+as-close-to-certain-as-we-ever-get-in-this-business that a
+\code{recv} on that socket will return \emph{something}. Same
+idea for the writable list. You'll be able to send
+\emph{something}. Maybe not all you want to, but \emph{something} is
+better than nothing. (Actually, any reasonably healthy socket will
+return as writable - it just means outbound network buffer space is
+available.)
+
+If you have a "server" socket, put it in the \code{potential_readers}
+list. If it comes out in the readable list, your \code{accept}
+will (almost certainly) work. If you have created a new socket to
+\code{connect} to someone else, put it in the \code{potential_writers}
+list. If it shows up in the writable list, you have a decent chance
+that it has connected.
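+
+To make the readable and writable lists concrete, here is a small
+sketch. It assumes Python 3 and uses \code{socket.socketpair()} so that
+it runs without a network; the names \code{idle_state} and
+\code{busy_state} are invented for the example.
+
```python
import select
import socket

left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

# Nothing has been sent yet: left is writable but not readable.
readable, writable, in_error = select.select([left], [left], [], 0)
idle_state = (left in readable, left in writable)

right.send(b"ping")

# Wait (up to a second) until the data has actually arrived at left.
select.select([left], [], [], 1.0)

# Now left has data waiting and still has outbound buffer space.
readable, writable, in_error = select.select([left], [left], [], 0)
busy_state = (left in readable, left in writable)

data = left.recv(4096)
left.close()
right.close()
```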
+
+One very nasty problem with \code{select}: if somewhere in those
+input lists of sockets is one which has died a nasty death, the
+\code{select} will fail. You then need to loop through every
+single damn socket in all those lists and do a
+\code{select([sock],[],[],0)} until you find the bad one. That
+timeout of 0 means it won't take long, but it's ugly.
+
+Actually, \code{select} can be handy even with blocking sockets.
+It's one way of determining whether you will block - the socket
+returns as readable when there's something in the buffers.  However,
+this still doesn't help with the problem of determining whether the
+other end is done, or just busy with something else.
+
+\textbf{Portability alert}: On Unix, \code{select} works with both
+sockets and files. Don't try this on Windows. On Windows,
+\code{select} works with sockets only. Also note that in C, many
+of the more advanced socket options are done differently on
+Windows. In fact, on Windows I usually use threads (which work very,
+very well) with my sockets. Face it, if you want any kind of
+performance, your code will look very different on Windows than on
+Unix. (I haven't the foggiest how you do this stuff on a Mac.)
+
+\subsection{Performance}
+
+There's no question that the fastest sockets code uses non-blocking
+sockets and select to multiplex them. You can put together something
+that will saturate a LAN connection without putting any strain on the
+CPU. The trouble is that an app written this way can't do much of
+anything else - it needs to be ready to shuffle bytes around at all
+times.
+
+Assuming that your app is actually supposed to do something more than
+that, threading is the optimal solution (and using non-blocking
+sockets will be faster than using blocking sockets). Unfortunately,
+threading support in Unixes varies both in API and quality. So the
+normal Unix solution is to fork a subprocess to deal with each
+connection. The overhead for this is significant (and don't do this on
+Windows - the overhead of process creation is enormous there). It also
+means that unless each subprocess is completely independent, you'll
+need to use another form of IPC, say a pipe, or shared memory and
+semaphores, to communicate between the parent and child processes.
+
+Finally, remember that even though blocking sockets are somewhat
+slower than non-blocking, in many cases they are the "right"
+solution. After all, if your app is driven by the data it receives
+over a socket, there's not much sense in complicating the logic just
+so your app can wait on \code{select} instead of
+\code{recv}.
+
+\end{document}
--- a/Doc/howto/sorting.tex
+++ b/Doc/howto/sorting.tex
@ -0,0 +1,267 @@
+\documentclass{howto}
+
+\title{Sorting Mini-HOWTO}
+
+% Increment the release number whenever significant changes are made.
+% The author and/or editor can define 'significant' however they like.
+\release{0.01}
+
+\author{Andrew Dalke}
+\authoraddress{\email{dalke@bioreason.com}}
+
+\begin{document}
+\maketitle
+
+\begin{abstract}
+\noindent
+This document is a little tutorial
+showing a half dozen ways to sort a list with the built-in
+\method{sort()} method.  
+
+This document is available from the Python HOWTO page at
+\url{http://www.python.org/doc/howto}.
+\end{abstract}
+
+\tableofcontents
+
+Python lists have a built-in \method{sort()} method.  There are many
+ways to use it to sort a list and there doesn't appear to be a single,
+central place in the various manuals describing them, so I'll do so
+here.
+
+\section{Sorting basic data types}
+
+A simple ascending sort is easy; just call the \method{sort()} method of a list.
+
+\begin{verbatim}
+>>> a = [5, 2, 3, 1, 4]
+>>> a.sort()
+>>> print a
+[1, 2, 3, 4, 5]
+\end{verbatim}
+
+Sort takes an optional function which can be called for doing the
+comparisons.  The default sort routine is equivalent to
+
+\begin{verbatim}
+>>> a = [5, 2, 3, 1, 4]
+>>> a.sort(cmp)
+>>> print a
+[1, 2, 3, 4, 5]
+\end{verbatim}
+
+where \function{cmp} is the built-in function which compares two objects, \code{x} and
+\code{y}, and returns -1, 0 or 1 depending on whether $x<y$, $x==y$, or $x>y$.  During
+the course of the sort the relationships must stay the same for the
+final list to make sense.
+
+If you want, you can define your own function for the comparison.  For 
+integers (and numbers in general) we can do:
+
+\begin{verbatim}
+>>> def numeric_compare(x, y):
+...     return x-y
+...
+>>> a = [5, 2, 3, 1, 4]
+>>> a.sort(numeric_compare)
+>>> print a
+[1, 2, 3, 4, 5]
+\end{verbatim}
+
+By the way, this function won't work if the result of the subtraction
+is out of range, as in \code{sys.maxint - (-1)}.
+
+Or, if you don't want to define a new named function you can create an
+anonymous one using \keyword{lambda}, as in:
+
+\begin{verbatim}
+>>> a = [5, 2, 3, 1, 4]
+>>> a.sort(lambda x, y: x-y)
+>>> print a
+[1, 2, 3, 4, 5]
+\end{verbatim}
+
+If you want the numbers sorted in reverse you can do
+
+\begin{verbatim}
+>>> a = [5, 2, 3, 1, 4]
+>>> def reverse_numeric(x, y):
+...     return y-x
+...
+>>> a.sort(reverse_numeric)
+>>> print a
+[5, 4, 3, 2, 1]
+\end{verbatim}
+
+(A more general implementation could return \code{cmp(y,x)} or \code{-cmp(x,y)}.)
+
+However, it's faster if Python doesn't have to call a function for
+every comparison, so if you want a reverse-sorted list of basic data
+types, do the forward sort first, then use the \method{reverse()} method.
+
+\begin{verbatim}
+>>> a = [5, 2, 3, 1, 4]
+>>> a.sort()
+>>> a.reverse()
+>>> print a
+[5, 4, 3, 2, 1]
+\end{verbatim}
+
+Here's a case-insensitive string comparison using a \keyword{lambda} function:
+
+\begin{verbatim}
+>>> import string
+>>> a = string.split("This is a test string from Andrew.")
+>>> a.sort(lambda x, y: cmp(string.lower(x), string.lower(y)))
+>>> print a
+['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
+\end{verbatim}
+
+This goes through the overhead of converting a word to lower case
+every time it must be compared.  At times it may be faster to compute
+these once and use those values, and the following example shows how.
+
+\begin{verbatim}
+>>> words = string.split("This is a test string from Andrew.")
+>>> offsets = []
+>>> for i in range(len(words)):
+...     offsets.append( (string.lower(words[i]), i) )
+...
+>>> offsets.sort()
+>>> new_words = []
+>>> for dontcare, i in offsets:
+...     new_words.append(words[i])
+...
+>>> print new_words
+['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
+\end{verbatim}
+
+The \code{offsets} list is filled with tuples, each holding a lower-case string
+and its position in the \code{words} list.  It is then sorted.  Python's
+sort method sorts tuples by comparing terms; given \code{x} and \code{y}, compare
+\code{x[0]} to \code{y[0]}, then \code{x[1]} to \code{y[1]}, etc. until there is a difference.
+
+The result is that the \code{offsets} list is ordered by its first
+term, and the second term can be used to figure out where the original
+data was stored.  (The \code{for} loop assigns \code{dontcare} and
+\code{i} to the two fields of each term in the list, but we only need the
+index value.)
+
+Another way to implement this is to store the original data as the
+second term in the \code{offsets} list, as in:
+
+\begin{verbatim}
+>>> words = string.split("This is a test string from Andrew.")
+>>> offsets = []
+>>> for word in words:
+...     offsets.append( (string.lower(word), word) )
+...
+>>> offsets.sort()
+>>> new_words = []
+>>> for word in offsets:
+...     new_words.append(word[1])
+...
+>>> print new_words
+['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This']
+\end{verbatim}
+
+This isn't always appropriate because the second terms in the list
+(the word, in this example) will be compared when the first terms are
+the same.  If this happens many times, then there will be the unneeded
+performance hit of comparing the two objects.  This can be a large
+cost if most terms are the same and the objects define their own
+\method{__cmp__} method, but there will still be some overhead to determine if
+\method{__cmp__} is defined.
+
+Still, for large lists, or for lists where the comparison information
+is expensive to calculate, the last two examples are likely to be the
+fastest way to sort a list.  They will not work on weakly sorted data,
+like complex numbers, but if you don't know what that means, you
+probably don't need to worry about it.
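+
+If your Python is 2.4 or newer, the \code{key} argument to
+\method{sort()} and \function{sorted()} does this bookkeeping for you.
+The sketch below is written in Python 3 syntax (an assumption; the
+examples above use the older \code{print} statement) and shows the
+\code{key} form next to a hand-rolled version that also stores the index,
+so that ties never compare the original objects. The names
+\code{decorated} and \code{undecorated} are only illustrative.
+
```python
words = "This is a test string from Andrew.".split()

# key=str.lower computes the lower-case form once per word, which is
# what the hand-written offsets list does.
new_words = sorted(words, key=str.lower)

# The same idea spelled out by hand: decorate, sort, undecorate.
# The index i sits between the sort key and the word, so two equal
# keys are ordered by position and the words are never compared.
decorated = [(word.lower(), i, word) for i, word in enumerate(words)]
decorated.sort()
undecorated = [word for _, _, word in decorated]

print(new_words)
```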
+
+\section{Comparing classes}
+
+The comparison for two basic data types, like ints to ints or string to
+string, is built into Python and makes sense.  There is a default way
+to compare class instances, but the default manner isn't usually very
+useful.  You can define your own comparison with the \method{__cmp__} method,
+as in:
+
+\begin{verbatim}
+>>> class Spam:
+...     def __init__(self, spam, eggs):
+...         self.spam = spam
+...         self.eggs = eggs
+...     def __cmp__(self, other):
+...         return cmp(self.spam+self.eggs, other.spam+other.eggs)
+...     def __str__(self):
+...         return str(self.spam + self.eggs)
+...
+>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
+>>> a.sort()
+>>> for spam in a:
+...     print str(spam)
+...
+5
+10
+12
+\end{verbatim}
+
+Sometimes you may want to sort by a specific attribute of a class.  If
+appropriate you should just define the \method{__cmp__} method to compare
+those values, but you cannot do this if you want to compare between
+different attributes at different times.  Instead, you'll need to go
+back to passing a comparison function to sort, as in:
+
+\begin{verbatim}
+>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
+>>> a.sort(lambda x, y: cmp(x.eggs, y.eggs))
+>>> for spam in a:
+...     print spam.eggs, str(spam)
+...
+3 12
+4 5
+6 10
+\end{verbatim}
+
+If you want to compare two arbitrary attributes (and aren't overly
+concerned about performance) you can even define your own comparison
+function object.  This uses the ability of a class instance to emulate
+a function by defining the \method{__call__} method, as in:
+
+\begin{verbatim}
+>>> class CmpAttr:
+...     def __init__(self, attr):
+...         self.attr = attr
+...     def __call__(self, x, y):
+...         return cmp(getattr(x, self.attr), getattr(y, self.attr))
+...
+>>> a = [Spam(1, 4), Spam(9, 3), Spam(4,6)]
+>>> a.sort(CmpAttr("spam"))  # sort by the "spam" attribute
+>>> for spam in a:
+...     print spam.spam, spam.eggs, str(spam)
+...
+1 4 5
+4 6 10
+9 3 12
+
+>>> a.sort(CmpAttr("eggs"))   # re-sort by the "eggs" attribute
+>>> for spam in a:
+...     print spam.spam, spam.eggs, str(spam)
+...
+9 3 12
+1 4 5
+4 6 10
+\end{verbatim}
+
+Of course, if you want a faster sort you can extract the attributes
+into an intermediate list and sort that list.
+
+
+So, there you have it; about a half-dozen different ways to define how
+to sort a list:
+\begin{itemize}
+ \item sort using the default method
+ \item sort using a comparison function
+ \item reverse sort not using a comparison function
+ \item sort on an intermediate list (two forms)
+ \item sort using class defined \method{__cmp__} method
+ \item sort using a sort function object
+\end{itemize}
+
+\end{document}
+% LocalWords:  maxint
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@ -0,0 +1,765 @@
+Unicode HOWTO
+================
+
+**Version 1.02**
+
+This HOWTO discusses Python's support for Unicode, and explains various 
+problems that people commonly encounter when trying to work with Unicode.
+
+Introduction to Unicode
+------------------------------
+
+History of Character Codes
+''''''''''''''''''''''''''''''
+
+In 1968, the American Standard Code for Information Interchange,
+better known by its acronym ASCII, was standardized.  ASCII defined
+numeric codes for various characters, with the numeric values running from 0 to
+127.  For example, the lowercase letter 'a' is assigned 97 as its code
+value.
+
+ASCII was an American-developed standard, so it only defined
+unaccented characters.  There was an 'e', but no 'é' or 'Í'.  This
+meant that languages which required accented characters couldn't be
+faithfully represented in ASCII.  (Actually the missing accents matter
+for English, too, which contains words such as 'naïve' and 'café', and some
+publications have house styles which require spellings such as
+'coöperate'.)
+
+For a while people just wrote programs that didn't display accents.  I
+remember looking at Apple ][ BASIC programs, published in French-language
+publications in the mid-1980s, that had lines like these::
+
+	PRINT "FICHIER EST COMPLETE."
+	PRINT "CARACTERE NON ACCEPTE."
+
+Those messages should contain accents, and they just look wrong to
+someone who can read French.  
+
+In the 1980s, almost all personal computers were 8-bit, meaning that
+bytes could hold values ranging from 0 to 255.  ASCII codes only went
+up to 127, so some machines assigned values between 128 and 255 to
+accented characters.  Different machines had different codes, however,
+which led to problems exchanging files.  Eventually various commonly
+used sets of values for the 128-255 range emerged.  Some were true
+standards, defined by the International Standards Organization, and
+some were **de facto** conventions that were invented by one company
+or another and managed to catch on.
+
+255 characters aren't very many.  For example, you can't fit
+both the accented characters used in Western Europe and the Cyrillic
+alphabet used for Russian into the 128-255 range because there are more than
+128 such characters.
+
+You could write files using different codes (all your Russian
+files in a coding system called KOI8, all your French files in 
+a different coding system called Latin1), but what if you wanted
+to write a French document that quotes some Russian text?  In the
+1980s people began to want to solve this problem, and the Unicode
+standardization effort began.
+
+Unicode started out using 16-bit characters instead of 8-bit characters.  16
+bits means you have 2^16 = 65,536 distinct values available, making it
+possible to represent many different characters from many different
+alphabets; an initial goal was to have Unicode contain the alphabets for
+every single human language.  It turns out that even 16 bits isn't enough to
+meet that goal, and the modern Unicode specification uses a wider range of
+codes, 0-1,114,111 (0x10ffff in base-16).
+
+There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
+originally separate efforts, but the specifications were merged with
+the 1.1 revision of Unicode.  
+
+(This discussion of Unicode's history is highly simplified.  I don't
+think the average Python programmer needs to worry about the
+historical details; consult the Unicode consortium site listed in the
+References for more information.)
+
+
+Definitions
+''''''''''''''''''''''''
+
+A **character** is the smallest possible component of a text.  'A',
+'B', 'C', etc., are all different characters.  So are 'È' and
+'Í'.  Characters are abstractions, and vary depending on the
+language or context you're talking about.  For example, the symbol for
+ohms (Ω) is usually drawn much like the capital letter
+omega (Ω) in the Greek alphabet (they may even be the same in
+some fonts), but these are two different characters that have
+different meanings.
+
+The Unicode standard describes how characters are represented by
+**code points**.  A code point is an integer value, usually denoted in
+base 16.  In the standard, a code point is written using the notation
+U+12ca to mean the character with value 0x12ca (4810 decimal).  The
+Unicode standard contains a lot of tables listing characters and their
+corresponding code points::
+
+	0061    'a'; LATIN SMALL LETTER A
+	0062    'b'; LATIN SMALL LETTER B
+	0063    'c'; LATIN SMALL LETTER C
+        ...
+	007B	'{'; LEFT CURLY BRACKET
+
+Strictly, these definitions imply that it's meaningless to say 'this is
+character U+12ca'.  U+12ca is a code point, which represents some particular
+character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.
+In informal contexts, this distinction between code points and characters will
+sometimes be forgotten.
+
+A character is represented on a screen or on paper by a set of graphical
+elements that's called a **glyph**.  The glyph for an uppercase A, for
+example, is two diagonal strokes and a horizontal stroke, though the exact
+details will depend on the font being used.  Most Python code doesn't need
+to worry about glyphs; figuring out the correct glyph to display is
+generally the job of a GUI toolkit or a terminal's font renderer.
+
+
+Encodings
+'''''''''
+
+To summarize the previous section: 
+a Unicode string is a sequence of code points, which are
+numbers from 0 to 0x10ffff.  This sequence needs to be represented as
+a set of bytes (meaning, values from 0-255) in memory.  The rules for
+translating a Unicode string into a sequence of bytes are called an 
+**encoding**.
+
+The first encoding you might think of is an array of 32-bit integers.  
+In this representation, the string "Python" would look like this::
+
+       P           y           t           h           o           n
+    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00 
+       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
+
+This representation is straightforward but using
+it presents a number of problems.
+
+1. It's not portable; different processors order the bytes 
+   differently. 
+
+2. It's very wasteful of space.  In most texts, the majority of the code 
+   points are less than 127, or less than 255, so a lot of space is occupied
+   by zero bytes.  The above string takes 24 bytes compared to the 6
+   bytes needed for an ASCII representation.  Increased RAM usage doesn't
+   matter too much (desktop computers have megabytes of RAM, and strings
+   aren't usually that large), but expanding our usage of disk and
+   network bandwidth by a factor of 4 is intolerable.
+
+3. It's not compatible with existing C functions such as ``strlen()``,
+   so a new family of wide string functions would need to be used.
+
+4. Many Internet standards are defined in terms of textual data, and 
+   can't handle content with embedded zero bytes.
+
+Generally people don't use this encoding, choosing other encodings
+that are more efficient and convenient.
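+
+The byte layout shown above, and the byte-order problem from the first
+point, can be reproduced with a short sketch.  It assumes Python 3,
+where ``str`` is already Unicode; ``utf-32-le`` and ``utf-32-be`` are
+Python's codec names for the little-endian and big-endian forms of this
+32-bit encoding, and the helper ``as_hex`` is a made-up name for the
+example.
+
```python
text = "Python"

little = text.encode("utf-32-le")   # least significant byte first
big = text.encode("utf-32-be")      # most significant byte first


def as_hex(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join("%02x" % byte for byte in data)


print(as_hex(little))
print(as_hex(big))

# Four bytes per code point, against one byte per character in ASCII.
sizes = (len(little), len(text.encode("ascii")))
```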
+
+Encodings don't have to handle every possible Unicode character, and
+most encodings don't.  For example, Python's default encoding is the
+'ascii' encoding.  The rules for converting a Unicode string into the
+ASCII encoding are simple; for each code point:
+
+1. If the code point is <128, each byte is the same as the value of the 
+   code point.
+
+2. If the code point is 128 or greater, the Unicode string can't 
+   be represented in this encoding.  (Python raises  a 
+   ``UnicodeEncodeError`` exception in this case.)
+
+Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode
+code points 0-255 are identical to the Latin-1 values, so converting
+to this encoding simply requires converting code points to byte
+values; if a code point larger than 255 is encountered, the string
+can't be encoded into Latin-1.
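+
+Here is a small sketch of both sets of rules in action.  It assumes
+Python 3 syntax, and the flag names ``ascii_failed`` and
+``latin1_failed`` are only illustrative.
+
```python
text = "caf\u00e9"   # 'café'; U+00E9 is above 127 but below 256

# Latin-1 maps each code point 0-255 straight to the same byte value.
latin1_bytes = text.encode("latin-1")

# ASCII cannot represent code point 0xE9, so encoding fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# U+20AC (EURO SIGN) is larger than 255, so Latin-1 fails as well.
try:
    "\u20ac".encode("latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```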
+
+Encodings don't have to be simple one-to-one mappings like Latin-1.
+Consider IBM's EBCDIC, which was used on IBM mainframes.  Letter
+values weren't in one block: 'a' through 'i' had values from 129 to
+137, but 'j' through 'r' were 145 through 153.  If you wanted to use
+EBCDIC as an encoding, you'd probably use some sort of lookup table to
+perform the conversion, but this is largely an internal detail.
+
+UTF-8 is one of the most commonly used encodings.  UTF stands for
+"Unicode Transformation Format", and the '8' means that 8-bit numbers
+are used in the encoding.  (There's also a UTF-16 encoding, but it's
+less frequently used than UTF-8.)  UTF-8 uses the following rules:
+
+1. If the code point is <128, it's represented by the corresponding byte value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+   between 128 and 255.
+3. Code points >0x7ff are turned into three- or four-byte sequences, where
+   each byte of the sequence is between 128 and 255.
+    
+UTF-8 has several convenient properties:
+
+1. It can handle any Unicode code point.
+2. A Unicode string is turned into a string of bytes containing no embedded
+   zero bytes.  This avoids byte-ordering issues, and means UTF-8 strings can
+   be processed by C functions such as ``strcpy()`` and sent through
+   protocols that can't handle zero bytes.
+3. A string of ASCII text is also valid UTF-8 text.
+4. UTF-8 is fairly compact; code points below 0x800, which cover the Latin,
+   Greek, Cyrillic, Hebrew and Arabic alphabets, are turned into at most two
+   bytes, and values less than 128 occupy only a single byte.
+5. If bytes are corrupted or lost, it's possible to determine the start of
+   the next UTF-8-encoded code point and resynchronize.  It's also unlikely
+   that random 8-bit data will look like valid UTF-8.
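+
+The encoding rules and the compactness claim can be checked with a
+short sketch.  It assumes Python 3, and the four sample characters are
+simply one arbitrary pick from each size class.
+
```python
# One sample per UTF-8 length: 'a', 'é', the euro sign, and U+10400.
samples = ["a", "\u00e9", "\u20ac", "\U00010400"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded), encoded.hex()))

lengths = [len(ch.encode("utf-8")) for ch in samples]

# Every byte of a multi-byte sequence is 128 or greater.
high_bytes_only = all(byte >= 128 for byte in "\u20ac".encode("utf-8"))

# Plain ASCII bytes are already valid UTF-8.
ascii_round_trip = b"abc".decode("utf-8")
```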
+
+
+
+References
+''''''''''''''
+
+The Unicode Consortium site at <http://www.unicode.org> has character
+charts, a glossary, and PDF versions of the Unicode specification.  Be
+prepared for some difficult reading.
+<http://www.unicode.org/history/> is a chronology of the origin and
+development of Unicode.
+
+To help understand the standard, Jukka Korpela has written an
+introductory guide to reading the Unicode character tables, 
+available at <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+Roman Czyborra wrote another explanation of Unicode's basic principles; 
+it's at <http://czyborra.com/unicode/characters.html>.
+Czyborra has written a number of other Unicode-related documents, 
+available from <http://www.czyborra.com>.
+
+Two other good introductory articles were written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html> and Jason
+Orendorff <http://www.jorendorff.com/articles/unicode/>.  If this
+introduction didn't make things clear to you, you should try reading
+one of these alternate articles before continuing.
+
+Wikipedia entries are often helpful; see the entries for "character
+encoding" <http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>, for example.
+
+
+Python's Unicode Support
+------------------------
+
+Now that you've learned the rudiments of Unicode, we can look at
+Python's Unicode features.
+
+
+The Unicode Type
+'''''''''''''''''''
+
+Unicode strings are expressed as instances of the ``unicode`` type,
+one of Python's repertoire of built-in types.  It derives from an
+abstract type called ``basestring``, which is also an ancestor of the
+``str`` type; you can therefore check if a value is a string type with
+``isinstance(value, basestring)``.  Under the hood, Python represents
+Unicode strings as either 16- or 32-bit integers, depending on how the
+Python interpreter was compiled.
+
+The ``unicode()`` constructor has the signature ``unicode(string[, encoding, errors])``.
+All of its arguments should be 8-bit strings.  The first argument is converted 
+to Unicode using the specified encoding; if you leave off the ``encoding`` argument, 
+the ASCII encoding is used for the conversion, so characters greater than 127 will 
+be treated as errors::
+
+    >>> unicode('abcdef')
+    u'abcdef'
+    >>> s = unicode('abcdef')
+    >>> type(s)
+    <type 'unicode'>
+    >>> unicode('abcdef' + chr(255))
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: 
+                        ordinal not in range(128)
+
+The ``errors`` argument specifies the response when the input string can't be converted according to the encoding's rules.  Legal values for this argument 
+are 'strict' (raise a ``UnicodeDecodeError`` exception), 
+'replace' (add U+FFFD, 'REPLACEMENT CHARACTER'), 
+or 'ignore' (just leave the character out of the Unicode result).  
+The following examples show the differences::
+
+    >>> unicode('\x80abc', errors='strict')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: 
+                        ordinal not in range(128)
+    >>> unicode('\x80abc', errors='replace')
+    u'\ufffdabc'
+    >>> unicode('\x80abc', errors='ignore')
+    u'abc'
+
+Encodings are specified as strings containing the encoding's name.
+Python 2.4 comes with roughly 100 different encodings; see the Python
+Library Reference at
+<http://docs.python.org/lib/standard-encodings.html> for a list.  Some
+encodings have multiple names; for example, 'latin-1', 'iso_8859_1'
+and '8859' are all synonyms for the same encoding.
+
+One-character Unicode strings can also be created with the
+``unichr()`` built-in function, which takes integers and returns a
+Unicode string of length 1 that contains the corresponding code point.
+The reverse operation is the built-in ``ord()`` function that takes a
+one-character Unicode string and returns the code point value::
+
+    >>> unichr(40960)
+    u'\ua000'
+    >>> ord(u'\ua000')
+    40960
+
+Instances of the ``unicode`` type have many of the same methods as 
+the 8-bit string type for operations such as searching and formatting::
+
+    >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
+    >>> s.count('e')
+    5
+    >>> s.find('feather')
+    9
+    >>> s.find('bird')
+    -1
+    >>> s.replace('feather', 'sand')
+    u'Was ever sand so lightly blown to and fro as this multitude?'
+    >>> s.upper()
+    u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
+
+Note that the arguments to these methods can be Unicode strings or 8-bit strings.  
+8-bit strings will be converted to Unicode before carrying out the operation;
+Python's default ASCII encoding will be used, so characters greater than 127 will cause an exception::
+
+    >>> s.find('Was\x9f')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
+    >>> s.find(u'Was\x9f')
+    -1
+
+Much Python code that operates on strings will therefore work with
+Unicode strings without requiring any changes to the code.  (Input and
+output code needs more updating for Unicode; more on this later.)
+
+Another important method is ``.encode([encoding], [errors='strict'])``, 
+which returns an 8-bit string version of the
+Unicode string, encoded in the requested encoding.  The ``errors``
+parameter is the same as the parameter of the ``unicode()``
+constructor, with one additional possibility; as well as 'strict',
+'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which
+uses XML's character references.  The following example shows the
+different results::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)
+    >>> u.encode('utf-8')
+    '\xea\x80\x80abcd\xde\xb4'
+    >>> u.encode('ascii')
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in ?
+    UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in position 0: ordinal not in range(128)
+    >>> u.encode('ascii', 'ignore')
+    'abcd'
+    >>> u.encode('ascii', 'replace')
+    '?abcd?'
+    >>> u.encode('ascii', 'xmlcharrefreplace')
+    '&#40960;abcd&#1972;'
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method 
+that interprets the string using the given encoding::
+
+    >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
+    >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
+    >>> type(utf8_version), utf8_version
+    (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
+    >>> u2 = utf8_version.decode('utf-8')            # Decode using UTF-8
+    >>> u == u2                                      # The two strings match
+    True
+ 
+The low-level routines for registering and accessing the available
+encodings are found in the ``codecs`` module.  However, the encoding
+and decoding functions returned by this module are usually more
+low-level than is comfortable, so I'm not going to describe the
+``codecs`` module here.  If you need to implement a completely new
+encoding, you'll need to learn about the ``codecs`` module interfaces,
+but implementing encodings is a specialized task that also won't be
+covered here.  Consult the Python documentation to learn more about
+this module.
+
+The most commonly used part of the ``codecs`` module is the 
+``codecs.open()`` function which will be discussed in the section
+on input and output.
+            
+            
+Unicode Literals in Python Source Code
+''''''''''''''''''''''''''''''''''''''''''
+
+In Python source code, Unicode literals are written as strings
+prefixed with the 'u' or 'U' character: ``u'abcdefghijk'``.  Specific
+code points can be written using the ``\u`` escape sequence, which is
+followed by four hex digits giving the code point.  The ``\U`` escape
+sequence is similar, but expects 8 hex digits, not 4.  
+
+Unicode literals can also use the same escape sequences as 8-bit
+strings, including ``\x``, but ``\x`` only takes two hex digits so it
+can't express an arbitrary code point.  Octal escapes can go up to
+U+01ff, which is octal 777.
+
+::
+
+    >>> s = u"a\xac\u1234\u20ac\U00008000"
+               ^^^^ two-digit hex escape
+                   ^^^^^^ four-digit Unicode escape 
+                               ^^^^^^^^^^ eight-digit Unicode escape
+    >>> for c in s:  print ord(c),
+    ... 
+    97 172 4660 8364 32768
+
+Using escape sequences for code points greater than 127 is fine in
+small doses, but becomes an annoyance if you're using many accented
+characters, as you would in a program with messages in French or some
+other accent-using language.  You can also assemble strings using the
+``unichr()`` built-in function, but this is even more tedious.
+
+Ideally, you'd want to be able to write literals in your language's
+natural encoding.  You could then edit Python source code with your
+favorite editor which would display the accented characters naturally,
+and have the right characters used at runtime.
+
+Python supports writing Unicode literals in any encoding, but you have
+to declare the encoding being used.  This is done by including a
+special comment as either the first or second line of the source
+file::
+
+    #!/usr/bin/env python
+    # -*- coding: latin-1 -*-
+    
+    u = u'abcdé'
+    print ord(u[-1])
+    
+The syntax is inspired by Emacs's notation for specifying variables local to a file.
+Emacs supports many different variables, but Python only supports 'coding'.  
+The ``-*-`` symbols indicate that the comment is special; within them,
+you must supply the name ``coding`` and the name of your chosen encoding, 
+separated by ``':'``.  
+
+If you don't include such a comment, the default encoding used will be
+ASCII.  Versions of Python before 2.4 were Euro-centric and assumed
+Latin-1 as a default encoding for string literals; in Python 2.4,
+characters greater than 127 still work but result in a warning.  For
+example, the following program has no encoding declaration::
+
+    #!/usr/bin/env python
+    u = u'abcdé'
+    print ord(u[-1])
+
+When you run it with Python 2.4, it will output the following warning::
+
+    amk:~$ python p263.py
+    sys:1: DeprecationWarning: Non-ASCII character '\xe9' 
+         in file p263.py on line 2, but no encoding declared; 
+         see http://www.python.org/peps/pep-0263.html for details
+  
+
+Unicode Properties
+'''''''''''''''''''
+
+The Unicode specification includes a database of information about
+code points.  For each code point that's defined, the information
+includes the character's name, its category, and the numeric value if
+applicable (Unicode has characters representing the Roman numerals and
+fractions such as one-third and four-fifths).  There are also
+properties related to the code point's use in bidirectional text and
+other display-related properties.
+
+The following program displays some information about several
+characters, and prints the numeric value of one particular character::
+
+    import unicodedata
+    
+    u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
+    
+    for i, c in enumerate(u):
+        print i, '%04x' % ord(c), unicodedata.category(c),
+        print unicodedata.name(c)
+    
+    # Get numeric value of second character
+    print unicodedata.numeric(u[1])
+
+When run, this prints::
+
+    0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
+    1 0bf2 No TAMIL NUMBER ONE THOUSAND
+    2 0f84 Mn TIBETAN MARK HALANTA
+    3 1770 Lo TAGBANWA LETTER SA
+    4 33af So SQUARE RAD OVER S SQUARED
+    1000.0
+
+The category codes are abbreviations describing the nature of the
+character.  These are grouped into categories such as "Letter",
+"Number", "Punctuation", or "Symbol", which in turn are broken up into
+subcategories.  To take the codes from the above output, ``'Ll'``
+means "Letter, lowercase", ``'No'`` means "Number, other", ``'Mn'`` is
+"Mark, nonspacing", and ``'So'`` is "Symbol, other".  See
+<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values>
+for a list of category codes.
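+
+Since the first letter of a category code names the major class, code can group characters without listing every subcategory.  This is only a sketch: ``MAJOR_CLASSES`` and ``major_class()`` are hypothetical names, and lumping everything else under ``'Other'`` is a simplification made for this example:
+

```python
import unicodedata

# First letter of the two-letter category code -> major class name.
MAJOR_CLASSES = {
    'L': 'Letter',
    'M': 'Mark',
    'N': 'Number',
    'P': 'Punctuation',
    'S': 'Symbol',
}

def major_class(ch):
    """Map a one-character Unicode string to its major category."""
    code = unicodedata.category(ch)       # e.g. 'Ll', 'No', 'Mn', 'So'
    return MAJOR_CLASSES.get(code[0], 'Other')

# The same characters as in the program above, plus a space.
classes = [major_class(ch) for ch in u'\xe9\u0bf2\u0f84\u33af ']
```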
+
+References
+''''''''''''''
+
+The Unicode and 8-bit string types are described in the Python library
+reference at <http://docs.python.org/lib/typesseq.html>.
+
+The documentation for the ``unicodedata`` module is at 
+<http://docs.python.org/lib/module-unicodedata.html>.
+
+The documentation for the ``codecs`` module is at
+<http://docs.python.org/lib/module-codecs.html>.
+
+Marc-André Lemburg gave a presentation at EuroPython 2002
+titled "Python and Unicode".  A PDF version of his slides
+is available at <http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf>,
+and is an excellent overview of the design of Python's Unicode features.
+
+
+Reading and Writing Unicode Data
+----------------------------------------
+
+Once you've written some code that works with Unicode data, the next
+problem is input/output.  How do you get Unicode strings into your
+program, and how do you convert Unicode into a form suitable for
+storage or transmission?  
+
+Depending on your input sources and output destinations, you may not
+need to do anything; you should check whether the
+libraries used in your application support Unicode natively.  XML
+parsers often return Unicode data, for example.  Many relational
+databases also support Unicode-valued columns and can return Unicode
+values from an SQL query.
+
+Unicode data is usually converted to a particular encoding before it
+gets written to disk or sent over a socket.  It's possible to do all
+the work yourself: open a file, read an 8-bit string from it, and
+convert the string with ``unicode(str, encoding)``.  However, the
+manual approach is not recommended.
+
+One problem is the multi-byte nature of encodings; one Unicode
+character can be represented by several bytes.  If you want to read
+the file in arbitrary-sized chunks (say, 1K or 4K), you need to write
+error-handling code to catch the case where only part of the bytes
+encoding a single Unicode character are read at the end of a chunk.
+One solution would be to read the entire file into memory and then
+perform the decoding, but that prevents you from working with files
+that are extremely large; if you need to read a 2 GB file, you need 2 GB
+of RAM.  (More, really, since for at least a moment you'd need to have 
+both the encoded string and its Unicode version in memory.)
+
+The solution would be to use the low-level decoding interface to catch
+the case of partial coding sequences.   The work of implementing this
+has already been done for you: the ``codecs`` module includes a
+version of the ``open()`` function that returns a file-like object
+that assumes the file's contents are in a specified encoding and
+accepts Unicode parameters for methods such as ``.read()`` and
+``.write()``.
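+
+To make the partial-sequence problem concrete, the sketch below splits the three UTF-8 bytes of a single character across two chunks.  It is an illustration under stated assumptions, not the HOWTO's own method: ``codecs.getincrementaldecoder()`` is a lower-level interface that is not available in Python 2.4 (it was added in a later release), and ``decode_naively()`` is a hypothetical helper:
+

```python
import codecs

encoded = u'\u20ac'.encode('utf-8')          # one character, three bytes
first_chunk, second_chunk = encoded[:2], encoded[2:]

def decode_naively(chunk):
    """Decode one chunk on its own; return None if it ends mid-character."""
    try:
        return chunk.decode('utf-8')
    except UnicodeDecodeError:
        return None

naive_result = decode_naively(first_chunk)   # fails: sequence is incomplete

# An incremental decoder buffers the incomplete bytes until the rest arrives.
decoder = codecs.getincrementaldecoder('utf-8')()
pieces = [decoder.decode(first_chunk), decoder.decode(second_chunk)]
incremental_result = u''.join(pieces)
```

The file-like object returned by ``codecs.open()`` takes care of this kind of buffering, which is why it is usually the more convenient choice.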
+
+The function's parameters are 
+``open(filename, mode='rb', encoding=None, errors='strict', buffering=1)``.  ``mode`` can be
+``'r'``, ``'w'``, or ``'a'``, just like the corresponding parameter to the
+regular built-in ``open()`` function; add a ``'+'`` to 
+update the file.  ``buffering`` is similarly
+parallel to the standard function's parameter.  
+``encoding`` is a string giving 
+the encoding to use; if it's left as ``None``, a regular Python file
+object that accepts 8-bit strings is returned.  Otherwise, a wrapper
+object is returned, and data written to or read from the wrapper
+object will be converted as needed.  ``errors`` specifies the action
+for encoding errors and can be one of the usual values of 'strict',
+'ignore', and 'replace'.
+
+Reading Unicode from a file is therefore simple::
+
+    import codecs
+    f = codecs.open('unicode.rst', encoding='utf-8')
+    for line in f:
+        print repr(line)
+
+It's also possible to open files in update mode, 
+allowing both reading and writing::
+
+    f = codecs.open('test', encoding='utf-8', mode='w+')
+    f.write(u'\u4500 blah blah blah\n')
+    f.seek(0)
+    print repr(f.readline()[:1])
+    f.close()
+
+Unicode character U+FEFF is used as a byte-order mark (BOM), 
+and is often written as the first character of a file in order
+to assist with autodetection of the file's byte ordering.
+Some encodings, such as UTF-16, expect a BOM to be present at 
+the start of a file; when such an encoding is used,
+the BOM will be automatically written as the first character 
+and will be silently dropped when the file is read.  There are 
+variants of these encodings, such as 'utf-16-le' and 'utf-16-be'
+for little-endian and big-endian encodings, that specify 
+one particular byte ordering and don't
+skip the BOM.
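+
+The difference between the two families of codecs can be seen without touching a file.  This is a small sketch using only the string methods and the BOM constants from the ``codecs`` module; the variable names are made up for the example:
+

```python
import codecs

text = u'abc'
with_bom = text.encode('utf-16')         # native byte order, BOM written first
without_bom = text.encode('utf-16-le')   # fixed byte order, no BOM written

starts_with_bom = with_bom[:2] == codecs.BOM_UTF16

# 'utf-16' drops the BOM again on decoding ...
round_trip = with_bom.decode('utf-16')

# ... while 'utf-16-le' keeps it as an ordinary U+FEFF character.
kept_bom = (codecs.BOM_UTF16_LE + without_bom).decode('utf-16-le')
```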
+
+
+Unicode filenames
+'''''''''''''''''''''''''
+
+Most of the operating systems in common use today support filenames
+that contain arbitrary Unicode characters.  Usually this is
+implemented by converting the Unicode string into some encoding that
+varies depending on the system.  For example, Mac OS X uses UTF-8 while
+Windows uses a configurable encoding; on Windows, Python uses the name
+"mbcs" to refer to whatever the currently configured encoding is.  On
+Unix systems, there will only be a filesystem encoding if you've set
+the ``LANG`` or ``LC_CTYPE`` environment variables; if you haven't,
+the default encoding is ASCII.
+
+The ``sys.getfilesystemencoding()`` function returns the encoding to
+use on your current system, in case you want to do the encoding
+manually, but there's not much reason to bother.  When opening a file
+for reading or writing, you can usually just provide the Unicode
+string as the filename, and it will be automatically converted to the
+right encoding for you::
+
+    filename = u'filename\u4500abc'
+    f = open(filename, 'w')
+    f.write('blah\n')
+    f.close()
+
+Functions in the ``os`` module such as ``os.stat()`` will also accept
+Unicode filenames.
+
+``os.listdir()``, which returns filenames, raises an issue: should it
+return the Unicode version of filenames, or should it return 8-bit
+strings containing the encoded versions?  ``os.listdir()`` will do
+both, depending on whether you provided the directory path as an 8-bit
+string or a Unicode string.  If you pass a Unicode string as the path,
+filenames will be decoded using the filesystem's encoding and a list
+of Unicode strings will be returned, while passing an 8-bit path will
+return the 8-bit versions of the filenames.  For example, assuming the
+default filesystem encoding is UTF-8, running the following program::
+
+	fn = u'filename\u4500abc'
+	f = open(fn, 'w')
+	f.close()
+
+	import os
+	print os.listdir('.')
+	print os.listdir(u'.')
+
+will produce the following output::
+
+	amk:~$ python t.py
+	['.svn', 'filename\xe4\x94\x80abc', ...]
+	[u'.svn', u'filename\u4500abc', ...]
+
+The first list contains UTF-8-encoded filenames, and the second list
+contains the Unicode versions.
+
+
+	
+Tips for Writing Unicode-aware Programs
+''''''''''''''''''''''''''''''''''''''''''''
+
+This section provides some suggestions on writing software that 
+deals with Unicode.
+
+The most important tip is: 
+
+    Software should only work with Unicode strings internally, 
+    converting to a particular encoding on output.  
+
+If you attempt to write processing functions that accept both 
+Unicode and 8-bit strings, you will find your program vulnerable to 
+bugs wherever you combine the two different kinds of strings.  Python's 
+default encoding is ASCII, so whenever a character with a value >127
+is in the input data, you'll get a ``UnicodeDecodeError``
+because that character can't be handled by the ASCII encoding.  
+
+It's easy to miss such problems if you only test your software 
+with data that doesn't contain any 
+accents; everything will seem to work, but there's actually a bug in your
+program waiting for the first user who attempts to use characters >127.
+A second tip, therefore, is:
+
+    Include characters >127 and, even better, characters >255 in your
+    test data.
+
+When using data coming from a web browser or some other untrusted source,
+a common technique is to check for illegal characters in a string
+before using the string in a generated command line or storing it in a 
+database.  If you're doing this, be careful to check 
+the string once it's in the form that will be used or stored; it's 
+possible for encodings to be used to disguise characters.  This is especially
+true if the input data also specifies the encoding; 
+many encodings leave the commonly checked-for characters alone, 
+but Python includes some encodings such as ``'base64'``
+that modify every single character.
+
+For example, let's say you have a content management system that takes an
+encoded filename and the name of its encoding, and you want to disallow paths with a '/' character.
+You might write this code::
+
+    def read_file (filename, encoding):
+        if '/' in filename:
+            raise ValueError("'/' not allowed in filenames")
+        unicode_name = filename.decode(encoding)
+        f = open(unicode_name, 'r')
+        # ... return contents of file ...
+        
+However, if an attacker could specify the ``'base64'`` encoding,
+they could pass ``'L2V0Yy9wYXNzd2Q='``, which is the base-64
+encoded form of the string ``'/etc/passwd'``, to read a 
+system file.   The above code looks for ``'/'`` characters 
+in the encoded form and misses the dangerous character 
+in the resulting decoded form.
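+
+One way to apply the advice is to decode first and check the decoded form.  The following is a hedged sketch, not a complete defence: ``checked_name()`` and ``is_rejected()`` are hypothetical helpers, and UTF-7 is used in place of ``'base64'`` purely as another example of an encoding that can spell ``'/'`` without that character appearing in the encoded bytes:
+

```python
def checked_name(raw_name, encoding):
    """Decode raw_name first, then validate the decoded form."""
    unicode_name = raw_name.decode(encoding)
    if u'/' in unicode_name:
        raise ValueError("'/' not allowed in filenames")
    return unicode_name

def is_rejected(raw_name, encoding):
    """Return True if checked_name() refuses the filename."""
    try:
        checked_name(raw_name, encoding)
    except ValueError:
        return True
    return False

# UTF-7 writes '/' as '+AC8-', so the encoded bytes contain no slash.
disguised = u'+AC8-etc+AC8-passwd'.encode('ascii')
harmless = u'notes.txt'.encode('ascii')
```

A real application would probably also restrict which encodings a caller may name at all, rather than accepting any codec Python knows about.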
+
+References
+''''''''''''''
+
+The PDF slides for Marc-André Lemburg's presentation "Writing
+Unicode-aware Applications in Python" are available at
+<http://www.egenix.com/files/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
+and discuss questions of character encodings as well as how to
+internationalize and localize an application.
+
+
+Revision History and Acknowledgements
+------------------------------------------
+
+Thanks to the following people who have noted errors or offered
+suggestions on this article: Nicholas Bastin, 
+Marius Gedminas, Kent Johnson, Ken Krugler,
+Marc-André Lemburg, Martin von Löwis.
+
+Version 1.0: posted August 5 2005.
+
+Version 1.01: posted August 7 2005.  Corrects factual and markup
+errors; adds several links.
+
+Version 1.02: posted August 16 2005.  Corrects factual errors.
+
+
+.. comment Additional topic: building Python w/ UCS2 or UCS4 support
+.. comment Describe obscure -U switch somewhere?
+
+.. comment 
+   Original outline:
+
+   - [ ] Unicode introduction
+       - [ ] ASCII
+       - [ ] Terms
+	   - [ ] Character
+	   - [ ] Code point
+	 - [ ] Encodings
+	    - [ ] Common encodings: ASCII, Latin-1, UTF-8
+       - [ ] Unicode Python type
+	   - [ ] Writing unicode literals
+	       - [ ] Obscurity: -U switch
+	   - [ ] Built-ins
+	       - [ ] unichr()
+	       - [ ] ord()
+	       - [ ] unicode() constructor
+	   - [ ] Unicode type
+	       - [ ] encode(), decode() methods
+       - [ ] Unicodedata module for character properties
+       - [ ] I/O
+	   - [ ] Reading/writing Unicode data into files
+	       - [ ] Byte-order marks
+	   - [ ] Unicode filenames
+       - [ ] Writing Unicode programs
+	   - [ ] Do everything in Unicode
+	   - [ ] Declaring source code encodings (PEP 263)
+       - [ ] Other issues
+	   - [ ] Building Python (UCS2, UCS4)