Merged revisions 80605-80609,80642-80646,80651-80652,80674,80684-80686,80748,80852,80854,80870,80872-80873,80907,80915-80916,80951-80952,80976-80977,80985,81038-81040,81042,81053,81070,81104-81105,81114,81125,81245,81285,81402,81463,81516,81562-81563,81567,81593,81635,81680-81681,81684,81801,81888,81931-81933,81939-81942,81963,81984,81991,82120,82188,82264-82267 via svnmerge from

svn+ssh://pythondev@svn.python.org/python/trunk

........
  r80605 | andrew.kuchling | 2010-04-28 19:22:16 -0500 (Wed, 28 Apr 2010) | 1 line

  Add various items
........
  r80606 | andrew.kuchling | 2010-04-28 20:44:30 -0500 (Wed, 28 Apr 2010) | 6 lines

  Fix doubled 'the'.
  Markup fixes to use :exc:, :option: in a few places.
    (Glitch: unittest.main's -c ends up a link to the Python
    interpreter's -c option.  Should we skip using :option: for that
    switch, or disable the auto-linking somehow?)
........
  r80607 | andrew.kuchling | 2010-04-28 20:45:41 -0500 (Wed, 28 Apr 2010) | 1 line

  Add various unittest items
........
  r80608 | benjamin.peterson | 2010-04-28 22:18:05 -0500 (Wed, 28 Apr 2010) | 1 line

  update pypy description
........
  r80609 | benjamin.peterson | 2010-04-28 22:30:59 -0500 (Wed, 28 Apr 2010) | 1 line

  update pypy url
........
  r80642 | andrew.kuchling | 2010-04-29 19:49:09 -0500 (Thu, 29 Apr 2010) | 1 line

  Always add space after RFC; reword paragraph
........
  r80643 | andrew.kuchling | 2010-04-29 19:52:31 -0500 (Thu, 29 Apr 2010) | 6 lines

  Reword paragraph to make its meaning clearer.

  Antoine Pitrou: is my version of the paragraph still correct?

  R. David Murray: is this more understandable than the previous version?
........
  r80644 | andrew.kuchling | 2010-04-29 20:02:15 -0500 (Thu, 29 Apr 2010) | 1 line

  Fix typos
........
  r80645 | andrew.kuchling | 2010-04-29 20:32:47 -0500 (Thu, 29 Apr 2010) | 1 line

  Markup fix; clarify by adding 'in that order'
........
  r80646 | andrew.kuchling | 2010-04-29 20:33:40 -0500 (Thu, 29 Apr 2010) | 1 line

  Add various items; rearrange unittest section a bit
........
  r80651 | andrew.kuchling | 2010-04-30 08:46:55 -0500 (Fri, 30 Apr 2010) | 1 line

  Minor grammar re-wording
........
  r80652 | andrew.kuchling | 2010-04-30 08:47:34 -0500 (Fri, 30 Apr 2010) | 1 line

  Add item
........
  r80674 | andrew.kuchling | 2010-04-30 20:19:16 -0500 (Fri, 30 Apr 2010) | 1 line

  Add various items
........
  r80684 | andrew.kuchling | 2010-05-01 07:05:52 -0500 (Sat, 01 May 2010) | 1 line

  Minor grammar fix
........
  r80685 | andrew.kuchling | 2010-05-01 07:06:51 -0500 (Sat, 01 May 2010) | 1 line

  Describe memoryview
........
  r80686 | antoine.pitrou | 2010-05-01 07:16:39 -0500 (Sat, 01 May 2010) | 4 lines

  Fix attribution. Travis didn't do much and he did a bad work.
  (yes, this is a sensitive subject, sorry)
........
  r80748 | andrew.kuchling | 2010-05-03 20:24:22 -0500 (Mon, 03 May 2010) | 1 line

  Add some more items; the urlparse change is added twice
........
  r80852 | andrew.kuchling | 2010-05-05 20:09:47 -0500 (Wed, 05 May 2010) | 1 line

  Reword paragraph; fix filename, which should be pyconfig.h
........
  r80854 | andrew.kuchling | 2010-05-05 20:10:56 -0500 (Wed, 05 May 2010) | 1 line

  Add various items
........
  r80870 | andrew.kuchling | 2010-05-06 09:14:09 -0500 (Thu, 06 May 2010) | 1 line

  Describe ElementTree 1.3; rearrange new-module sections; describe dict views as sets; small edits and items
........
  r80872 | andrew.kuchling | 2010-05-06 12:21:59 -0500 (Thu, 06 May 2010) | 1 line

  Add 2 items; record ideas for two initial sections; clarify wording
........
  r80873 | andrew.kuchling | 2010-05-06 12:27:57 -0500 (Thu, 06 May 2010) | 1 line

  Change section title; point to unittest2
........
  r80907 | andrew.kuchling | 2010-05-06 20:45:14 -0500 (Thu, 06 May 2010) | 1 line

  Add a new section on the development plan; add an item
........
  r80915 | antoine.pitrou | 2010-05-07 05:15:51 -0500 (Fri, 07 May 2010) | 3 lines

  Fix some markup and a class name. Also, wrap a long line.
........
  r80916 | andrew.kuchling | 2010-05-07 06:30:47 -0500 (Fri, 07 May 2010) | 1 line

  Re-word text
........
  r80951 | andrew.kuchling | 2010-05-07 20:15:26 -0500 (Fri, 07 May 2010) | 1 line

  Add two items
........
  r80952 | andrew.kuchling | 2010-05-07 20:35:55 -0500 (Fri, 07 May 2010) | 1 line

  Get accents correct
........
  r80976 | andrew.kuchling | 2010-05-08 08:28:03 -0500 (Sat, 08 May 2010) | 1 line

  Add logging.dictConfig example; give up on writing a Ttk example
........
  r80977 | andrew.kuchling | 2010-05-08 08:29:46 -0500 (Sat, 08 May 2010) | 1 line

  Markup fixes
........
  r80985 | andrew.kuchling | 2010-05-08 10:39:46 -0500 (Sat, 08 May 2010) | 7 lines

  Write summary of the 2.7 release; rewrite the future section some more;
  mention PYTHONWARNINGS env. var; tweak some examples for readability.

  And with this commit, the "What's New" is done... except for a
  complete read-through to polish the text, and fixing any reported errors,
  but those tasks can easily wait until after beta2.
........
  r81038 | benjamin.peterson | 2010-05-09 16:09:40 -0500 (Sun, 09 May 2010) | 1 line

  finish clause
........
  r81039 | andrew.kuchling | 2010-05-10 09:18:27 -0500 (Mon, 10 May 2010) | 1 line

  Markup fix; re-word a sentence
........
  r81040 | andrew.kuchling | 2010-05-10 09:20:12 -0500 (Mon, 10 May 2010) | 1 line

  Use title case
........
  r81042 | andrew.kuchling | 2010-05-10 10:03:35 -0500 (Mon, 10 May 2010) | 1 line

  Link to unittest2 article
........
  r81053 | florent.xicluna | 2010-05-10 14:59:22 -0500 (Mon, 10 May 2010) | 2 lines

  Add a link on maketrans().
........
  r81070 | andrew.kuchling | 2010-05-10 18:13:41 -0500 (Mon, 10 May 2010) | 1 line

  Fix typo
........
  r81104 | andrew.kuchling | 2010-05-11 19:38:44 -0500 (Tue, 11 May 2010) | 1 line

  Revision pass: lots of edits, typo fixes, rearrangements
........
  r81105 | andrew.kuchling | 2010-05-11 19:40:47 -0500 (Tue, 11 May 2010) | 1 line

  Let's call this done
........
  r81114 | andrew.kuchling | 2010-05-12 08:56:07 -0500 (Wed, 12 May 2010) | 1 line

  Grammar fix
........
  r81125 | andrew.kuchling | 2010-05-12 13:56:48 -0500 (Wed, 12 May 2010) | 1 line

  #8696: add documentation for logging.config.dictConfig (PEP 391)
........
  r81245 | andrew.kuchling | 2010-05-16 18:31:16 -0500 (Sun, 16 May 2010) | 1 line

  Add cross-reference to later section
........
  r81285 | vinay.sajip | 2010-05-18 03:16:27 -0500 (Tue, 18 May 2010) | 1 line

  Fixed minor typo in ReST markup.
........
  r81402 | vinay.sajip | 2010-05-21 12:41:34 -0500 (Fri, 21 May 2010) | 1 line

  Updated logging documentation with more dictConfig information.
........
  r81463 | georg.brandl | 2010-05-22 03:17:23 -0500 (Sat, 22 May 2010) | 1 line

  #8785: less confusing description of regex.find*.
........
  r81516 | andrew.kuchling | 2010-05-25 08:34:08 -0500 (Tue, 25 May 2010) | 1 line

  Add three items
........
  r81562 | andrew.kuchling | 2010-05-27 08:22:53 -0500 (Thu, 27 May 2010) | 1 line

  Rewrite wxWidgets section
........
  r81563 | andrew.kuchling | 2010-05-27 08:30:09 -0500 (Thu, 27 May 2010) | 1 line

  Remove top-level 'General Questions' section, pushing up the questions it contains
........
  r81567 | andrew.kuchling | 2010-05-27 16:29:59 -0500 (Thu, 27 May 2010) | 1 line

  Add item
........
  r81593 | georg.brandl | 2010-05-29 03:46:18 -0500 (Sat, 29 May 2010) | 1 line

  #8616: add new turtle demo "nim".
........
  r81635 | georg.brandl | 2010-06-01 02:25:23 -0500 (Tue, 01 Jun 2010) | 1 line

  Put docs for RegexObject.search() before RegexObject.match() to mirror re.search() and re.match() order.
........
  r81680 | vinay.sajip | 2010-06-03 17:34:42 -0500 (Thu, 03 Jun 2010) | 1 line

  Issue #8890: Documentation changed to avoid reference to temporary files.
........
  r81681 | sean.reifschneider | 2010-06-03 20:51:26 -0500 (Thu, 03 Jun 2010) | 2 lines

  Issue8810: Clearing up docstring for tzinfo.utcoffset.
........
  r81684 | vinay.sajip | 2010-06-04 08:41:02 -0500 (Fri, 04 Jun 2010) | 1 line

  Issue #8890: Documentation changed to avoid reference to temporary files - other cases covered.
........
  r81801 | andrew.kuchling | 2010-06-07 08:38:40 -0500 (Mon, 07 Jun 2010) | 1 line

  #8875: Remove duplicated paragraph
........
  r81888 | andrew.kuchling | 2010-06-10 20:54:58 -0500 (Thu, 10 Jun 2010) | 1 line

  Add a few more items
........
  r81931 | georg.brandl | 2010-06-12 01:26:54 -0500 (Sat, 12 Jun 2010) | 1 line

  Fix punctuation.
........
  r81932 | georg.brandl | 2010-06-12 01:28:58 -0500 (Sat, 12 Jun 2010) | 1 line

  Document that an existing directory raises in mkdir().
........
  r81933 | georg.brandl | 2010-06-12 01:45:33 -0500 (Sat, 12 Jun 2010) | 1 line

  Update version in README.
........
  r81939 | georg.brandl | 2010-06-12 04:45:01 -0500 (Sat, 12 Jun 2010) | 1 line

  Use newer toctree syntax.
........
  r81940 | georg.brandl | 2010-06-12 04:45:28 -0500 (Sat, 12 Jun 2010) | 1 line

  Add document on how to build.
........
  r81941 | georg.brandl | 2010-06-12 04:45:58 -0500 (Sat, 12 Jun 2010) | 1 line

  Fix gratuitous indentation.
........
  r81942 | georg.brandl | 2010-06-12 04:46:03 -0500 (Sat, 12 Jun 2010) | 1 line

  Update README.
........
  r81963 | andrew.kuchling | 2010-06-12 15:00:55 -0500 (Sat, 12 Jun 2010) | 1 line

  Grammar fix
........
  r81984 | georg.brandl | 2010-06-14 10:58:39 -0500 (Mon, 14 Jun 2010) | 1 line

  #8993: fix reference.
........
  r81991 | andrew.kuchling | 2010-06-14 19:38:58 -0500 (Mon, 14 Jun 2010) | 1 line

  Add another bunch of items
........
  r82120 | andrew.kuchling | 2010-06-20 16:45:45 -0500 (Sun, 20 Jun 2010) | 1 line

  Note that Python 3.x isn't covered; add forward ref. for UTF-8; note error in 2.5 and up
........
  r82188 | benjamin.peterson | 2010-06-23 19:02:46 -0500 (Wed, 23 Jun 2010) | 1 line

  remove reverted changed
........
  r82264 | georg.brandl | 2010-06-27 05:47:47 -0500 (Sun, 27 Jun 2010) | 1 line

  Confusing punctuation.
........
  r82265 | georg.brandl | 2010-06-27 05:49:23 -0500 (Sun, 27 Jun 2010) | 1 line

  Use designated syntax for optional grammar element.
........
  r82266 | georg.brandl | 2010-06-27 05:51:44 -0500 (Sun, 27 Jun 2010) | 1 line

  Fix URL.
........
  r82267 | georg.brandl | 2010-06-27 05:55:38 -0500 (Sun, 27 Jun 2010) | 1 line

  Two typos.
........
This commit is contained in:
Benjamin Peterson 2010-06-27 22:32:30 +00:00
parent 30b7a90033
commit d7c3ed54ef
28 changed files with 1767 additions and 471 deletions

View file

@ -4,10 +4,12 @@
Unicode HOWTO
*****************
:Release: 1.1
:Release: 1.11
This HOWTO discusses Python's support for Unicode, and explains various problems
that people commonly encounter when trying to work with Unicode.
This HOWTO discusses Python 2.x's support for Unicode, and explains
various problems that people commonly encounter when trying to work
with Unicode. (This HOWTO has not yet been updated to cover the 3.x
versions of Python.)
Introduction to Unicode
@ -146,8 +148,9 @@ problems.
4. Many Internet standards are defined in terms of textual data, and can't
handle content with embedded zero bytes.
Generally people don't use this encoding, instead choosing other encodings that
are more efficient and convenient.
Generally people don't use this encoding, instead choosing other
encodings that are more efficient and convenient. UTF-8 is probably
the most commonly supported encoding; it will be discussed below.
Encodings don't have to handle every possible Unicode character, and most
encodings don't. The rules for converting a Unicode string into the ASCII
@ -223,8 +226,8 @@ Wikipedia entries are often helpful; see the entries for "character encoding"
<http://en.wikipedia.org/wiki/UTF-8>, for example.
Python's Unicode Support
========================
Python 2.x's Unicode Support
============================
Now that you've learned the rudiments of Unicode, we can look at Python's
Unicode features.
@ -266,8 +269,8 @@ Unicode result). The following examples show the differences::
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
Encodings are specified as strings containing the encoding's name. Python comes
with roughly 100 different encodings; see the Python Library Reference at
Encodings are specified as strings containing the encoding's name. Python 3.2
comes with roughly 100 different encodings; see the Python Library Reference at
:ref:`standard-encodings` for a list. Some encodings have multiple names; for
example, 'latin-1', 'iso_8859_1' and '8859' are all synonyms for the same
encoding.
@ -626,7 +629,10 @@ Version 1.02: posted August 16 2005. Corrects factual errors.
Version 1.1: Feb-Nov 2008. Updates the document with respect to Python 3 changes.
Version 1.11: posted June 20 2010. Notes that Python 3.x is not covered,
and that the HOWTO only covers 2.x.
.. comment Describe Python 3.x support (new section? new document?)
.. comment Additional topic: building Python w/ UCS2 or UCS4 support
.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter