Do not put a raw REPLACEMENT CHARACTER in the document.

Georg Brandl 2010-11-19 22:09:04 +00:00
parent c5b0ec0a83
commit c8c60c2284
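
For context, here is a minimal sketch of the behaviour the changed doctest line stands for. With the "replace" error handler, the undecodable byte 0x80 becomes U+FFFD REPLACEMENT CHARACTER; the documentation now prints a question mark in its place. The variable names are illustrative and not part of the commit.

```python
# Illustrative only: these names do not appear in the commit.
data = b'\x80abc'  # 0x80 cannot start a valid UTF-8 sequence

replaced = data.decode("utf-8", "replace")  # bad byte -> U+FFFD
ignored = data.decode("utf-8", "ignore")    # bad byte dropped

# ascii() escapes non-ASCII characters, so the output is safe on any terminal.
print(ascii(replaced))  # '\ufffdabc'
print(ascii(ignored))   # 'abc'
```

Printing with ascii() is one way to show the real result without putting the raw character into a document.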

@ -263,10 +263,13 @@ Unicode result). The following examples show the differences::
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
unexpected code byte unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace") >>> b'\x80abc'.decode("utf-8", "replace")
'<EFBFBD>abc' '?abc'
>>> b'\x80abc'.decode("utf-8", "ignore") >>> b'\x80abc'.decode("utf-8", "ignore")
'abc' 'abc'
(In this code example, the Unicode replacement character has been replaced by
a question mark because it may not be displayed on some systems.)
Encodings are specified as strings containing the encoding's name. Python 3.2 Encodings are specified as strings containing the encoding's name. Python 3.2
comes with roughly 100 different encodings; see the Python Library Reference at comes with roughly 100 different encodings; see the Python Library Reference at
:ref:`standard-encodings` for a list. Some encodings have multiple names; for :ref:`standard-encodings` for a list. Some encodings have multiple names; for