118 commits

Author SHA1 Message Date
Christian Heimes
082c9b0267 Fixed bug #1915: Python compiles with --enable-unicode=no again. However, several extension methods and modules do not work without unicode support. 2008-01-23 14:20:50 +00:00
Amaury Forgeot d'Arc
5087980c1e The incremental decoder for utf-7 must preserve its state between calls.
Solves issue1460.

Might not be a backport candidate: a new API function was added,
and some code may rely on details in utf-7.py.
2007-11-20 23:31:27 +00:00
Walter Dörwald
183744d6b9 Fix for #1444: utf_8_sig.StreamReader was (indirectly through decode())
calling codecs.utf_8_decode() with final==True, which failed with incomplete
byte sequences. Fix and test by James G. Sack.
2007-11-19 12:41:10 +00:00
Georg Brandl
4cdceac760 Fix #883466: don't allow Unicode as arguments to quopri and uu codecs. 2007-09-03 07:16:46 +00:00
Walter Dörwald
6e39080649 Backport r57105 and r57145 from the py3k branch: UTF-32 codecs. 2007-08-17 16:41:28 +00:00
Walter Dörwald
4234827e99 Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
2007-04-12 10:35:00 +00:00
Brett Cannon
fa6521b4fd Make the __import__ call in encodings.__init__ absolute with a level 0 call. 2007-02-16 19:33:01 +00:00
Brett Cannon
971a012ce1 Update the encoding package's search function to use absolute imports when
calling __import__.  This helps make the expected search locations for encoding
modules be more explicit.

One could use an explicit value for __path__ when making the call to __import__
to force the exact location searched for encodings.  This would give the most
strict search path possible if one is worried about malicious code being
imported.  The unfortunate side-effect of that is that if __path__ was modified
on 'encodings' on purpose in a safe way it would not be picked up in future
__import__ calls.
2007-02-15 22:54:39 +00:00
Georg Brandl
4ba9e5bdc7 Patch #1634778: add missing encoding aliases for iso8859_15 and
iso8859_16.
2007-01-27 17:59:42 +00:00
Walter Dörwald
39b8b6afb5 Change decode() so that it works with a buffer (i.e. unicode(..., 'utf-8-sig'))
SF bug #1601501.
2006-11-23 05:03:56 +00:00
Georg Brandl
2c9838e30f Bug #1586613: fix zlib and bz2 codecs' incremental en/decoders. 2006-10-29 14:39:09 +00:00
Georg Brandl
a92979a1db Bug #1446043: correctly raise a LookupError if an encoding name given
to encodings.search_function() contains a dot.
2006-09-30 11:22:28 +00:00
Neal Norwitz
391e5f4c9f importing types is not necessary if we use isinstance 2006-08-25 01:52:49 +00:00
Martin v. Löwis
961b91bd3c Correction of patch #1455898: In the mbcs decoder, set final=False
for stream decoder, but final=True for the decode function.
2006-08-02 13:53:55 +00:00
Martin v. Löwis
0eac11826a Make import/lookup of mbcs fail on non-Windows systems. 2006-06-15 06:45:05 +00:00
Martin v. Löwis
d825143be1 Patch #1455898: Incremental mode for "mbcs" codec. 2006-06-14 05:21:04 +00:00
Walter Dörwald
c6f5b3ad6c errors is an attribute of the incremental decoder,
not an argument.
2006-06-13 12:04:43 +00:00
Walter Dörwald
6b6e2bb8b1 Fix passing errors to the encoder and decoder functions. 2006-06-13 12:02:12 +00:00
Tim Peters
c7d14452a4 Whitespace normalization. 2006-06-04 23:43:53 +00:00
Martin v. Löwis
3f767795f6 Patch #1359618: Speed-up charmap encoder. 2006-06-04 19:36:28 +00:00
Walter Dörwald
78a0be6ab3 Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.

Fix the incremental encoder and decoder for the IDNA encoding.

This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
Walter Dörwald
a40cf31de6 Make error message less misleading for u"a..b".encode("idna"). 2006-04-14 17:00:36 +00:00
Walter Dörwald
6493699c0d Make raise statements PEP 8 compatible. 2006-04-14 15:22:27 +00:00
Walter Dörwald
a8da934069 Whitespace. 2006-03-27 09:02:04 +00:00
Hye-Shik Chang
e2ac4abd01 Patch #1443155: Add the incremental codecs support for CJK codecs.
(reviewed by Walter Dörwald)
2006-03-26 02:34:59 +00:00
Guido van Rossum
f8480a7856 Instead of relative imports, use (implicitly) absolute ones. 2006-03-15 23:08:13 +00:00
Tim Peters
f99b8162a2 Whitespace normalization. 2006-03-15 18:08:37 +00:00
Walter Dörwald
13ed60b504 Fix typo. 2006-03-15 13:36:50 +00:00
Walter Dörwald
abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
Guido van Rossum
87de069e4e Use relative imports in a few places where I noticed the need.
(Ideally, all packages in Python 2.5 will use the relative import
syntax for all their relative import needs.)
2006-03-15 04:33:54 +00:00
Martin v. Löwis
5bd7c02298 Avoid forward-declaring the methods array.
Rename unicodedata.db* to unicodedata.ucd*
2006-03-10 11:20:04 +00:00
Martin v. Löwis
480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Marc-André Lemburg
fe4b34cc4b Fix the encodings package codec search function to only search
inside its own package. Fixes problem reported in patch #1433198.

Add codec search function for codec test codec.
2006-02-19 15:22:22 +00:00
Martin v. Löwis
412ed3b8a7 Patch #1177307: UTF-8-Sig codec. 2006-01-08 10:45:39 +00:00
Tim Peters
536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Marc-André Lemburg
d9cf593b49 Cosmetic change: make all hex literals use upper case hex so that they
look more like the Unicode Consortium files.

Add ending new-line to all source files.
2005-10-24 12:14:59 +00:00
Marc-André Lemburg
3c72ded23d Removed the decoding_map from the codecs where this is possible.
Replaced the tis_620, cp1140 and koi8_u codecs with new ones
based on custom mapping files.
2005-10-24 12:07:49 +00:00
Marc-André Lemburg
0f00ba8bd8 Replace the old EBCDIC codecs with new ones using the decoding table. 2005-10-21 14:35:35 +00:00
Marc-André Lemburg
7797be7b3b Alias iso8859_1 to latin_1 which is the same encoding, but has
a much faster codec implementation.
2005-10-21 14:02:28 +00:00
Marc-André Lemburg
75c9e8392e Add a few more Mac OS encodings. The mapping tables for these are
available at ftp.unicode.org.
2005-10-21 13:58:32 +00:00
Marc-André Lemburg
a1129f4b9b Replace the old charmap codecs with new ones generated from the current
mapping tables available at ftp.unicode.org.

These new codecs include and use character decoding tables, which speed
up decoding severalfold.
2005-10-21 13:49:12 +00:00
Walter Dörwald
007f8dfde2 Bug #1245379: Add "unicode-1-1-utf-7" as an alias for "utf-7" as specified
by RFC 1642.
2005-10-09 19:42:27 +00:00
Neal Norwitz
4ce69a5b06 No need to import exceptions, they are builtins 2005-09-01 00:45:28 +00:00
Martin v. Löwis
8b59514e57 Make IDNA return an empty string when the input is empty. Fixes #1163178.
Will backport to 2.4.
2005-08-25 11:03:38 +00:00
Walter Dörwald
729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
Walter Dörwald
e1a0391b49 Fix wrong variable name. 2004-12-29 13:11:10 +00:00
Marc-André Lemburg
9ab8818c87 Rearranged mappings to value sorting order. 2004-12-10 21:54:35 +00:00
Walter Dörwald
69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
Tim Peters
d1b7827216 Whitespace normalization. 2004-08-07 06:03:09 +00:00
Marc-André Lemburg
c759f070ef Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and
TIS-620.

Closes SF bug #1001895: Adding missing ISO 8859 codecs, especially Thai.
2004-08-05 12:43:30 +00:00
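
Several of the commits above (abb02e5994, 4234827e99, 183744d6b9) concern incremental decoders, which keep state between calls so that input can arrive in arbitrary chunks. The following is only a minimal sketch of that usage, not code from any of these commits. It assumes a modern Python 3 interpreter rather than the Python 2.x tree the log describes, and the helper name decode_in_chunks is hypothetical.

```python
import codecs


def decode_in_chunks(data: bytes, encoding: str, chunk_size: int) -> str:
    """Decode `data` piecewise with a stateful incremental decoder.

    Hypothetical helper for illustration only; not part of the codecs module.
    """
    # codecs.getincrementaldecoder() returns a factory (a class); calling it
    # creates a decoder object that buffers incomplete byte sequences.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # final=True tells the decoder no more input follows, so any leftover
    # incomplete sequence is reported as an error instead of being buffered.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# A UTF-8 BOM followed by text containing a two-byte character.
payload = codecs.BOM_UTF8 + "Dörwald".encode("utf-8")

# Chunk boundaries fall inside the BOM and inside the multi-byte character.
text_small_chunks = decode_in_chunks(payload, "utf-8-sig", 1)

# The first chunk starts with the BOM but is longer than 3 bytes.
text_large_chunks = decode_in_chunks(payload, "utf-8-sig", 5)
```

Both calls should yield the same text with the BOM stripped, regardless of where the chunk boundaries fall.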