This patch finalizes the move from UTF-8 to a default encoding in
the Python Unicode implementation.

The internal buffer used for implementing the buffer protocol
is renamed from utf8str to defenc to make this change visible. It
now holds the default-encoded version of the Unicode object and is
calculated on demand (it is NULL until then).

Since the default encoding defaults to ASCII, Unicode objects which
hold non-ASCII characters will no longer work with C APIs that use
the "s" or "t" parser markers. C APIs must now explicitly provide
Unicode support via the "u", "U" or "es"/"es#" parser markers in
order to work with non-ASCII Unicode strings.
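
The on-demand caching and the ASCII restriction described above can be
modelled with a small standalone sketch. It is not code from this patch:
SketchUnicode, sketch_encode_ascii and sketch_get_defenc are hypothetical
names, unsigned short stands in for Py_UNICODE, and a malloc'ed C string
stands in for the Python string object that the real defenc field holds.
The sketch assumes the default encoding is ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyUnicodeObject; only the fields needed here. */
typedef struct {
    int length;                 /* number of code units in str */
    const unsigned short *str;  /* raw code units (stand-in for Py_UNICODE) */
    char *defenc;               /* cached default-encoded bytes, or NULL */
} SketchUnicode;

/* Encode n code units as ASCII. Returns a malloc'ed, NUL-terminated
   buffer, or NULL if a code unit is above 127 or allocation fails. */
char *sketch_encode_ascii(const unsigned short *s, int n)
{
    char *out = malloc((size_t)n + 1);
    int i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        if (s[i] > 127) {       /* not representable in ASCII */
            free(out);
            return NULL;
        }
        out[i] = (char)s[i];
    }
    out[n] = '\0';
    return out;
}

/* Return the cached default-encoded form, computing it on first use.
   A NULL result means the text could not be encoded as ASCII. */
const char *sketch_get_defenc(SketchUnicode *u)
{
    assert(u != NULL);
    if (u->defenc == NULL)
        u->defenc = sketch_encode_ascii(u->str, u->length);
    return u->defenc;
}
```

In this model a second call to sketch_get_defenc on an ASCII-only object
returns the same cached pointer, while an object containing a code unit
such as 0x00E9 yields NULL on every call. That mirrors why callers relying
on the default encoding stop working for non-ASCII text.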

(Note: this patch will also have to be applied to the 1.6 branch
 of the CVS tree.)
Marc-André Lemburg 2000-08-03 18:46:08 +00:00
parent 2b83b4601f
commit bff879cabb
4 changed files with 109 additions and 65 deletions

@@ -204,8 +204,9 @@ typedef struct {
     int length;                 /* Length of raw Unicode data in buffer */
     Py_UNICODE *str;            /* Raw Unicode buffer */
     long hash;                  /* Hash value; -1 if not set */
-    PyObject *utf8str;          /* UTF-8 encoded version as Python string,
-                                   or NULL */
+    PyObject *defenc;           /* (Default) Encoded version as Python
+                                   string, or NULL; this is used for
+                                   implementing the buffer protocol */
 } PyUnicodeObject;
 extern DL_IMPORT(PyTypeObject) PyUnicode_Type;