Marc-Andre's third try at this bulk patch seems to work (except that
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:

New Unicode support for int(), float(), complex() and long().

- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above

Unicode compares and contains checks:

- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through

Better testing support for the standard codecs.

Misc minor enhancements, such as an alias dbcs for the mbcs codec.

Changes:

- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as error and no longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)

Followed by:

Looks like I've gone a step too far there... (and test_contains.py
seems to have a bug too).

I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
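
The trailing-garbage rule for PyLong_FromString() described above can be sketched on its own. This is a minimal standalone illustration, not the code from the patch: the function name trailer_is_valid() is hypothetical, and it assumes the caller has already consumed the digits. It also accepts trailing whitespace, which the longobject.c change in this commit allows in addition to 'L' and 'l'.

```c
#include <ctype.h>

/* Hypothetical helper, for illustration only.
 * `rest` points at the text that follows the digits of a long literal.
 * Returns 1 if that text is an optional single 'L' or 'l', followed by
 * optional whitespace, followed by the end of the string.
 * Returns 0 for anything else (i.e. trailing garbage). */
int
trailer_is_valid(const char *rest)
{
    if (*rest == 'L' || *rest == 'l')
        rest++;
    /* Cast to unsigned char so isspace() never sees a negative value. */
    while (*rest && isspace((unsigned char)*rest))
        rest++;
    return *rest == '\0';
}
```

Under this rule "L", "l  " and the empty string pass, while "LL", "x" and " L" are rejected (whitespace is only skipped after the optional suffix).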
2000-04-05 20:11:21 +00:00
commit 9e896b37c7
parent 457855a5f0
17 changed files with 421 additions and 115 deletions
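
The new PyLong_FromUnicode() follows a three-step shape: reject input that would not fit a fixed buffer, encode the Unicode digits into a plain char string, then hand that string to the existing string parser. The sketch below imitates that shape outside of Python and rests on several assumptions: encode_decimal_sketch() is a hypothetical stand-in for PyUnicode_EncodeDecimal() that only understands ASCII and the Arabic-Indic digits U+0660..U+0669, unsigned short stands in for Py_UNICODE, and strtol() replaces PyLong_FromString(), so the result is limited to a C long.

```c
#include <errno.h>
#include <stdlib.h>

#define BUF_SIZE 256

/* Hypothetical stand-in for PyUnicode_EncodeDecimal(): copies ASCII
 * code units through and maps U+0660..U+0669 to '0'..'9'.
 * Returns 0 on success, -1 on any other code unit. */
int
encode_decimal_sketch(const unsigned short *u, int length, char *out)
{
    int i;

    for (i = 0; i < length; i++) {
        unsigned short ch = u[i];

        if (ch >= 0x0660 && ch <= 0x0669)
            out[i] = (char)('0' + (ch - 0x0660));
        else if (ch > 0 && ch < 128)
            out[i] = (char)ch;
        else
            return -1;
    }
    out[length] = '\0';
    return 0;
}

/* Same outline as PyLong_FromUnicode(): length guard, encode, parse.
 * Returns 0 and stores the value in *result, or -1 on error. */
int
long_from_code_units(const unsigned short *u, int length, int base,
                     long *result)
{
    char buffer[BUF_SIZE];
    char *end;

    /* Leave room for the terminating NUL byte. */
    if (length < 0 || (size_t)length >= sizeof(buffer))
        return -1;
    if (encode_decimal_sketch(u, length, buffer) != 0)
        return -1;

    errno = 0;
    *result = strtol(buffer, &end, base);
    /* Reject empty input, trailing garbage and out-of-range values. */
    if (end == buffer || *end != '\0' || errno == ERANGE)
        return -1;
    return 0;
}
```

The length check comes first so that the encoder can never write past the end of the stack buffer, which is why the comparison is >= rather than >.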
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -724,7 +724,7 @@ PyLong_FromString(str, pend, base)
 	int base;
 {
 	int sign = 1;
-	char *start;
+	char *start, *orig_str = str;
 	PyLongObject *z;
 	
 	if ((base != 0 && base < 2) || base > 36) {
@@ -772,17 +772,44 @@ PyLong_FromString(str, pend, base)
 	}
 	if (z == NULL)
 		return NULL;
-	if (str == start) {
-		PyErr_SetString(PyExc_ValueError,
-				"no digits in long int constant");
-		Py_DECREF(z);
-		return NULL;
-	}
+	if (str == start)
+		goto onError;
 	if (sign < 0 && z != NULL && z->ob_size != 0)
 		z->ob_size = -(z->ob_size);
+	if (*str == 'L' || *str == 'l')
+		str++;
+	while (*str && isspace(Py_CHARMASK(*str)))
+		str++;
+	if (*str != '\0')
+		goto onError;
 	if (pend)
 		*pend = str;
 	return (PyObject *) z;
+
+ onError:
+	PyErr_Format(PyExc_ValueError, 
+		     "invalid literal for long(): %.200s", orig_str);
+	Py_XDECREF(z);
+	return NULL;
+}
+
+PyObject *
+PyLong_FromUnicode(u, length, base)
+	Py_UNICODE *u;
+	int length;
+	int base;
+{
+	char buffer[256];
+
+	if (length >= sizeof(buffer)) {
+		PyErr_SetString(PyExc_ValueError,
+				"long() literal too large to convert");
+		return NULL;
+	}
+	if (PyUnicode_EncodeDecimal(u, length, buffer, NULL))
+		return NULL;
+
+	return PyLong_FromString(buffer, NULL, base);
 }

 static PyLongObject *x_divrem