Make identifiers str (not str8) objects throughout.

This affects the parser, various object implementations,
and all places that put identifiers into C string literals.

In testing, a number of crashes occurred as code would
fail when the recursion limit was reached (such as the
Unicode interning dictionary having key/value pairs where
key is not value). To solve these, I added an overflowed
flag, which allows for 50 more recursions after the
limit was reached and the exception was raised, and
a recursion_critical flag, which indicates that recursion
absolutely must be allowed, i.e. that a certain call
must not cause a stack overflow exception.

There are still some places where both str and str8 are
accepted as identifiers; these should eventually be
removed.
This commit is contained in:
Martin v. Löwis 2007-06-10 09:51:05 +00:00
parent 38e43c25ee
commit 5b222135f8
40 changed files with 462 additions and 289 deletions

View file

@ -18,6 +18,17 @@
#include "abstract.h"
#endif /* PGEN */
#define is_potential_identifier_start(c) (\
(c >= 'a' && c <= 'z')\
|| (c >= 'A' && c <= 'Z')\
|| c == '_')
#define is_potential_identifier_char(c) (\
(c >= 'a' && c <= 'z')\
|| (c >= 'A' && c <= 'Z')\
|| (c >= '0' && c <= '9')\
|| c == '_')
extern char *PyOS_Readline(FILE *, FILE *, char *);
/* Return malloc'ed string including trailing \n;
empty malloc'ed string for EOF;
@ -1209,7 +1220,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end)
}
/* Identifier (most frequent token!) */
if (isalpha(c) || c == '_') {
if (is_potential_identifier_start(c)) {
/* Process r"", u"" and ur"" */
switch (c) {
case 'r':
@ -1227,7 +1238,7 @@ tok_get(register struct tok_state *tok, char **p_start, char **p_end)
goto letter_quote;
break;
}
while (isalnum(c) || c == '_') {
while (is_potential_identifier_char(c)) {
c = tok_nextc(tok);
}
tok_backup(tok, c);