[3.12] gh-106931: Intern Statically Allocated Strings Globally (gh-107272) (gh-110713)

We tried this before, using a dict for all interned strings.  That ran into problems due to interpreter isolation.  However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning.  Here we circle back to using a global cache, but only for statically allocated strings.  We also use a more basic _Py_hashtable_t for that global cache instead of a dict.

Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter.  With only a global cache, we would therefore have to store a copy of each such interned string in it, tied to the main interpreter.

(cherry-picked from commit b72947a8d2)
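
The two-tier lookup described in the message above can be modeled in a few lines of Python. This is only a conceptual sketch, not the C implementation: `Str`, `Interpreter`, and `GLOBAL_STATIC_CACHE` are hypothetical names invented for illustration, and a plain dict stands in for the `_Py_hashtable_t`.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Str:
    """Toy stand-in for a string object (hypothetical); identity matters here."""
    value: str
    statically_allocated: bool = False


# Process-wide table (hypothetical name): holds only statically allocated strings.
GLOBAL_STATIC_CACHE: dict[str, Str] = {}


@dataclass
class Interpreter:
    """Toy interpreter with its own cache for non-static strings."""
    interned: dict[str, Str] = field(default_factory=dict)

    def intern(self, s: Str) -> Str:
        # 1. A static string with the same value wins in every interpreter.
        hit = GLOBAL_STATIC_CACHE.get(s.value)
        if hit is not None:
            return hit
        # 2. A static string seen for the first time goes into the global table.
        if s.statically_allocated:
            GLOBAL_STATIC_CACHE[s.value] = s
            return s
        # 3. Anything else stays in this interpreter's cache, so it never
        #    outlives the interpreter (and allocator) that created it.
        return self.interned.setdefault(s.value, s)
```

In this model, two interpreters that intern the same static string get back the same object, while a non-static string is shared only within the interpreter that interned it.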
Eric Snow 2023-11-27 16:51:12 -07:00 committed by GitHub
parent 60a08e6ff2
commit 4f71f1680d
11 changed files with 4324 additions and 4186 deletions

@@ -208,6 +208,7 @@ class Printer:
                         self.write(".kind = 1,")
                         self.write(".compact = 1,")
                         self.write(".ascii = 1,")
+                        self.write(".statically_allocated = 1,")
                 self.write(f"._data = {make_string_literal(s.encode('ascii'))},")
                 return f"& {name}._ascii.ob_base"
             else:
@@ -220,6 +221,7 @@ class Printer:
                             self.write(f".kind = {kind},")
                             self.write(".compact = 1,")
                             self.write(".ascii = 0,")
+                            self.write(".statically_allocated = 1,")
                     utf8 = s.encode('utf-8')
                     self.write(f'.utf8 = {make_string_literal(utf8)},')
                     self.write(f'.utf8_length = {len(utf8)},')
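
To make the effect of the two hunks concrete, here is a small standalone sketch of the state fields the generator would emit for a given string after this change. The helper `state_field_lines` is hypothetical and is not part of the patched file; it only assumes the field names shown in the diff and the usual 1/2/4-byte character widths.

```python
def state_field_lines(s: str) -> list[str]:
    """Return the C initializer lines for a string's state fields (hypothetical helper)."""
    # Character width in bytes, chosen from the largest code point in the string.
    max_cp = max(map(ord, s), default=0)
    kind = 1 if max_cp < 0x100 else 2 if max_cp < 0x10000 else 4
    return [
        f".kind = {kind},",
        ".compact = 1,",
        f".ascii = {int(s.isascii())},",
        # The new flag: every generated string is marked as static.
        ".statically_allocated = 1,",
    ]


if __name__ == "__main__":
    for text in ("abc", "caf\u00e9", "\u20ac"):
        print(repr(text), state_field_lines(text))
```

Both the ASCII and the non-ASCII branch set the new flag, which is what lets the runtime route these strings to the global cache.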