unicode database compression, step 2:

- fixed attributions
- moved decomposition data to a separate table, in preparation for step 3 (which won't happen before 2.0 final, promise!)
- use relative paths in the generator script

I have a lot more stuff in the works for 2.1, but let's leave that for another day...
2025-11-25 04:34:37 +00:00 · 2000-09-25 08:07:06 +00:00 · 2000-09-25 08:07:06 +00:00 · cfcea49218
commit cfcea49218
parent 2101348830
5 changed files with 4613 additions and 4330 deletions
--- a/Modules/unicodedatabase.c
+++ b/Modules/unicodedatabase.c
@@ -4,9 +4,10 @@

   Data was extracted from the Unicode 3.0 UnicodeData.txt file.

-Written by Marc-Andre Lemburg (mal@lemburg.com).
+   Written by Marc-Andre Lemburg (mal@lemburg.com).
+   Rewritten for Python 2.0 by Fredrik Lundh (fredrik@pythonware.com)

-Copyright (c) Corporation for National Research Initiatives.
+   Copyright (c) Corporation for National Research Initiatives.

   ------------------------------------------------------------------------ */

@@ -29,3 +30,18 @@ _PyUnicode_Database_GetRecord(int code)
    }
    return &_PyUnicode_Database_Records[index];
 }
+
+const char *
+_PyUnicode_Database_GetDecomposition(int code)
+{
+    int index;
+
+    if (code < 0 || code >= 65536)
+        index = 0;
+    else {
+        index = decomp_index1[(code>>DECOMP_SHIFT)];
+        index = decomp_index2[(index<<DECOMP_SHIFT)+
+                             (code&((1<<DECOMP_SHIFT)-1))];
+    }
+    return decomp_data[index];
+}
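
The new `_PyUnicode_Database_GetDecomposition` walks a two-level index: the high bits of the code point select a block in `decomp_index1`, and the low bits select an entry inside that block in `decomp_index2`. The sketch below shows the same lookup shape on a toy 16-entry code space. Every name (`TOY_SHIFT`, `toy_index1`, `toy_index2`, `toy_data`, `toy_get_decomposition`, `toy_self_check`) and every table value is made up for illustration; the real tables are produced by the generator script and are not reproduced here. The block sharing shown for blocks 0 and 2 is presumably where the compression comes from, but that is an assumption about the generator, not something this diff shows.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy parameters: 16 code points split into 4 blocks of 4. */
#define TOY_SHIFT 2
#define TOY_LIMIT 16

/* Entry 0 is the empty string, meaning "no decomposition". */
static const char *toy_data[] = {
    "",               /* 0: no decomposition            */
    "0041 0300",      /* 1: illustrative canonical form */
    "<compat> 0020",  /* 2: illustrative compat form    */
};

/* First level: maps a block number (code >> TOY_SHIFT) to a block id.
   Blocks 0 and 2 are both all-empty, so they share block id 0. */
static const unsigned char toy_index1[] = { 0, 1, 0, 2 };

/* Second level: the distinct blocks stored back to back, 4 entries each. */
static const unsigned char toy_index2[] = {
    0, 0, 0, 0,   /* block id 0: nothing decomposes */
    0, 1, 0, 0,   /* block id 1                     */
    2, 0, 0, 1,   /* block id 2                     */
};

/* Same control flow as the function in the diff, on the toy tables. */
const char *
toy_get_decomposition(int code)
{
    int index;

    if (code < 0 || code >= TOY_LIMIT)
        index = 0;                                /* out of range: empty */
    else {
        index = toy_index1[code >> TOY_SHIFT];    /* pick the block id   */
        index = toy_index2[(index << TOY_SHIFT) + /* start of that block */
                           (code & ((1 << TOY_SHIFT) - 1))]; /* offset   */
    }
    return toy_data[index];
}

/* Small self-check that exercises each path through the lookup. */
void
toy_self_check(void)
{
    assert(strcmp(toy_get_decomposition(5), "0041 0300") == 0);
    assert(strcmp(toy_get_decomposition(12), "<compat> 0020") == 0);
    assert(toy_get_decomposition(9)[0] == '\0');   /* shared empty block */
    assert(toy_get_decomposition(-1)[0] == '\0');  /* below range        */
    assert(toy_get_decomposition(16)[0] == '\0');  /* above range        */
}
```

In this toy layout, four first-level entries plus twelve second-level entries replace a flat table of sixteen. The saving is negligible at this size and only grows when many blocks are identical.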