Mirror of https://github.com/python/cpython.git (synced 2025-07-15 23:35:23 +00:00)
gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066) (GH-93146)
Also while there, clarify a few things about why we reduce the hash to 32 bits.
Co-authored-by: Eli Libman <eli@hyro.ai>
Co-authored-by: Yury Selivanov <yury@edgedb.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit c1f5c903a7)
parent c1b12495f6
commit a4bea26ee4
5 changed files with 65 additions and 4 deletions
@@ -408,14 +408,22 @@ hamt_hash(PyObject *o)
         return -1;
     }
 
-    /* While it's suboptimal to reduce Python's 64 bit hash to
+    /* While it's somewhat suboptimal to reduce Python's 64 bit hash to
        32 bits via XOR, it seems that the resulting hash function
        is good enough (this is also how Long type is hashed in Java.)
        Storing 10, 100, 1000 Python strings results in a relatively
        shallow and uniform tree structure.
 
-       Please don't change this hashing algorithm, as there are many
-       tests that test some exact tree shape to cover all code paths.
+       Also it's worth noting that it would be possible to adapt the tree
+       structure to 64 bit hashes, but that would increase memory pressure
+       and provide little to no performance benefits for collections with
+       fewer than billions of key/value pairs.
+
+       Important: do not change this hash reducing function. There are many
+       tests that need an exact tree shape to cover all code paths and
+       we do that by specifying concrete values for test data's `__hash__`.
+       If this function is changed most of the regression tests would
+       become useless.
     */
     int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
     return xored == -1 ? -2 : xored;
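The reduction described in the comment can be tried in isolation with a minimal standalone sketch. It is not the CPython source: the names sketch_fold_hash and sketch_level_index are hypothetical, and the split of the folded hash into 5-bit chunks per tree level is an assumption made here for illustration, not something shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration, not CPython code. */

enum {
    SKETCH_BITS_PER_LEVEL = 5,  /* assumption: 32-way branching per node */
    SKETCH_MAX_LEVELS = 7       /* ceil(32 / 5) levels consume all 32 bits */
};

/* Fold a 64-bit hash into 32 bits by XOR-ing its two halves.
   A result of -1 is remapped to -2 so that -1 can stay reserved
   as an error value, mirroring the last line of the hunk. */
static int32_t
sketch_fold_hash(int64_t hash)
{
    uint64_t bits = (uint64_t)hash;
    uint32_t low = (uint32_t)(bits & 0xffffffffu);
    uint32_t high = (uint32_t)(bits >> 32);
    int32_t xored = (int32_t)(low ^ high);
    return xored == -1 ? -2 : xored;
}

/* Child-slot index used at a given tree level (0 = root),
   taking SKETCH_BITS_PER_LEVEL bits of the folded hash per level. */
static uint32_t
sketch_level_index(int32_t folded, unsigned level)
{
    assert(level < SKETCH_MAX_LEVELS);
    return ((uint32_t)folded >> (level * SKETCH_BITS_PER_LEVEL)) & 0x1fu;
}
```

Under that 5-bit assumption, a 32-bit hash yields six full chunks plus a 2-bit remainder, so the deepest index comes from level 6. That would be consistent with the 7-level depth named in the commit title, and it suggests why tests that pin `__hash__` values depend on the fold staying exactly as it is.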