Elsewhere in the setobject.c code we do a bitwise-and with the mask
instead of using a conditional to reset to zero on wrap-around.
Using the same technique here gives cleaner, faster, and more
consistent code.
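Here is a minimal sketch contrasting the two wrap-around styles. The helper names are hypothetical, and the table size is assumed to be a power of two so that `mask == size - 1`. Under that assumption both helpers return the same index, but the masked version needs no compare-and-branch.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only. The table size is assumed
   to be a power of two, so mask == size - 1. */

/* Conditional reset: a compare and branch on every step. */
static size_t next_index_branch(size_t i, size_t mask)
{
    i++;
    if (i > mask)
        i = 0;
    return i;
}

/* Bitwise-and with the mask: wrap-around falls out of the arithmetic. */
static size_t next_index_mask(size_t i, size_t mask)
{
    return (i + 1) & mask;
}
```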
* Move the test for an exact key match to after a hash match
* Use "used" as a loop counter instead of "fill"
* Minor improvements to variable names and code consistency
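The first two items can be sketched on a simplified table. The `entry` struct, `dummy_key` sentinel, and both functions are hypothetical and are not CPython's `setentry` code. `strcmp()` stands in for a full equality test. The probe check looks at the cheap hash first and tries the exact (identity) key match only after the hashes agree. The loop uses `used` (live keys only) as its counter, so it can stop right after the last live entry. A `fill` count would also include deleted (dummy) slots. The sketch assumes `used` is exactly the number of live entries; a larger value would walk past the end of the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry (not CPython's setentry):
   key == NULL marks a never-used slot, key == dummy_key a deleted one. */
typedef struct {
    const char *key;
    long hash;
} entry;

static const char dummy_key[] = "<dummy>";

/* Probe check: compare the cheap hash first; only on a hash match test
   for an exact (identity) key match and then for full equality. */
static int entry_matches(const entry *e, const char *key, long hash)
{
    if (e->key == NULL || e->hash != hash)
        return 0;
    if (e->key == key)                  /* exact key match */
        return 1;
    if (e->key == dummy_key)            /* deleted slot */
        return 0;
    return strcmp(e->key, key) == 0;    /* stand-in for a full equality test */
}

/* Visit live entries using "used" as the loop counter: it counts only
   live keys, so the loop stops right after the last one is seen. */
static long sum_active_hashes(const entry *table, size_t used)
{
    long total = 0;
    for (const entry *e = table; used > 0; e++) {
        if (e->key != NULL && e->key != dummy_key) {
            total += e->hash;
            used--;
        }
    }
    return total;
}
```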
The setobject freelist was consuming memory but not providing much value.
Even when a freelisted setobject was available, most of the setobject
fields still needed to be initialized and the small table still required
a memset(). This meant that the custom freelisting scheme for sets was
providing almost no incremental benefit over the default Python freelist
scheme used by _PyObject_Malloc() in Objects/obmalloc.c.
Modern processors tend to make consecutive memory accesses cheaper than
random probes into memory.
Small sets can fit into L1 cache, so they get less benefit. But they do
come out ahead because the consecutive probes don't probe the same key
more than once and because the randomization step occurs less frequently
(or not at all).
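The following sketch shows one way to combine consecutive probes with randomized jumps. It records the slot indices visited for a given hash. `LINEAR_PROBES` is set to an arbitrary illustrative value, and `probe_sequence()` is a hypothetical helper rather than the actual lookup code. Each round checks a short run of adjacent slots and then makes one perturbed jump. The run is skipped when it would wrap past the end of the table, so no run can visit the same slot twice.

```c
#include <stddef.h>

#define LINEAR_PROBES 3   /* illustrative value, an assumption */
#define PERTURB_SHIFT 5

/* Write the first n slot indices probed for `hash` into `out`.
   Each round probes a run of consecutive slots (cache friendly),
   then takes a single randomized jump. */
static void probe_sequence(size_t hash, size_t mask, size_t *out, size_t n)
{
    size_t perturb = hash;
    size_t i = hash & mask;
    size_t k = 0;

    while (k < n) {
        out[k++] = i;
        /* Only probe the run if it stays inside the table. */
        if (i + LINEAR_PROBES <= mask) {
            for (size_t j = 1; j <= LINEAR_PROBES && k < n; j++)
                out[k++] = i + j;
        }
        /* Randomization step. */
        perturb >>= PERTURB_SHIFT;
        i = (i * 5 + 1 + perturb) & mask;
    }
}
```

For example, with `hash == 0` and a 16-slot table (`mask == 15`), the first probes are slots 0, 1, 2, 3, and then the jump goes to slot 1.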
For the open addressing step, putting the perturb shift before the index
calculation gets the upper bits into play sooner.
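A small sketch of the two orderings follows. The `prober` struct and the step functions are hypothetical names for illustration. In the old ordering, the first jump adds the unshifted hash. Modulo the table size, that contributes only the low bits that already chose the initial slot. Shifting first brings higher hash bits into the very first jump. With an 8-slot table, hashes `0x00` and `0x20` collide on the first jump under the old ordering but separate under the new one.

```c
#include <stddef.h>

#define PERTURB_SHIFT 5

/* Hypothetical probe state for illustration. */
typedef struct {
    size_t i;        /* current slot index */
    size_t perturb;  /* remaining hash bits to mix in */
    size_t mask;     /* table size - 1 (power-of-two table assumed) */
} prober;

static prober prober_init(size_t hash, size_t mask)
{
    prober p = { hash & mask, hash, mask };
    return p;
}

/* Old ordering: compute the index, then shift perturb. */
static void step_old(prober *p)
{
    p->i = (p->i * 5 + 1 + p->perturb) & p->mask;
    p->perturb >>= PERTURB_SHIFT;
}

/* New ordering: shift perturb first, so upper bits affect this jump. */
static void step_new(prober *p)
{
    p->perturb >>= PERTURB_SHIFT;
    p->i = (p->i * 5 + 1 + p->perturb) & p->mask;
}
```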
The gdb pretty-printer plugin depended on the dummy object being displayable.
Other solutions besides a unicode object are possible. For now, get it
back up and running.
The identity checks in lookkey() need to be there to prevent the dummy
object from leaking through Py_RichCompareBool() into user code in the
rare circumstance where the dummy's hash value exactly matches the hash
value of the actual key being looked up.
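The next sketch is a simplified model of why the identity checks matter. Keys are opaque pointers, and a callback stands in for `Py_RichCompareBool()`. All names (`slot`, `DUMMY`, `slot_holds`, `counting_eq`) are hypothetical. The dummy is rejected by a pointer comparison before the callback can run, so "user code" never receives it, even when the dummy's stored hash equals the hash being looked up.

```c
#include <stddef.h>

/* Hypothetical model: equality is decided by a user-supplied callback
   standing in for Py_RichCompareBool(). */
typedef int (*eq_func)(const void *a, const void *b);

typedef struct {
    const void *key;   /* NULL = empty slot, DUMMY = deleted slot */
    long hash;
} slot;

static const int dummy_obj = 0;
#define DUMMY ((const void *)&dummy_obj)

/* Return 1 if the slot holds `key`. The identity checks run first, so
   the dummy never reaches the user callback. */
static int slot_holds(const slot *s, const void *key, long hash, eq_func eq)
{
    if (s->key == NULL || s->key == DUMMY)
        return 0;
    if (s->key == key)          /* identity implies equality */
        return 1;
    if (s->hash != hash)
        return 0;
    return eq(s->key, key);     /* only now is user code invoked */
}

/* Example callback that counts how often "user code" runs. */
static int eq_calls = 0;
static int counting_eq(const void *a, const void *b)
{
    eq_calls++;
    return a == b;
}
```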
Letting the compiler decide how to optimize the multiply by five
gives it the freedom to choose the best technique for a given
target machine.
For example, GCC on x86_64 produces slightly better code:
Old way (3 steps with a data dependency between each step):
    shrq $5, %r13
    leaq 1(%rbx,%r13), %rax
    leaq (%rax,%rbx,4), %rbx
New way (3 steps; the first two are independent and can run in parallel):
    leaq (%rbx,%rbx,4), %rax      # i*5
    shrq $5, %r13                 # perturb >>= PERTURB_SHIFT
    leaq 1(%r13,%rax), %rbx       # 1 + perturb + i*5
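At the source level, the change amounts to something like the sketch below. Both function names are hypothetical, and the exact source spelling in setobject.c may differ. The two forms compute the same value, including on unsigned wrap-around. The difference is that `i * 5` leaves the choice of instructions to the compiler. The generated code depends on compiler, flags, and target, so the listing above is one observed case rather than a guarantee.

```c
#include <stddef.h>

/* Old style: spell out the multiply by five as a shift plus an add. */
static size_t next_old(size_t i, size_t perturb)
{
    return (i << 2) + i + perturb + 1;
}

/* New style: write the multiply and let the compiler pick the
   instruction sequence for the target machine. */
static size_t next_new(size_t i, size_t perturb)
{
    return i * 5 + perturb + 1;
}
```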