array. Our samplesort special-cases the snot out of this, running about
12x faster than *sort. The experimental mergesort runs it about 8x
faster than *sort without special-casing, but should really do better
than that (when merging runs of different lengths, right now it only
does something clever about finding where the second run begins in
the first and where the first run ends in the second, and that's more
of a temp-memory optimization).
the default range to end at 2**20 (machines are much faster now).
Fixed what was quite a arguably a bug, explaining an old mystery: the
"!sort" case here contructs what *was* a quadratic-time disaster for
the old quicksort implementation. But under the current samplesort, it
always ran much faster than *sort (the random case). This never made
sense. Turns out it was because !sort was sorting an integer array,
while all the other cases sort floats; and comparing ints goes much
quicker than comparing floats in Python. After changing !sort to chew
on floats instead, it's now slower than the random sort case, which
makes more sense (but is just a few percent slower; samplesort is
massively less sensitive to "bad patterns" than quicksort).