by kragen
5604 days ago
Your #2 suggests that it would be slower than ordinary quicksort on normal-sized datasets. But dual-pivot quicksort is faster than ordinary quicksort on normal-sized datasets, so in at least one sense dual-pivot is not a step in the direction of samplesort. Python's built-in sort is timsort, last I heard: a variant of mergesort that looks for existing sorted or reversed runs so that it runs faster in the common cases of mostly-sorted or mostly-reversed data. It's true, though, that Python's built-in sort was a samplesort variant until 2002, which I didn't know. (It does seem like samplesort's virtue of minimizing comparisons would be particularly desirable in an environment like Python, where every comparison potentially involves a call into slow interpreted code.)
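To make the cost of comparisons in Python concrete, here is a small sketch that counts how many times `list.sort()` calls a user-defined `__lt__`. The `Counted` wrapper and `count_sort_comparisons` helper are hypothetical names for illustration; each counted call is a round trip into interpreted code.

```python
import random


class Counted:
    """Hypothetical wrapper that counts every < comparison made on it."""

    calls = 0  # shared counter across all instances
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # list.sort() only uses <, so every comparison lands here,
        # i.e. in interpreted Python code rather than in C.
        Counted.calls += 1
        return self.value < other.value


def count_sort_comparisons(values):
    """Return how many __lt__ calls list.sort() makes on `values`."""
    Counted.calls = 0
    items = [Counted(v) for v in values]
    items.sort()
    return Counted.calls


n = 1000
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
print("already sorted:", count_sort_comparisons(range(n)))
print("shuffled:      ", count_sort_comparisons(shuffled))
```

On CPython, an already-sorted input should be recognized by timsort as a single run and cost about n - 1 comparisons, while shuffled input costs many times more.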
- From an input sequence of length n, choose a random subsequence of length k.
- Sort the subsequence.
- Partition the remaining (n-k) elements of the original sequence into the (k+1) partitions given by the subsequence.
- Sort the (k+1) partitions.
In the paper, the partitioning and sorting are always done using quicksort. Any n*log(n) sorting algorithm should work, though, including samplesort itself applied recursively.
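The four steps above, with samplesort applied recursively to the buckets, can be sketched as follows. This is an illustrative sketch, not the paper's code: the paper uses quicksort where this uses recursion, the built-in `sorted()` stands in for small inputs, and the defaults `k=15` and `cutoff=32` are arbitrary assumptions rather than the optimal k the limit result refers to.

```python
import random
from bisect import bisect_right


def samplesort(xs, k=15, cutoff=32, rng=random):
    """Sort xs by sampling k splitters, partitioning, and recursing.

    k and cutoff are illustrative defaults, not tuned values.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(xs)
    if n <= max(cutoff, k):
        return sorted(xs)  # small input: fall back to the built-in sort
    # 1. Choose a random subsequence of length k (by position).
    picked = set(rng.sample(range(n), k))
    # 2. Sort the subsequence; its elements become the splitters.
    splitters = sorted(xs[i] for i in picked)
    # 3. Partition the remaining n - k elements into k + 1 buckets.
    #    Bucket j holds x with splitters[j-1] <= x < splitters[j].
    buckets = [[] for _ in range(k + 1)]
    for i, x in enumerate(xs):
        if i not in picked:
            buckets[bisect_right(splitters, x)].append(x)
    # 4. Sort each bucket recursively and interleave with the splitters.
    out = []
    for j, bucket in enumerate(buckets):
        out.extend(samplesort(bucket, k, cutoff, rng))
        if j < k:
            out.append(splitters[j])
    return out


data = [random.randrange(1000) for _ in range(5000)]
assert samplesort(data) == sorted(data)
```

Calling `samplesort(data, k=1)` gives quicksort with a random pivot, and `k=2` gives a dual-pivot three-way split. Finding the splitter bucket with a binary search costs about log2(k+1) comparisons per element, which is where a large sample pays off. This sketch does nothing special for heavily duplicated keys, so inputs with very few distinct values can recurse deeply.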
If you use an optimal value for k (which might be a significant fraction of n), you can prove that lim_{n->infty} E(C_n)/log2(n!) = 1, where E(C_n) is the expected number of comparisons for a sequence of length n and log2(n!) is the information-theoretic lower bound on comparisons for sorting n elements.
Viewed this way, with samplesort applied recursively, quicksort is samplesort with k = 1, and dual-pivot quicksort is samplesort with k = 2.
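To put numbers on that limit for the k = 1 case, the sketch below compares the classical closed form for random-pivot quicksort on distinct keys, 2(n+1)H_n - 4n expected comparisons (where H_n is the n-th harmonic number), with log2(n!). The function names are hypothetical. The ratio creeps up toward 2 ln 2 ≈ 1.386 rather than 1, which is consistent with needing a larger, well-chosen k to reach the bound.

```python
import math


def log2_factorial(n):
    """log2(n!), computed via lgamma to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)


def quicksort_expected_comparisons(n):
    """Expected comparisons of random-pivot quicksort on n distinct keys.

    Closed form 2(n+1)H_n - 4n, assuming each partitioning step on m
    elements makes exactly m - 1 comparisons.
    """
    harmonic = math.fsum(1.0 / i for i in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


for n in (10, 1_000, 100_000):
    ratio = quicksort_expected_comparisons(n) / log2_factorial(n)
    print(f"n={n:>7}: E(C_n)/log2(n!) = {ratio:.3f}")
```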