| I think it's fair to say that pdqsort (pattern-defeating quicksort) is overall the best unstable sort and timsort is overall the best stable sort in 2017, at least if you're implementing one for a standard library. The standard sort algorithm in Rust is timsort[1] (slice::sort), but soon we'll have pdqsort as well[2] (slice::sort_unstable), which shows great benchmark numbers.[3] Actually, I should mention that both implementations are not 100% equivalent to what is typically considered as timsort and pdqsort, but they're pretty close. It is notable that Rust is the first programming language to adopt pdqsort, and I believe its adoption will only grow in the future. Here's a fun fact: Typical quicksorts (and introsorts) in standard libraries spend most of the time doing literally nothing - just waiting for the next instruction because of failed branch prediction! If you manage to eliminate branch misprediction, you can easily make sorting twice as fast! At least that is the case if you're sorting items by an integer key, or a tuple of integers, or something primitive like that (i.e. when comparison is rather cheap). Pdqsort efficiently eliminates branch mispredictions and brings some other improvements over introsort as well - for example, the complexity becomes O(nk) if the input array is of length n and consists of only k different values. Of course, worst-case complexity is always O(n log n). Finally, last week I implemented parallel sorts for Rayon (Rust's data parallelism library) based on timsort and pdqsort[4]. Check out the links for more information and benchmarks. And before you start criticizing the benchmarks, please keep in mind that they're rather simplistic, so please take them with a grain of salt. I'd be happy to elaborate further and answer any questions. :) [1] https://github.com/rust-lang/rust/pull/38192 [2] https://github.com/rust-lang/rust/issues/40585 [3] https://github.com/rust-lang/rust/pull/40601 [4] https://github.com/nikomatsakis/rayon/pull/379 |
For this demo I'm sorting a permutation of the range 1 to n, so obtaining perfect or arbitrarily skewed pivots is free, which isn't realistic, but that's not the point. My experiments show that quicksort is fastest on my machine when 15% of the elements are on one side and 85% on the other. This is a tradeoff between branch prediction (if it always guesses "right side", it's right 85% of the time) and recursion depth (which grows as the pivot becomes more and more imbalanced).
Of course, skewing the pivot is not what you want to do. You should rather look at sorting algorithms which don't suffer from branch mispredictions (as much), such as pdqsort, super-scalar sample sort [1,2] or block quicksort [3]
[1] excellent paper on a parallel in-place adaptation by some of my colleagues: https://arxiv.org/abs/1705.02257 - code will become available soon
[2] my less sophisticated implementation: https://github.com/lorenzhs/ssssort/, I also have a version using pdqsort as base case sorter, which is faster: https://github.com/lorenzhs/ssssort/blob/pdq/speed.pdf (in that plot, ssssort = super scalar sample sort with pdqsort as base case)
[3] https://arxiv.org/abs/1604.06697, implementation at https://github.com/weissan/BlockQuicksort - pdqsort uses the same technique