Hacker News new | ask | show | jobs
by graycat 3281 days ago
I always wondered if there would be a way to have quicksort run slower than O(n ln(n)).

Due to that possibility, when I code up a sort routine, I use heap sort. It is guaranteed O(n ln(n)) worst case and achieves the Gleason bound for sorting by comparing keys which means that on average and worst case, on the number of key comparisons, it is impossible to do better than heap sort's O(n ln(n)) forever.

For a stable sort, sure, just extend the sort keys with a sequence number, do the sort, and remove the key extensions.

Quicksort has good main memory locality of reference and a possibility of some use of multiple threads, and heap sort seems to have neither. But there is a version of heap sort modified for doing better on locality of reference when the array being sorted is really large.

But, if are not too concerned about memory space, then don't have to care about the sort routine being in place. In that case, get O(n ln(n)), a stable sort, no problems with locality of reference, and ability to sort huge arrays with just the old merge sort.

I long suspected that much of the interest in in-place, O(n ln(n)), stable sorting was due to some unspoken but strong goal of finding some fundamental conservation law of a trade off of processor time and memory space. Well, that didn't really happen. But heap sort is darned clever; I like it.

2 comments

> I long suspected that much of the interest in in-place, O(n ln(n)), stable sorting was due to some unspoken but strong goal of finding some fundamental conservation law of a trade off of processor time and memory space. Well, that didn't really happen. But heap sort is darned clever; I like it.

It did happen, it's called block sort, but it's not used because of the complexity of implementing it, the constants of its runtime, it's not easy to parallelize, and its best case complexity is still O(n ln n).

It's cool to play with a pack of cards and run sorting algorithms on them. To see the worst case of quicksort, use the first element as the pivot and give it an already sorted list. It will take quadratic time to give back the same list.
Right. So, for the first "pivot" value, people commonly use the median of three -- take three keys and use as the pivot the median of those three, that is, the middle value. Okay. But then the question remains: While in practice the median of three sounds better, maybe there is a goofy, pathological array of keys that still makes quicksort run in quadratic time. Indeed, maybe for any way of selecting the first pivot, there is an array that makes quicksort quadratic.

Rather than think about that, I noticed that heap sort meets the Gleason bound which means that heap sort's O(n ln)n)) performance both worst case and average case can never be beaten by a sort routine that depends on comparing keys two at a time.

Then, sure, can beat O(n ln(n)). How? Use radix sort -- that was how the old punched card sorting machines worked. So, for an array of length n and a key of length k, the thing always runs in O(nk) which for sufficiently large n is less than O(n ln(n)). In practice? Nope: I don't use radix sort!