Hacker News new | ask | show | jobs
by GuB-42 1293 days ago
> is it theoretically possible for a sorting algorithm to achieve sub-O(nlog(n)) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?

No, you can get better than nlog(n) for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

The explanation comes from information theory, and it is the same idea as for why you can't compress random data.

In essence, a sorting algorithm has to pick the correct reordering of the sequence, there are n! possible reorderings, so the algorithm needs log(n!) bits of information about the sequence, each comparison yields one bit, and log(n!) is asymptotically equivalent to nlog(n), so you need at least nlog(n) comparisons to cover every case. In practice, some reorderings are more common than others, in particular the "already sorted" case, so it is worth making your algorithm check these cases first, but on randomly selected inputs, these special cases represent a negligible fraction of all the possible cases.

1 comments

> you can get better than nlog(n) for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

I guess what I'm asking is how "specific" do these cases have to be, or how "general" is the general case? Can specific cases be 0.0001%? How about 1%?