Hacker News new | ask | show | jobs
by gameguy43 2613 days ago
Original author here, happy to answer questions about data structures, algorithms, and coding interviews!
3 comments

It might be notable that on very small lists (up to around 50, maybe 100 depending on cpu/cache) insertion sort is fastest of all, even on difficult input distributions. Its so simple that cpu can skip through it rapidly. I noticed this myself and have also read other developers mention it in sort design discussions. A popular general purpose sort called 'Timsort' defaults to it for small sequences. Its possible to tweak the insert algorithm to sum distance of elements moved so far and escape to a more substantial algorithm when it gets excessive. If little rearrangement is required insertion sort is about as fast as scanning through memory can be.

I think in practice the answer to the question "which should I use" (rather than "which should ideally be used") is the best developed hybrid sort library for the platform. Beyond the suitable case for insertion sort, its a really substantial job for an individual to develop, test and optimize a high performance sort library that can handle a range of distribution types.

One of the big factors behind this is that the cost of pipeline stalls in a theoretically good algorithm can significantly exceed the cost of extra comparisons.

Whether that is true depends on a combination of how much data you have, what kind of data you have, and your programming language's low-level representation.

For ints in C++ and a list of 20 elements, it will be reliably true. For strings in Perl with a list of 500 elements, it reliably won't be true.

LSB radix sort can be slow, but MSB radix sort is the fastest sorting algorithm for an array of machine words.
"No CS degree necessary." lured me in. I'm subscribed and looking forward to learning more.