Hacker News new | ask | show | jobs
by bjoli 2609 days ago
I'd say it's the other way around: you use whatever your language provides as default (which is probably something like Timsort) until it becomes a bottleneck and then you analyse the data and write something that is specific to your use case that will blow Timsort out of the water.

Timsort is your first choice.

1 comments

In practice this boils down to, "you use whatever your language provides".

If sorting became a bottleneck, I'd literally look at every possible approach to speed it up that I could find before considering improving the sorting algorithm. Starting with trying to figure out how to throw around less data while trying to solve the problem.

As someone who has spend a great deal of time on sorting algorithms when I was in academia ( even published one) I completely agree with this sentiment. Sorting algorithms are tricky to get right and there is a lot of edge cases, so even when you (think) you know what you are doing you still get it wrong. Been there many times and it always causes some frustration.

Do you really want your business critical sorting rely on something that only might work?

Sorting algorithms are like crypto, don't roll your own if you can avoid it.

For a problem i faced I ended up being able to produce very small chunks of already sorted data and using the merge part of the stable sort algorithm (it was provided).

Not only that, I ended up being able to do it lazily making the user experience much better.

And I have been forced to write sorting algorithms from scratch as well. But it was truly a last resort. In my case I had to work with data that literally did not fit in uncompressed form on the computer that I had to work with. So I had to write a disk sort where data as kept in compressed form. (I used merge sort for this.)

However the fact that sometimes we are forced to our last resort doesn't change the fact that we should be aware that it really is a last resort.