
	by dpcx 3284 days ago
	Question as a non low-level developer, and please forgive my ignorance: How is it that we're essentially 50 years into writing sorting algorithms, and we still find improvements? Shouldn't sorting items be a "solved" problem by now?

7 comments

_hrfd 3284 days ago

Basically all comparison-based sort algorithms we use today stem from two basic algorithms: mergesort (stable sort, from 1945) and quicksort (unstable sort, from 1959).

Mergesort was improved by Tim Peters in 2002 and that became timsort. He invented a way to take advantage of pre-sorted intervals in arrays to speed up sorting. It's basically an additional layer over mergesort with a few other low-level tricks to minimize the amount of memcpying.
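To make the "pre-sorted intervals" idea concrete, here is a minimal sketch of a natural merge sort in Python. It is not timsort itself: real timsort adds a minimum run length, insertion sort for short runs, galloping merges, and a merge stack with balance rules. The names `find_runs`, `merge`, and `natural_merge_sort` are made up for this illustration.

```python
def find_runs(a):
    """Split list `a` into maximal ordered runs, returned as half-open (start, end) pairs.

    Strictly descending runs are reversed in place, so every run ends up ascending.
    Only *strictly* descending runs are reversed, which keeps equal items in order.
    """
    runs = []
    n = len(a)
    i = 0
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending run: extend it, then flip it.
            while j < n and a[j] < a[j - 1]:
                j += 1
            a[i:j] = a[i:j][::-1]
        else:
            # Non-descending run: extend it as far as it goes.
            while j < n and a[j] >= a[j - 1]:
                j += 1
        runs.append((i, j))
        i = j
    return runs


def merge(left, right):
    """Stable merge of two ascending lists."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # `<=` takes from the left on ties, which preserves stability.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_merge_sort(items):
    """Sort by detecting existing runs, then merging neighbouring runs pairwise."""
    a = list(items)
    runs = [a[s:e] for s, e in find_runs(a)]
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out, carried to the next round
        runs = merged
    return runs[0] if runs else []
```

On already-sorted input `find_runs` returns a single run, so the whole sort is one linear scan and no merging happens at all.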

Quicksort was improved by David Musser in 1997 when he developed introsort. He set a strict worst-case bound of O(n log n) on the algorithm, as well as improved the pivot selection strategy. And people are inventing new ways of pivot selection all the time. E.g. Andrei Alexandrescu published a new method in 2017[1].
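A rough sketch of the introsort idea in Python, for anyone who wants to see the shape of it: run quicksort, count how deep the partitioning goes, and switch to heapsort when the depth budget runs out. This is an illustration, not Musser's exact algorithm. The cutoff of 16 elements, the depth budget of about 2*log2(n), and the helper names are assumptions, and the heapsort fallback here copies the slice instead of working in place.

```python
import heapq


def introsort(a):
    """Sort list `a` in place: depth-limited quicksort with a heapsort fallback."""
    max_depth = 2 * max(len(a), 1).bit_length()  # roughly 2 * log2(n)
    _introsort(a, 0, len(a), max_depth)


def _introsort(a, lo, hi, depth):
    while hi - lo > 16:
        if depth == 0:
            # Too many unbalanced partitions: heapsort this slice, O(m log m).
            chunk = a[lo:hi]
            heapq.heapify(chunk)
            a[lo:hi] = [heapq.heappop(chunk) for _ in range(len(chunk))]
            return
        depth -= 1
        p = _partition(a, lo, hi)
        _introsort(a, lo, p, depth)  # recurse on the left part
        lo = p + 1                   # keep looping on the right part
    # Small slice: insertion sort is cheap here.
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def _partition(a, lo, hi):
    """Lomuto partition of a[lo:hi] around a median-of-three pivot; returns the pivot index."""
    mid = (lo + hi) // 2
    last = hi - 1
    # Order the three candidates so the median ends up at `mid`.
    if a[mid] < a[lo]:
        a[mid], a[lo] = a[lo], a[mid]
    if a[last] < a[lo]:
        a[last], a[lo] = a[lo], a[last]
    if a[last] < a[mid]:
        a[last], a[mid] = a[mid], a[last]
    a[mid], a[last] = a[last], a[mid]  # park the pivot at the end
    pivot = a[last]
    store = lo
    for i in range(lo, last):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[last] = a[last], a[store]
    return store
```

A list of identical values is a bad case for this particular partition scheme (every split is maximally lopsided), so it is a handy input for watching the heapsort fallback kick in.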

In 2016 Edelkamp and Weiß found a way to eliminate branch mispredictions during the partitioning phase in quicksort/introsort. This is a vast improvement. The same year Orson Peters adopted this technique and developed pattern-defeating quicksort. He also figured out multiple ways to take advantage of partially sorted arrays.

Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction) and then we figure out new tricks to take advantage of modern hardware. And finally, multicore became a thing fairly recently so there's a push to explore sorting in yet another direction...

[1] http://erdani.com/research/sea2017.pdf

xenadu02 3283 days ago

It's always good to remember that while Big-O is useful, it isn't the be-all end-all. The canonical example on modern hardware is a linked list. In theory it has many great properties. In reality chasing pointers can be death due to cache misses.

Often a linear search of a "dumb" array can be the fastest way to accomplish something because it is very amenable to pre-fetching (it is obvious to the pre-fetcher what address will be needed next). Even a large array may fit entirely in L2 or L3. For small data structures arrays are almost always a win; in some cases even hashing is slower than a brute-force search of an array!

A good middle ground can be a binary tree with a bit less than an L1's worth of entries in an array stored at each node. The binary tree lets you skip around the array quickly while the CPU can zip through the elements at each node.

It is more important than ever to test your assumptions. Once you've done the Big-O analysis to eliminate exponential algorithms and applied other basic optimizations, you need to analyze the actual on-chip performance, including cache behavior and branch prediction.

flukus 3283 days ago

> It's always good to remember that while Big-O is useful, it isn't the be-all end-all. The canonical example on modern hardware is a linked list. In theory it has many great properties. In reality chasing pointers can be death due to cache misses.

My favorite example is adding an ordered list of items into a simple tree: all you've really done is create a linked list. Big-O doesn't know what your data looks like but you generally should.
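A quick sketch of that effect in Python, assuming a plain unbalanced binary search tree with no rebalancing. `Node`, `insert`, `build`, and `height` are names invented for this example.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert `key` into an unbalanced BST (iteratively) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys):
    """Build a tree by inserting the keys one at a time, in the order given."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative, no recursion limit)."""
    best = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best
```

`height(build(range(1000)))` is 1000: every node has only a right child, so each insert walks the whole chain. Shuffle the same keys first and the height typically drops to a small multiple of log2(n).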

beagle3 3283 days ago

A simple (unbalanced) binary tree is O(n^2) in the worst case for n insertions, just like a linked list.

Unless you know your distributions and are generally proficient in probability theory (in 99% of the cases, neither can be relied on), the only relevant big-O metric is the worst-case one.

lorenzhs 3283 days ago

Don't count out sample-sort just yet, it lends itself to parallelisation very well and is blazingly fast. See https://arxiv.org/abs/1705.02257 (to be presented at ESA in September) for an in-place parallel implementation that overcomes the most important downsides of previous sample-sort implementations (linear additional space).

beagle3 3283 days ago

There are actually three:

Quicksort (unstable, n^2 worst case, in place), heapsort (unstable, n log n worst case, in place), and merge sort (stable, n log n worst case, not in place).

There are variants of each that trade one thing for another (in-placeness for stability, constants for worst case), but these are the three efficient comparison sort archetypes.

Of these, quicksort and heapsort can do top-k, which is often useful; and heapsort alone can do streaming top-k.
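For the streaming case, a minimal sketch using Python's standard `heapq` module: keep a min-heap of the k largest items seen so far and replace its smallest element whenever something bigger arrives. The function name `streaming_top_k` is made up for this example.

```python
import heapq


def streaming_top_k(stream, k):
    """Return the k largest items from `stream`, largest first, using O(k) memory."""
    if k <= 0:
        return []
    heap = []  # min-heap of the k largest items seen so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k: swap it in.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)
```

Each item costs at most O(log k), and the input never has to fit in memory, so `stream` can be a generator or a file being read line by line.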

xoroshiro 3283 days ago

Interesting history!

>Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction)

This makes me wonder what other hardware tricks might be used for other popular algorithms such as ones used in graphs. I'm sure shortest path is also one of those algorithms that have been "solved" in theory but have a huge amount of research, but personally, what would be more interesting to hear about is something that isn't quite as easy. Something like linear programming with integer constraints or even something like vehicle routing or scheduling. To anyone studying those areas, is there anything you find particularly interesting?

lorenzhs 3283 days ago

There's a vast number of results for shortest-path search on graphs that look like road networks. This was of course motivated by the application to route planning. If you're looking for a starting point, the literature list at http://i11www.iti.kit.edu/teaching/sommer2017/routenplanung/... is quite comprehensive and grouped by technique. It's a lecture at the group that developed the routing algorithm used by Bing Maps. I work one storey below them, at the group where the algorithm used by Google Maps was developed :)

agumonkey 3284 days ago

Thanks for the Alexandrescu paper

yoran 3283 days ago

Thanks for this answer!

DiThi 3284 days ago

One of the problems is that hardware changes. A long time ago memory was very limited and branching had virtually no cost. Now we have very complex pipelined architectures with branch prediction, many levels of cache, microcode, etc. And memory is plentiful.

contravariant 3284 days ago

What makes things tricky is that there are a couple of common cases that can be sorted in O(n), and that more complicated algorithms might have better asymptotic behaviour, while being worse for small or even moderately large lists.

To make matters worse there are also more specific sorting algorithms like radix sort, which can be even faster in cases where they can be used.
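As an illustration of why radix sort sidesteps the comparison bound, here is a small least-significant-digit radix sort sketch in Python. It assumes non-negative integer keys, and the `bits_per_pass` parameter and bucket-list layout are choices made for this example, not a tuned implementation.

```python
def radix_sort(nums, bits_per_pass=8):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    a = list(nums)
    if not a:
        return a
    if min(a) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    mask = (1 << bits_per_pass) - 1
    largest = max(a)
    shift = 0
    # One pass per digit, from least significant to most significant.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for x in a:
            buckets[(x >> shift) & mask].append(x)
        # Concatenating the buckets in order keeps each pass stable,
        # which is what makes the digit-by-digit approach correct.
        a = [x for bucket in buckets for x in bucket]
        shift += bits_per_pass
    return a
```

No two keys are ever compared with each other. The cost is one linear pass per digit, so it only pays off when the keys are fixed-width and not too wide.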

wiz21c 3283 days ago

There are as many sorting algorithms as there are data distributions, so some algorithms are better suited to some problems. Therefore, each requires specific research.

Moreover, if you study algorithms and try to understand or formalize their behaviour, especially their memory/speed tradeoffs, then you'll see that they're actually quite complex. See: https://math.stackexchange.com/questions/1313540/a-lower-bou...

Finally, implementing a sorting algorithm requires a hell of a lot of care. They are super tricky. See for example: http://cs.fit.edu/~pkc/classes/writing/samples/bentley93engi...

lz400 3283 days ago

I don't know about improvements, but new methods are still being found, like sleep sort:

https://www.reddit.com/r/ProgrammerHumor/comments/5vpdw5/sle...

paulddraper 3284 days ago

This is an improvement in practical cases, not theoretical ones.

jorgemf 3284 days ago

As far as I understand, the base algorithm is the same, with an average case of n log(n). The new algorithms only try to improve the pivot selection to avoid the worst cases and to beat the average case in most practical situations. But in the end, no new algorithm improves on the n log(n) limit.
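For reference, the n log(n) limit for comparison sorts comes from a standard counting argument, sketched here. A comparison sort can be viewed as a binary decision tree that must distinguish all n! input orderings, so its height h (the worst-case number of comparisons) satisfies:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \log_2(n!) = \sum_{i=1}^{n} \log_2 i
  \ge \frac{n}{2}\,\log_2\frac{n}{2}
  = \Omega(n \log n)
```

The bound only applies to algorithms that learn about the input through pairwise comparisons, which is why non-comparison sorts such as radix sort are not constrained by it.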