| Basically all comparison-based sort algorithms we use today stem from two basic algorithms: mergesort (stable sort, from 1945) and quicksort (unstable sort, from 1959). Mergesort was improved by Tim Peters in 2002 and that became timsort. He invented a way to take advantage of pre-sorted intervals in arrays to speed up sorting. It's basically an additional layer over mergesort with a few other low-level tricks to minimize the amount memcpying. Quicksort was improved by David Musser in 1997 when he developed introsort. He set a strict worst-case bound of O(n log n) on the algorithm, as well as improved the pivot selection strategy. And people are inventing new ways of pivot selection all the time. E.g. Andrei Alexandrescu has published a new method in 2017[1]. In 2016 Edelkamp and Weiß found a way to eliminate branch mispredictions during the partitioning phase in quicksort/introsort. This is a vast improvement. The same year Orson Peters adopted this technique and developed pattern-defeating quicksort. He also figured out multiple ways to take advantage of partially sorted arrays. Sorting is a mostly "solved" problem in theory, but as new hardware emerges different aspects of implementations become more or less important (cache, memory, branch prediction) and then we figure out new tricks to take advantage of modern hardware. And finally, multicore became a thing fairly recently so there's a push to explore sorting in yet another direction... [1] http://erdani.com/research/sea2017.pdf |
Often a linear search of a "dumb" array can be the fastest way to accomplish something because it is very amenable to pre-fetching (it is obvious to the pre-fetcher what address will be needed next). Even a large array may fit entirely in L2 or L3. For small data structures arrays are almost always a win; in some cases even hashing is slower than a brute-force search of an array!
A good middle ground can be a binary tree with a bit less than an L1's worth of entries in an array stored at each node. The binary tree lets you skip around the array quickly while the CPU can zip through the elements at each node.
It is more important than ever to test your assumptions. Once you've done the Big-O analysis to eliminate exponential algorithms and other basic optimizations you need to analyze the actual on-chip performance, including cache behavior and branch prediction.