|
|
|
|
|
by markisus
287 days ago
|
|
Interesting article. It’s actually very strange that the dataset needs to be “big” for the O(n log n) algorithm to beat the O(n). Usually you’d expect the big O analysis to be “wrong” for small datasets. I expect that in this case, like in all cases, as the datasets become gallactically large, the O(n) algorithm will start winning again. |
|
Also, memory access is constant time only to some upper limit allowed by the hardware, which requires significant changes to the implementation when the data does not fit the system memory. So, the hash algorithm will not stay O(n) once you go past the available memory.
The sorting algorithms do not suffer from these complexities quite as much, and similar approaches can be used with data sets that do not fit a single system's memory. The sorting-based algorithms will likely win in the galactically large cases.
Edit: Also, once the hash table would need to grow beyond what the hash function can describe (e.g. beyond 64 bit integers), you need to grow the function's data type. This is essentially a hidden log(n) factor, as the required length of the data type is log(n) of the maximum data size.