Hacker News new | ask | show | jobs
by shawnz 2349 days ago
Isn't the OP's "hash-unique" technique basically a radix sort?
2 comments

A little bit, yeah: it's like a bit like single-pass radix sort, i.e., where the "digit" in the radix sort is selected to be large enough that one digit encompasses the whole key. Another difference is bucket management strategy is different: using a fixed set of buckets with chaining rather than counting ahead of time the buckets you'll need and laying them out linearly.

Note that locality of reference is so important that even though "single pass" radix sort does the least work, the ideal radix turns out to be 6-10 bits usually, even if that means e.g., 5-10x more instructions executed for 64-bit keys: because sparsely writing into buckets spread across memory is just really slow.

Interesting, thanks for the explanation!
Not really, no. The only parallel is that you're putting things into buckets, but other than that they're very different algorithms.
Wouldn't the complexity, number of movements, etc also be the same? What's the difference?

Why wouldn't a radix sort approach suffer from the same disadvantages as the hash unique approach (e.g. locality of reference)?

It's worth giving the Wiki page a quick read, it's very accessible: https://en.wikipedia.org/wiki/Radix_sort

TL;DR: I guess the hash unique approach is technically a radix sort in base-64 if you hash the inputs before applying the radix sort, but a realistic radix sort will use far fewer buckets giving it much better cache locality and won't unnecessarily hash the buckets.

Also it's possible to write a radix sort that doesn't use any extra indirection, but that's a much more complex implementation and I doubt it really has any performance benefits.