| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by shawnz 2397 days ago
	Isn't the OP's "hash-unique" technique basically a radix sort?

2 comments

BeeOnRope 2396 days ago

A little bit, yeah: it's like a bit like single-pass radix sort, i.e., where the "digit" in the radix sort is selected to be large enough that one digit encompasses the whole key. Another difference is bucket management strategy is different: using a fixed set of buckets with chaining rather than counting ahead of time the buckets you'll need and laying them out linearly.

Note that locality of reference is so important that even though "single pass" radix sort does the least work, the ideal radix turns out to be 6-10 bits usually, even if that means e.g., 5-10x more instructions executed for 64-bit keys: because sparsely writing into buckets spread across memory is just really slow.

link

shawnz 2396 days ago

Interesting, thanks for the explanation!

link

Sean1708 2396 days ago

Not really, no. The only parallel is that you're putting things into buckets, but other than that they're very different algorithms.

link

shawnz 2396 days ago

Wouldn't the complexity, number of movements, etc also be the same? What's the difference?

Why wouldn't a radix sort approach suffer from the same disadvantages as the hash unique approach (e.g. locality of reference)?

link

Sean1708 2390 days ago

It's worth giving the Wiki page a quick read, it's very accessible: https://en.wikipedia.org/wiki/Radix_sort

TL;DR: I guess the hash unique approach is technically a radix sort in base-64 if you hash the inputs before applying the radix sort, but a realistic radix sort will use far fewer buckets giving it much better cache locality and won't unnecessarily hash the buckets.

Also it's possible to write a radix sort that doesn't use any extra indirection, but that's a much more complex implementation and I doubt it really has any performance benefits.

link