Hacker News new | ask | show | jobs
by anonymoushn 699 days ago
One commonly sees the implication that radix sort cannot be used for data types other than integers, or for composite data types, or for large data types. For example, TFA says you could use radix sort if your input is 32-bit integers. But you can use it on anything. You can use radix sort to sort strings in O(n) time.
2 comments

It should also be noted that radix sort is ridiculously fast because it just scans linearly through the list each time.

It's actually hard to come up with something that cannot be sorted lexicographically. The best example I was able to find was big fractions. Though even then you could write them as continued fractions and sort those lexicographically (would be a bit trickier than strings).

Sorting fractions by numerical value is a good example. Previously I've heard that there are some standard collation schemes for some human languages that resist radix sort, but when I asked about which ones in specific I didn't hear back :(
The Unicode Collation algorithm doesn't look fun to implement in radix sort, but not entirely impossible either. They do note that some characters are contextual, an example they give is that CH can be treated as a single character that sorts after C (so also after CZ). Technically that is still lexicographical, but not byte-for-byte.
Problem with radix sorting strings is that it is O(k*N) where k is length of key, in this case it's the second longest string's length. Additional problems arise if you are dealing with null terminated strings and do not have the length stored.

Radix sort is awesome if k is small, N is huge and/or you are using a GPU. On a CPU, comparison based sorting is faster in most cases.

No, it's O(N+M) where N is the number of strings and M is the sum of the lengths of the strings. Maybe your radix sort has some problems?

I evaluated various sorts for strings as part of my winning submission to https://easyperf.net/blog/2022/05/28/Performance-analysis-an... and found https://github.com/bingmann/parallel-string-sorting to be helpful. For a single core, the fastest implementation among those in parallel-string-sorting was a radix sort, so my submission included a radix sort based on that one.

The only other contender was multi-key quicksort, which is notably not a comparison sort (i.e. a general-purpose string comparison function is not used as a subroutine of multi-key quicksort). In either case, you end up operating on something like an array of structs containing a pointer to the string, an integer offset into the string, and a few cached bytes from the string, and in either case I don't really know what problems you expect to have if you're dealing with null-terminated strings.

A very similar similar radix sort is included in https://github.com/alichraghi/zort which includes some benchmarks, but I haven't done the work to have it work on strings or arbitrary structs.

> No, it's O(N+M) where N is the number of strings and M is the sum of the lengths of the strings.

That would mean it's possible to sort N random 64-bit integers in O(N+M) which is just O(N) with a constant factor of 9 (if taking the length in bytes) or 65 (if taking the length in bits), so sort billions of random integers in linear time, is that truly right?

EDIT: I think it does make sense, M is length*N, and in scenarios where this matters this length will be in the order of log(N) anyway so it's still NlogN-sh.

> That would mean it's possible to sort N random 64-bit integers in O(N+M) which is just O(N) with a constant factor of 9 (if taking the length in bytes) or 65 (if taking the length in bits), so sort billions of random integers in linear time, is that truly right?

Yes. You can sort just about anything in linear time.

> EDIT: I think it does make sense, M is length*N, and in scenarios where this matters this length will be in the order of log(N) anyway so it's still NlogN-sh.

I mainly wrote N+M rather than M to be pedantically correct for degenerate inputs that consist of mostly empty strings. Regardless, if you consider the size of the whole input, it's linear in that.

Yes, radix sort can sort integers in linear O(N) time.
If you have a million 1-character strings and one string of length 1 million, how many steps would your LSD radix sort take? And (if it's indeed linear in the total input size like you say) how do you make it jump over the empty slots without losing real-world efficiency in other cases?
It's MSB radix sort. I think LSB radix sort is not generally as useful because even for fixed-size inputs it often makes more passes over most of the input than MSB radix sort.

Your comment makes me think it would be swell to add a fast path for when the input range compares equal for the current byte or bytes though. In general radix sorts have some computation to spare (on commonly used CPUs, more computation may be performed per element during the histogramming pass without spending any additional time). Some cutting-edge radix sorts spend this spare computation to look for sorted subsequences, etc.

Oh I see. I've found LSD better in some cases, but maybe it's just my implementation. For MSD, do you get a performance hit from putting your stack on the heap (to avoid overflow on recursion)? I don't just mean the minor register vs. memory differences in the codegen, but also the cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack to manage).

> Some cutting-edge radix sorts

Where do you find these? :-)

> For MSD, do you get a performance hit from putting your stack on the heap (to avoid overflow on recursion)?

I'm not sure. My original C++ implementation which is for non-adversarial inputs puts the array containing histograms and indices on the heap but uses the stack for a handful of locals, so it would explode if you passed the wrong thing. The sort it's based on in the parallel-string-sorting repo works the same way. Oops!

> cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack)

I think this should be okay? The best of these sorts try to visit the actual strings infrequently. You'd achieve worst-case cache performance by passing a pretty large number of strings with a long, equal prefix. Ideally these strings would share the bottom ~16 bits of their address, so that they could evict each other from cache when you access them. See https://danluu.com/3c-conflict/

> Where do you find these? :-)

There's a list of cool sorts here https://github.com/scandum/quadsort?tab=readme-ov-file#varia... and they are often submitted to HN.