|
|
|
|
|
by dataflow
699 days ago
|
|
Oh I see. I've found LSD better in some cases, but maybe it's just my implementation. For MSD, do you get a performance hit from putting your stack on the heap (to avoid overflow on recursion)? I don't just mean the minor register vs. memory differences in the codegen, but also the cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack to manage). > Some cutting-edge radix sorts Where do you find these? :-) |
|
I'm not sure. My original C++ implementation which is for non-adversarial inputs puts the array containing histograms and indices on the heap but uses the stack for a handful of locals, so it would explode if you passed the wrong thing. The sort it's based on in the parallel-string-sorting repo works the same way. Oops!
> cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack)
I think this should be okay? The best of these sorts try to visit the actual strings infrequently. You'd achieve worst-case cache performance by passing a pretty large number of strings with a long, equal prefix. Ideally these strings would share the bottom ~16 bits of their address, so that they could evict each other from cache when you access them. See https://danluu.com/3c-conflict/
> Where do you find these? :-)
There's a list of cool sorts here https://github.com/scandum/quadsort?tab=readme-ov-file#varia... and they are often submitted to HN.