|
|
|
|
|
by anonymoushn
1386 days ago
|
|
This is basically all incorrect. Maybe you are writing about LSB radix sort? I am writing about MSB radix sort. You can get some off-the-shelf MSB radix sort for sorting strings of arbitrary lengths here[0]. While this repository is mainly about parallel string sorts, I've found a version of this MSB radix sort that does not perform a bunch of unnecessary copies and caches more bytes at a time[1] to be competitive with the parallel sorts in the repo. In general MSB radix sort will have to look at the same parts of the input elements as multi-key quick sort, but one hopes that it gets to make fewer passes over the array. A comparator-based sort would look at about the same parts of the input elements as well, but it would look at them many times more than necessary. > Finally, it requires a sort-order-preserving transformation. I mentioned case-insensitive collation of Unicode strings as a well-known difficult problem. I do not believe there is mapping to an order-preserving representation which can be done bitwise. At the very least, it will be difficult to support all of the collation styles that currently exist (eg, French collation is different than Dutch). The requirements are a bit underspecified, but I think these can be solved by unicode normalization + tolower + codepoint-wise comparison, which is probably what you'd do in your comparator for a comparison-based sort as well. [0]: https://github.com/bingmann/parallel-string-sorting/blob/mas... [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R... |
|