Yes, this is discussed in section 4.2 of our paper: https://arxiv.org/pdf/2205.05982.pdf

In short, it turns out not to help for single-core vectorized sorting, but a few initial passes of ips4o (with 64- to 256-way partitioning) are faster for parallel sorts.
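
To make the "initial k-way partitioning pass, then sort each bucket" idea concrete, here is a rough Python sketch. It is not ips4o and not the code from the paper: it is a simple, non-in-place, single-threaded samplesort-style pass. The names `partition_pass` and `sort_with_initial_partition`, the oversampling factor, and the fixed seed are all assumptions made up for illustration.

```python
import bisect
import random


def partition_pass(data, k=64, oversample=8, seed=0):
    """One k-way samplesort-style partitioning pass.

    Illustrative only: unlike ips4o this is neither in-place nor parallel.
    Returns a list of buckets such that every element of bucket i is
    <= every element of bucket j whenever i < j.
    """
    if len(data) <= k:
        return [list(data)]  # too small to be worth partitioning

    rng = random.Random(seed)
    # Draw a random sample and pick k-1 evenly spaced splitters from it.
    sample = sorted(rng.sample(data, min(len(data), k * oversample)))
    step = len(sample) / k
    splitters = [sample[int(i * step)] for i in range(1, k)]

    buckets = [[] for _ in range(k)]
    for x in data:
        # Bucket index = number of splitters <= x, so bucket order matches key order.
        buckets[bisect.bisect_right(splitters, x)].append(x)
    return buckets


def sort_with_initial_partition(data, k=64, base_sort=sorted):
    """Partition once, then sort each bucket with base_sort and concatenate."""
    out = []
    for bucket in partition_pass(list(data), k):
        out.extend(base_sort(bucket))
    return out
```

In a parallel sort, the buckets are independent after the partitioning pass, so each one could be handed to a different core running the single-core (vectorized) sort as `base_sort`; this sketch just processes them in a loop.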