| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by orlp 1481 days ago
	SIMD based sorting isn't new, e.g. djbsort: https://sorting.cr.yp.to/. I find it strange the paper doesn't cite it or compare to it at all. EDIT: to be fair it does seem that Bramas' first published work on SIMD sorting (that I could find) predates djbsort, https://arxiv.org/abs/1704.08579. A comparison would still be nice though.

5 comments

nostrademons 1481 days ago

It's not new within Google either. I remember poking through Jeff Dean's original MapReduce code (written c. 2003) and finding a SIMD-based QuickSort that was the in-memory portion of the external sort algorithm.

It looks like this algorithm is interesting in the specifics of how it works, eg. the use of a compress-store instruction for partitioning. The algorithm I mentioned above mostly focused on speeding up comparisons through the use of SIMD instructions, rather than changing the partitioning part of the QuickSort algorithm itself. Not sure how that compares to djbsort, I didn't see technical details of the algorithm on the page.

link

janwas 1481 days ago

Interesting, I did not know that, thanks for the pointer. There is also an effort underway to update LLVM std::sort to BlockQuicksort, which can use some SIMD instructions similarly to what you describe.

link

jeffbee 1481 days ago

The blog post does seem to be quite explicit about the fact that their contribution is in the field of portability, not novelty.

link

convolvatron 1481 days ago

nothing happened before google

https://dl.acm.org/doi/10.1145/113379.113380

link

janwas 1481 days ago

hah, our paper actually cites "Radix sort for vector multiprocessors" by the same author, also from 1991 :)

link

janwas 1481 days ago

What exactly should be cited? I had a quick look and djbsort seems to be a constant-time algorithm, presumably a sorting network. They report about 2 GB/s for 768 int32 on 3 GHz Skylake; IIRC our sorting network reaches 3-4 GB/s, but for a smaller input size (256 int32 or int64). Hard to compare directly, but djbsort is likely slower, and the sorting network is anyway a small fraction of the total time in a Quicksort.

link

TontonNestor 1481 days ago

djbsort uses a sorting network, not a quicksort. It is also designed to be in constant time, which is not the case with the linked blog post. So the restrictions are different.

This is not my field, so I can't judge the relevance of this, but the author cites the state of the art. It just seems that djbsort is not the state of the art and therefore does not need to be cited.

I still don't understand the hype around Bernstein, he is quite relevant in cryptography/cryptography implementations but not everything he does is gold.

link

janwas 1481 days ago

:) FWIW we also use a constant-time sorting network as the base case of the Quicksort recursion, that's a widely used optimization.

link

dundarious 1480 days ago

For 32bits, djb has done light testing, indicating djbsort would be a better base case.

> So far I haven't been able to verify these vqsort speed claims. On the contrary, it seems that, for 32-bit data types on AVX2, vqsort would be faster if its base-case code were replaced by a call to the 2018 djbsort code. Similarly, vqsort should reuse vxsort-cpp for AVX-512.

https://twitter.com/hashbreaker/status/1533314687726538753

link

mixedCase 1481 days ago

The article acknowledges SIMD based sorting and states what makes this novel.

link

explaingarlic 1481 days ago

The whole "stick SIMD into it and compare it with the stdlib" thing is a really common learning trope. Feels more like a portfolio filler, albeit probably not a useless one.

link

janwas 1481 days ago

We also compare with state of the art platform-specific code. The interesting thing here is that they turn out to be slower than our approach using portable intrinsics (github.com/google/highway) :)

link