| HN Mirror



	by carbocation 1481 days ago
	Looks like djb is giving it a spin: https://twitter.com/hashbreaker/status/1533201734369103872

2 comments

janwas 1478 days ago

Response here because I'm not on Twitter: https://github.com/google/highway/issues/736

In short, the comparison is between djbsort's O(1) sorting network and our full quicksort: pivot sampling, partitioning, and then a sorting network.

This is because our sorting network size is 16 * elements_per_vector, i.e. 128 in this configuration, so larger inputs go through the full quicksort first.
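To make the arithmetic concrete, here is a minimal sketch of the size calculation described above. It assumes "this configuration" means 256-bit AVX2 vectors with int32 keys (8 elements per vector), which matches the figure of 128. The function names and the exact "greater than" threshold are hypothetical illustrations, not Highway's actual API or dispatch logic.

```python
# Hypothetical helpers; names and the exact threshold rule are assumptions.

def network_capacity(vector_bits: int, key_bits: int, rows: int = 16) -> int:
    """Sorting network size as stated in the comment: 16 * elements_per_vector."""
    elements_per_vector = vector_bits // key_bits
    return rows * elements_per_vector


def takes_full_quicksort(n: int, vector_bits: int, key_bits: int) -> bool:
    """Assumed rule: inputs larger than the network size are partitioned first."""
    return n > network_capacity(vector_bits, key_bits)


if __name__ == "__main__":
    # Assumed configuration: AVX2 (256-bit vectors), int32 keys.
    print(network_capacity(256, 32))           # 16 * 8
    print(takes_full_quicksort(256, 256, 32))  # the 256-element case from the thread
```

Under these assumptions, a 256-element int32 input is twice the network size, so it would not be handled by the sorting network alone.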

sydthrowaway 1481 days ago

And he shat all over Google yet again

viraptor 1481 days ago

He's just investigating how to reproduce the claimed result. Not sure where you got your take from.

jiggawatts 1481 days ago

Something to note is that he was testing this on an 11-year-old processor: https://ark.intel.com/content/www/us/en/ark/products/52269/i...

The Google sorting algorithm seems to be optimised for the "latest and greatest" AVX-512 capable CPUs.

wtallis 1480 days ago

The Xeon E3-1220 v5 is not the same as the original E3-1220. You want this link: https://ark.intel.com/content/www/us/en/ark/products/88172/i...

The E3 v5 series was part of the generation codenamed Skylake, introduced in 2015. But the Skylake microarchitecture was reused in each subsequent new Intel desktop processor generation through 2019's Comet Lake (due to Intel's 10nm failure). They didn't introduce a new microarchitecture in that product segment until Rocket Lake and Alder Lake, both in 2021. So despite being almost 7 years old, the E3-1220 v5 is still representative of most of the installed base for Intel desktops and entry-level workstations, and a large chunk of their mobile installed base.

(The original E3-1220 predates AVX2 by two years, so this code wouldn't even run on it.)

jiggawatts 1480 days ago

Well spotted!

janwas 1480 days ago

Oh, thanks for pointing that out. Golly, Sandy Bridge is a bit old, yes - but still the result is surprising.

djb reports 8000 cycles for int32 x 256, which is much slower than what we measure in bench_sort.cc, even for AVX2 (which he confirms is being reached). Not sure what's going on.

aaaaaaaaaaab 1481 days ago

But he’s right.
