


	by jart 1481 days ago
	I wonder how fast it is compared to djbsort https://github.com/jart/cosmopolitan/blob/master/libc/nexgen... and longsort https://github.com/jart/cosmopolitan/blob/e011973593407f576d... djbsort is outrageously fast for 32-bit ints with avx2 (which unlike avx512 it's the avx we can reliably use in open source). But there's never been a clear instruction set to use on Intel / AMD for sorting 64-bit ints that's reliably fast. So if this thing can actually sort 64-bit integers 10x faster on avx2 I'd be so thrilled.


janwas 1481 days ago

Yes, we can sort 64-bit ints. The speedup on AVX2 is roughly 2/3 of the 10x we see on AVX-512. Longsort appears to be an autovectorized sorting network. That's only going to be competitive or even viable for relatively small arrays (thousands of elements). See comments above on djbsort.

Why not use whichever AVX the CPU has? Not a problem when using runtime dispatch :)
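
For readers unfamiliar with the term, a sorting network is a fixed sequence of compare-exchange steps that does not depend on the data, which is what makes it easy for a compiler to vectorize. Below is a minimal sketch for four 64-bit keys. It is not longsort's or djbsort's code, and the names `cmp_swap` and `sort4_network` are made up for illustration.

```c
#include <stdint.h>

/* Compare-exchange: afterwards *a <= *b. Written with min/max-style
 * selects so the compiler is free to emit conditional moves or vector
 * min/max instead of a branch (that choice is up to the compiler). */
static inline void cmp_swap(int64_t *a, int64_t *b) {
    int64_t lo = *a < *b ? *a : *b;
    int64_t hi = *a < *b ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Sort exactly four keys with the standard 5-comparator network.
 * The comparator order is fixed at compile time, whatever the input. */
void sort4_network(int64_t v[4]) {
    cmp_swap(&v[0], &v[1]);
    cmp_swap(&v[2], &v[3]);
    cmp_swap(&v[0], &v[2]); /* v[0] now holds the minimum */
    cmp_swap(&v[1], &v[3]); /* v[3] now holds the maximum */
    cmp_swap(&v[1], &v[2]); /* order the two middle keys  */
}
```

The number of comparators in such a network grows faster than linearly with the input size, which fits the point above that the approach is only viable for relatively small arrays.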

heavenlyblue 1481 days ago

What about performance-per-watt?

sitkack 1480 days ago

Main memory accesses dominate energy consumption, so the lower your total memory bandwidth the less energy an algorithm will take.

https://www.researchgate.net/figure/Data-movement-is-overtak...

The chart above shows a 1000x (3 orders of magnitude base 10) increase in energy consumption relative to a register move (it really should be called copy).

celrod 1481 days ago

The bigger the vectors, the better the performance per watt.

wtallis 1481 days ago

> it's the avx we can reliably use in open source

I'm not sure what you mean by that. You can't assume the presence of AVX or AVX2 without explicitly checking for it, because Intel was still disabling those features on new low-end Pentium and Celeron parts at least as recently as Comet Lake (2020). Sure, AVX2 support is much more widespread than AVX512 support, but that has nothing to do with open-source and it's a bit strange to describe that in terms of reliability.
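
A minimal sketch of what "explicitly checking" can look like with the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The function name `best_simd_level` and the returned labels are hypothetical, and on non-x86 targets or other compilers the sketch simply reports "scalar".

```c
/* Returns a label for the widest x86 vector extension the running CPU
 * reports. The labels are illustrative, not any library's convention. */
const char *best_simd_level(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* populate the compiler runtime's CPU model data */
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("avx"))     return "avx";
    if (__builtin_cpu_supports("sse2"))    return "sse2";
    return "scalar";
#else
    /* Assumption: no x86 feature detection available on this build. */
    return "scalar";
#endif
}
```

On a Pentium or Celeron part with AVX fused off, a check like this would fall through to "sse2" rather than crash on an illegal instruction.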

jart 1481 days ago

Some of us like to think of ourselves writing open source as serving the public interest. It's hard to do that if you're focusing on an ISA the public doesn't have. I haven't seen any consumer hardware that has AVX512.

eklitzke 1481 days ago

Lots of consumer hardware has AVX512 (I have an 11th gen Intel laptop CPU that has it).

Regardless, Clang and GCC both support function multi-versioning, where you supply multiple versions of a function and specify which CPU features each implementation needs; the best version is selected at runtime based on the results of cpuid. For example, you can write versions of a function that use no vector instructions, SSE, AVX2, or AVX512; all versions will be compiled into the executable and the best one you can actually use will be selected at runtime. This is how glibc selects the optimal version of functions like memset/memcpy/memcmp, as there are vector instructions that significantly speed these functions up.

There's an LWN article about the feature if you're curious how it works: https://lwn.net/Articles/691932/
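
As a rough sketch of the multi-versioning described above, the `target_clones` attribute asks the compiler to build one copy of a function per listed target and pick among them at load time. The function `max_i64` and the macro `MULTIVERSIONED` are hypothetical names. The guard is an assumption about where the mechanism is available (x86-64 Linux with glibc, since it relies on ifunc); elsewhere the sketch compiles as a single plain function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h> /* also pulls in libc feature macros such as __GLIBC__ */

#if defined(__GNUC__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
/* One clone per target; a resolver picks the best one via cpuid. */
#define MULTIVERSIONED \
    __attribute__((target_clones("default", "avx2", "avx512f")))
#else
/* Assumption: no ifunc-based multi-versioning here, build one version. */
#define MULTIVERSIONED
#endif

/* Largest element of v[0..n-1]; the caller must pass n >= 1.
 * The source is written once; the compiler may vectorize each clone
 * differently, depending on optimization level. */
MULTIVERSIONED
int64_t max_i64(const int64_t *v, size_t n) {
    int64_t best = v[0];
    for (size_t i = 1; i < n; ++i)
        if (v[i] > best) best = v[i];
    return best;
}
```

Callers just call `max_i64`; the selection happens once when the symbol is resolved, not on every call.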

janwas 1481 days ago

I agree AVX-512 is not exactly widespread on client CPUs but as akelly mentions, it does exist (e.g. Ice Lake).

What we do is dispatch to the best available instruction set at runtime - that costs only an indirect branch, plus somewhat larger binary and longer compile time.
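
To illustrate the "only an indirect branch" cost, here is a minimal hand-rolled sketch of runtime dispatch through a function pointer. It is not the actual implementation discussed in this thread; `sum_scalar`, `sum_avx2`, `resolve_sum`, and `sum_i64` are hypothetical names, and the AVX2 variant relies on the compiler's autovectorizer rather than intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version, compiled for the default target. */
static int64_t sum_scalar(const int64_t *v, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

#if HAVE_X86_DISPATCH
/* Same loop, but the compiler may use AVX2 inside this function only. */
__attribute__((target("avx2")))
static int64_t sum_avx2(const int64_t *v, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
#endif

typedef int64_t (*sum_fn)(const int64_t *, size_t);

/* Pick the best implementation the running CPU supports. */
static sum_fn resolve_sum(void) {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_scalar;
}

/* Public entry point: after the first call, the only overhead is the
 * indirect call through impl. Not thread-safe as written; a real
 * library would resolve the pointer at startup or use atomics. */
int64_t sum_i64(const int64_t *v, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL) impl = resolve_sum();
    return impl(v, n);
}
```

The binary carries both copies of the loop, which is the "somewhat larger binary and longer compile time" trade-off mentioned above.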

wtallis 1481 days ago

Even if AVX512 were entirely constrained to server hardware (it's not), how would it be contrary to the public interest for open-source software to take advantage of those instructions?

akelly 1481 days ago

Intel 10th gen mobile and 11th gen mobile and desktop, excluding Pentium and Celeron, have AVX-512. And all 12th gen have it on the P cores but not the E cores. If the E cores are enabled then AVX-512 is unavailable.

monocasa 1481 days ago

On 12th gen, a microcode update disabled it on the P cores too, even with the E cores disabled. A lot of newer systems don't have access to the older microcode, and microcode doesn't typically let you downgrade.

wtallis 1481 days ago

There are workarounds for downgrading microcode, because the CPU itself doesn't actually have non-volatile storage for microcode updates and relies on the motherboard firmware to upload updates on each boot (and motherboard firmware can often be downgraded, possibly after changing a setting to allow that).

Which is probably why Intel has changed to disabling AVX512 using fuses in more recently manufactured Alder Lake CPUs.

monocasa 1481 days ago

My point with "a lot of newer systems" was that there are motherboards now that completely lack a version of their firmware with the microcode that allows avx-512. There's nothing to downgrade to without an exploit to allow making your own firmware images with mixed and matched components.

robertlagrant 1481 days ago

> Some of us like to think of ourselves as

I don't see how this is relevant to anything.

watmough 1480 days ago

Tiger Lake https://ark.intel.com/content/www/us/en/ark/products/213803/...

Sniff, already superseded by Alder Lake.