Hacker News
by zx2c4 3112 days ago
This might be of interest here: https://twitter.com/InstLatX64/status/934093081514831872

> The sad thing is there is no CPUID flag to distinguish good AVX512 from useless AVX512.

You can read the avx512_2ndFMA bit from the PIROM, according to this Intel datasheet: https://www.intel.com/content/www/us/en/processors/xeon/scal...

Linux doesn't implement reading the PIROM over SMBus, but it sure would be nice to expose this flag in /proc/cpuinfo.
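For comparison, here is a minimal sketch of what /proc/cpuinfo does expose today on x86: the `flags` line lists `avx512f` and `avx512vl`, but nothing that says whether the part has one or two AVX-512 FMA units. The helper name `cpu_flags` is hypothetical, and it only looks at the first CPU's `flags` line.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        # x86 lines look like "flags\t\t: fpu vme ... avx512f ... avx512vl ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found (non-x86 architectures may use a different key).
    return set()
```

Usage: `cpu_flags(open("/proc/cpuinfo").read())` and then test `"avx512f" in flags`. Both bits can be set on a Gold and a Platinum part alike, which is exactly the gap described above.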

In WireGuard we're currently just disabling the zmm AVX512F implementation on Skylake-X and falling back to the still-fast-but-not-as-fast AVX512VL implementation, which only touches ymm registers and doesn't downclock as much (following OpenSSL's reasoning for what is, more or less, the same implementation by Andy Polyakov):

https://git.zx2c4.com/WireGuard/tree/src/crypto/chacha20poly...

I may look into reading the PIROM so that I can make a more informed decision. I've tested on those Platinum boxes, and the zmm implementation is indeed a lot faster there, even with the [lesser] downclocking. A Gold box, however, didn't perform as well, which is what makes the ymm-only implementation necessary.
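The policy described above, plus the PIROM refinement, could be sketched as a small dispatch function. Everything here is illustrative: the function and path names are hypothetical, this is not WireGuard's or OpenSSL's actual code, and it assumes the avx512_2ndFMA bit has somehow already been read from the PIROM (passed as `None` when it couldn't be). It identifies Skylake-X by Intel family 6, model 0x55.

```python
from typing import Optional

# Intel family 6, model 0x55 is Skylake-SP/X/W (the "Skylake-X" in the text).
SKYLAKE_X_MODEL = 0x55


def pick_chacha20_path(flags: set, family: int, model: int,
                       has_second_fma: Optional[bool] = None) -> str:
    """Pick a (hypothetically named) ChaCha20 code path.

    has_second_fma is the avx512_2ndFMA bit read from the PIROM, or None
    if it could not be read; Skylake-X then stays on the ymm-only path.
    """
    skylake_x = family == 6 and model == SKYLAKE_X_MODEL
    # zmm is fine off Skylake-X, or on Skylake-X with a confirmed 2nd FMA unit.
    zmm_ok = not skylake_x or has_second_fma is True
    if "avx512f" in flags and zmm_ok:
        return "avx512f-zmm"
    if "avx512f" in flags and "avx512vl" in flags:
        return "avx512vl-ymm"  # ymm only: less downclocking
    if "avx2" in flags:
        return "avx2"
    return "scalar"
```

Treating an unreadable bit the same as "one FMA unit" keeps today's behavior as the default; only a positive PIROM reading (presumably the Platinum case) opts Skylake-X back into zmm.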

1 comment

If that is an issue for you, you could try the implementation I wrote for BoringSSL. It avoids SIMD multiplications altogether and uses only simple AVX2 instructions, so there is no slowdown (AFAICT), although in benchmarks it is not as fast as OpenSSL's AVX512VL implementation.
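Some background on why an AVX2-only path can skip SIMD multiplications: the ChaCha20 half of the construction is pure add-rotate-xor, so any multiplications come from Poly1305 (arithmetic modulo 2^130 - 5). One reading of the comment, which is an interpretation and not confirmed there, is that Poly1305's multiplications are done with scalar instructions instead. The sketch below is a plain Python version of the RFC 8439 quarter round, not code from BoringSSL.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int) -> tuple:
    """ChaCha20 quarter round (RFC 8439, section 2.1).

    Only 32-bit additions, XORs and rotations: no multiplications.
    """
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

`quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)` returns the RFC 8439 section 2.1.1 test vector `(0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)`. In AVX2, which has no vector rotate instruction, the rotations are usually done with byte shuffles (for 16 and 8) or shift-and-or (for 12 and 7).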