by thecompilr
3108 days ago
"Can't turbo as much" is still a net loss of performance. Even on Platinum, turbo frequency is almost 10% lower with AVX512 on a single core, and 30% lower with all cores. So you really want the majority of your software to use AVX512 to gain a net benefit. It takes the system 2ms to recover after an AVX512 instruction.
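To put rough numbers on the "majority of your software" point, here is a back-of-envelope sketch. It is a simplified model, not a measurement. It assumes the workload is compute-bound, that throughput scales linearly with clock frequency, and that the reduced clock applies to the whole run (plausible if AVX512 instructions are interleaved more often than the recovery time). The function names and the example kernel speedup are hypothetical.

```python
def net_speedup(avx512_fraction, vector_speedup, clock_penalty):
    """Overall speedup versus the same program without AVX512 at full turbo.

    avx512_fraction: share of the original runtime spent in code that gets
                     rewritten with AVX512 (0.0 to 1.0).
    vector_speedup:  speedup of that code at equal clock (assumed value).
    clock_penalty:   fractional frequency drop, e.g. 0.30 for "30% lower".
    """
    if not 0.0 <= avx512_fraction <= 1.0:
        raise ValueError("avx512_fraction must be between 0 and 1")
    if vector_speedup <= 0.0 or not 0.0 <= clock_penalty < 1.0:
        raise ValueError("invalid speedup or clock penalty")
    # Runtime at equal clock, normalised so the original program takes 1.0.
    time_at_equal_clock = (1.0 - avx512_fraction) + avx512_fraction / vector_speedup
    # Assumption: the lower clock slows the entire run by the same factor.
    return (1.0 - clock_penalty) / time_at_equal_clock


def break_even_fraction(vector_speedup, clock_penalty):
    """Share of runtime that must be vectorised for net_speedup to reach 1.0.

    A result above 1.0 means the model never breaks even.
    """
    if vector_speedup <= 1.0:
        return float("inf")  # no gain at equal clock, so no break-even point
    return clock_penalty / (1.0 - 1.0 / vector_speedup)
```

Under this model, with the 30% all-core drop and a hypothetical 2x speedup in the vectorised code, `break_even_fraction(2.0, 0.30)` comes out at 0.6, so 60% of the runtime would need to be in AVX512 code just to break even. With the 10% single-core drop, the same kernel breaks even at 20%.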
But you are correct that the Silvers are way worse. I suspect Intel intentionally killed AVX512 performance on the Silvers. I tested power consumption, and there is no reason to reduce the frequency, except for the sake of it.
The sad thing is there is no CPUID flag to distinguish good AVX512 from useless AVX512. It would really be better if they disabled it completely on Silver. The way it is now will just hurt adoption.
|
> The sad thing is there is no CPUID flag to distinguish good AVX512 from useless AVX512.
You can read the avx512_2ndFMA bit from the PIROM, according to this Intel datasheet: https://www.intel.com/content/www/us/en/processors/xeon/scal...
Linux doesn't implement reading PIROM over SMBus, but it sure would be nice to expose this flag in /proc/cpuinfo.
In WireGuard, at the moment we're just disabling the zmm AVX512F implementation on Skylake-X and falling back to the still-fast-but-not-as-fast AVX512VL implementation, which only touches ymm and doesn't downclock as much (following OpenSSL's reasoning on more or less the same implementation by Andy Polyakov):
https://git.zx2c4.com/WireGuard/tree/src/crypto/chacha20poly...
I may look into trying to read the PIROM so that I can make a more informed decision. I've tested those Platinum boxes, and indeed it's a lot faster there, even with the [lesser] downclocking, whereas a Gold box didn't perform as well, making the ymm-only implementation necessary.
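The selection policy described above can be sketched roughly as follows. This is an illustration in Python, not WireGuard's actual code. The flag names are the ones Linux prints in /proc/cpuinfo, and family 6 model 0x55 is the Skylake-X server model number. The function names, the returned labels, and the `has_second_fma` parameter (standing in for a value that would have to be read from the PIROM by some means not shown here) are hypothetical.

```python
SKYLAKE_X_MODEL = 0x55  # family 6, model 85


def parse_cpuinfo(text):
    """Return (family, model, flags) for the first CPU in /proc/cpuinfo text."""
    family = model = None
    flags = set()
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the first processor entry
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "cpu family":
            family = int(value.strip())
        elif key == "model":  # exact match, so "model name" is skipped
            model = int(value.strip())
        elif key == "flags":
            flags = set(value.split())
    return family, model, flags


def choose_chacha20_impl(family, model, flags, has_second_fma=None):
    """Pick an implementation label; has_second_fma=None means 'unknown'."""
    has_zmm = "avx512f" in flags
    has_vl = has_zmm and "avx512vl" in flags
    is_skylake_x = family == 6 and model == SKYLAKE_X_MODEL
    if has_zmm:
        if has_second_fma is None:
            zmm_ok = not is_skylake_x  # no better information: avoid zmm here
        else:
            zmm_ok = has_second_fma  # hypothetical PIROM-derived hint
        if zmm_ok:
            return "avx512f-zmm"
        if has_vl:
            return "avx512vl-ymm"  # ymm only, smaller frequency drop
    if "avx2" in flags:
        return "avx2"
    if "ssse3" in flags:
        return "ssse3"
    return "generic"
```

Calling `choose_chacha20_impl(*parse_cpuinfo(open("/proc/cpuinfo").read()))` on a Linux box would apply the conservative default. Whether the second-FMA bit is actually the right signal for enabling zmm is itself an assumption that would need benchmarking on each SKU.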