by robocat 2172 days ago
The AVX512 instructions can cause strange global performance downgrades.

“One challenge with AVX-512 is that it can actually _slow down_ your code. It's so power hungry that if you're using it on more than one core it almost immediately incurs significant throttling. Now, if everything you're doing is 512 bits at a time, you're still winning. But if you're interleaving scalar and vector arithmetic, the drop in clock speeds could slow down the scalar code quite substantially.” - 3JPLW and https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...
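A back-of-the-envelope way to see the trade-off in that quote, as a rough sketch only: assume some fraction of the runtime gets vectorized, the vector code speeds that part up by some factor per cycle, and the whole run (scalar parts included) executes at a reduced clock. The function names and all the numbers are made up for illustration; none of them are measurements.

```python
def relative_runtime(vector_fraction, vector_speedup, clock_ratio):
    """Runtime relative to an all-scalar run at full clock (1.0 = unchanged).

    vector_fraction: share of the original runtime that gets vectorized (0..1)
    vector_speedup:  per-cycle speedup of the vectorized part (hypothetical)
    clock_ratio:     throttled clock / normal clock (hypothetical), e.g. 0.8
    Simplifying assumption: the entire run executes at the lower clock.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_speedup <= 0 or clock_ratio <= 0:
        raise ValueError("vector_speedup and clock_ratio must be positive")
    # Cycles needed, relative to the scalar baseline.
    cycles = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Fewer cycles per second at the throttled clock stretches the wall time.
    return cycles / clock_ratio


def break_even_fraction(vector_speedup, clock_ratio):
    """Vectorized share at which the throttled run matches the baseline."""
    if vector_speedup <= 1:
        raise ValueError("vector_speedup must be greater than 1")
    return (1.0 - clock_ratio) / (1.0 - 1.0 / vector_speedup)


if __name__ == "__main__":
    # Hypothetical inputs: 4x vector speedup, clock drops to 80%.
    for share in (0.1, 0.5, 0.9):
        print(share, relative_runtime(share, 4.0, 0.8))
    print("break-even share:", break_even_fraction(4.0, 0.8))
```

With those made-up inputs, vectorizing only 10% of the runtime comes out roughly 16% slower than not vectorizing at all, and the break-even point is a bit above a quarter of the runtime. Real chips don't throttle the whole run uniformly, so treat this as intuition, not prediction.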

The processor does not immediately downclock when encountering heavy AVX512 instructions: it will first execute these instructions with reduced performance (say 4x slower) and only when there are many of them will the processor change its frequency. Light 512-bit instructions will move the core to a slightly lower clock.

* Downclocking is per core and for a short time after you have used particular instructions (e.g., ~2ms).

* The downclocking of a core is based on: the current license level of that core, and also the total number of active cores on the same CPU socket (irrespective of the license level of the other cores).
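To make the two bullets above concrete, here is a toy model, assuming three license levels (0 to 2) and a 4-core socket. The frequency table is entirely invented, and the helper names are hypothetical; only the shape of the lookup (own license level plus count of active cores, with a ~2ms hold) comes from the points above.

```python
HOLD_MS = 2.0  # "~2ms" hold time from the bullet above; approximate

# Invented turbo table in GHz: TURBO_GHZ[license_level][active_cores - 1].
# Real values differ per CPU model; these are placeholders only.
TURBO_GHZ = {
    0: [3.7, 3.6, 3.5, 3.5],  # level 0: scalar / light code
    1: [3.6, 3.4, 3.1, 3.1],  # level 1
    2: [3.5, 3.0, 2.6, 2.4],  # level 2: heaviest instructions
}


def effective_license(current_level, last_heavy_level, ms_since_last_heavy):
    """License level of a core, keeping the higher level during the hold time."""
    if ms_since_last_heavy < HOLD_MS:
        return max(current_level, last_heavy_level)
    return current_level


def core_frequency_ghz(own_license, other_active_licenses):
    """Frequency of one core in this toy model.

    Only the number of other active cores matters; their license levels are
    accepted but deliberately ignored, matching the second bullet.
    """
    active_cores = 1 + len(other_active_licenses)
    return TURBO_GHZ[own_license][active_cores - 1]


if __name__ == "__main__":
    # A scalar core runs at the same clock whether its neighbours use
    # heavy vector code or not, as long as the active-core count is equal.
    print(core_frequency_ghz(0, [2, 2, 2]), core_frequency_ghz(0, [0, 0, 0]))
    # A core that ran heavy code 0.5 ms ago is still held at level 2.
    level = effective_license(0, 2, 0.5)
    print(level, core_frequency_ghz(level, [0]))
```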

As per https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-us...

2 comments

> The AVX512 instructions can cause strange global performance downgrades.

Can other SIMD instructions (AVX2, say) do the same?

> Can other SIMD instructions (AVX2, say) do the same?

On Intel CPUs, yes. There's even a BIOS/UEFI setting called "AVX offset" to specify how much you want the clock frequency to drop when running AVX code. AMD CPUs don't do that though, as far as I know.

The thermal hit of using wider vectors decreases with every node shrink though, so expect the issue to become muted over time (which also explains why that doesn't apply to AMD - their only µarch with 256-bit execution units, Zen 2, is on a better node than Intel).

AVX offset is only available on motherboards that support overclocking, due to how much higher Intel CPUs can be pushed relative to their advertised base and boost clocks.

Both Zen and Intel lower their clocks under load, especially AVX load. Keep in mind that Zen 2 doesn't even reach its advertised boost clocks under any load: some CPUs come within 100 MHz or so, but overall they all clock down rather fast once TMax or PMax is reached.

AVX was slowing down some code if input was less than 128 bits wide.
I wonder if this could be used as a denial of service against a VPS host node.