by adrian_b 1383 days ago
The AVX-512 instruction set has never been a flop. It is a much better instruction set than AVX.

Most AVX-512 instructions come in three variants: one operating on 512-bit registers, one on 256-bit registers and one on 128-bit registers (the narrower forms come from the AVX-512VL extension).

When using the 256-bit or the 128-bit AVX-512 instructions, there has never been any disadvantage versus using AVX.

The only problems have come from the 512-bit AVX-512 instructions, especially on the CPUs derived from Skylake Server, because of the way Intel implemented the clock frequency control.

Using the 512-bit AVX-512 instructions requires more power than using the 256-bit ones, just as using e.g. 4 cores instead of 2 does. In both cases, whether the operation width or the number of active cores is doubled, the clock frequency is reduced.

When a program has a large proportion of 512-bit instructions, then the throughput is higher despite the lower clock frequency.
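To make this trade-off concrete, here is a toy throughput model. The frequencies (3.0 GHz and 2.5 GHz), the one-instruction-per-cycle assumption and the float32 lane counts are illustrative assumptions, not measurements of any Intel CPU. The point is only the shape of the result: the wider vectors must make up for a clock reduction that also slows down the non-vectorized part of the program.

```python
# Toy model: does widening the vectorized part from 256 to 512 bits pay off
# when the whole program runs at a lower clock? All numbers are hypothetical.
F_256 = 3.0e9  # Hz, assumed clock when only 256-bit (or narrower) code runs
F_512 = 2.5e9  # Hz, assumed clock while 512-bit instructions are in use


def runtime_s(work_elems, vector_fraction, lanes, freq_hz):
    """Seconds to process work_elems elements at freq_hz.

    Assumes one instruction per cycle: the vectorized fraction of the work
    handles `lanes` elements per instruction, the rest one element each.
    The non-vectorized part runs at the same (possibly reduced) clock.
    """
    vector_cycles = work_elems * vector_fraction / lanes
    scalar_cycles = work_elems * (1.0 - vector_fraction)
    return (vector_cycles + scalar_cycles) / freq_hz


def speedup_512_over_256(vector_fraction, work_elems=1e9):
    """Ratio > 1 means the 512-bit build finishes first despite the lower clock."""
    t_256 = runtime_s(work_elems, vector_fraction, lanes=8, freq_hz=F_256)   # 8 x float32
    t_512 = runtime_s(work_elems, vector_fraction, lanes=16, freq_hz=F_512)  # 16 x float32
    return t_256 / t_512


if __name__ == "__main__":
    for frac in (1.0, 0.9, 0.8, 0.5):
        print(f"vectorized fraction {frac:.1f}: speedup {speedup_512_over_256(frac):.2f}")
```

With these made-up frequencies the 512-bit build is about 1.67x faster when all of the work is vectorized, breaks even at 80%, and is slower (about 0.88x) at 50%, which is the "large proportion" condition stated above.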

On the other hand, when a program has only a few 512-bit instructions, the execution will be slowed down for almost a second after 512-bit instructions are no longer used, until the CPU decides to power down the upper half of the 512-bit units.
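A rough break-even estimate shows why a few sporadic 512-bit instructions hurt. F_256, F_512, the factor-of-two lane ratio and the assumption that ordinary code keeps running through the whole relaxation window are hypothetical; the window length `relax_s` is left as a parameter because it depends on the CPU.

```python
# Hypothetical numbers for illustration only; real values depend on the CPU
# model, the number of active cores and the power limits.
F_256 = 3.0e9  # Hz with only 256-bit (or narrower) instructions active
F_512 = 2.5e9  # Hz while the upper halves of the 512-bit units are powered


def burst_saving_s(work_256_s, lanes_ratio=2.0):
    """Time saved by running, as 512-bit code, a burst that takes work_256_s
    seconds as 256-bit code (lanes_ratio times fewer instructions, lower clock)."""
    work_512_s = work_256_s / lanes_ratio * (F_256 / F_512)
    return work_256_s - work_512_s


def window_penalty_s(relax_s):
    """Extra time lost by ordinary code that still runs at F_512 for relax_s
    seconds after the last 512-bit instruction (assumes it fills the window)."""
    return relax_s * (1.0 - F_512 / F_256)


def break_even_burst_s(relax_s):
    """Smallest burst, measured as 256-bit run time, whose saving covers the penalty."""
    saving_per_s = burst_saving_s(1.0)
    return window_penalty_s(relax_s) / saving_per_s
```

For example, with `relax_s=1.0` (close to the "almost a second" given above), `break_even_burst_s` returns about 0.42: the program would need roughly 0.4 s worth of 256-bit work done in 512-bit form before the switch pays for itself, far more than a few sporadic instructions provide.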

This whole problem arises because the Intel CPU tries to be too smart and decides on its own when to power down the unused units.

In the analogous case of using more cores there is no such problem: when a core is no longer needed, the operating system executes a HLT or MWAIT instruction, which immediately powers down the core and restores the higher clock frequency.

If Intel had provided an instruction like "end of 512-bit instructions" that immediately powered down the upper halves of the execution units, sporadically using a few 512-bit instructions would not have caused any slowdown. It would have worked exactly like launching some extra execution threads, which causes no lasting problem because the clock frequency is restored as soon as the extra threads finish or are suspended.
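A toy trace simulation can contrast the automatic policy with the proposed explicit release. The `release` segment stands for the "end of 512-bit instructions" instruction wished for above; the frequencies, the window length and the instant frequency switches are illustrative assumptions, and `Segment` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass

F_FAST = 3.0e9  # Hz, hypothetical clock with the upper halves powered down
F_SLOW = 2.5e9  # Hz, hypothetical clock with the upper halves powered up


@dataclass
class Segment:
    kind: str           # "wide" (512-bit), "narrow" (256-bit/scalar) or "release"
    cycles: float = 0.0


def simulate(trace, relax_s):
    """Return wall-clock seconds for a trace of segments.

    After a wide segment the upper halves stay powered (slow clock) until either
    relax_s seconds of narrow code have run (automatic policy) or a "release"
    segment powers them down at once (the proposed explicit instruction).
    """
    t = 0.0
    powered = False
    idle_since_wide = 0.0  # seconds of narrow code since the last wide segment
    for seg in trace:
        if seg.kind == "wide":
            powered = True
            idle_since_wide = 0.0
            t += seg.cycles / F_SLOW
        elif seg.kind == "release":
            powered = False
        elif seg.kind == "narrow":
            remaining = seg.cycles
            if powered:
                window_left = max(0.0, relax_s - idle_since_wide)
                slow_cycles = min(remaining, window_left * F_SLOW)
                t += slow_cycles / F_SLOW
                idle_since_wide += slow_cycles / F_SLOW
                remaining -= slow_cycles
                if remaining > 0:
                    powered = False  # window expired: the CPU powers down the upper halves
            t += remaining / F_FAST
        else:
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
    return t
```

Running a short wide burst followed by one second of narrow work, with and without a `Segment("release")` in between, shows the whole penalty disappearing with the explicit release, just as it does for threads that halt.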

Because Zen 4 executes AVX-512 on the same 256-bit-wide execution units as Zen 3 (a 512-bit operation is split into two 256-bit halves), using AVX-512 on Zen 4 will not cause any slowdown that using AVX on Zen 3 would not also have caused.

1 comment

Thank you Adrian (B) for explaining this thoroughly! I'll keep your comment as a future reference!