Hacker News
by malkia 1383 days ago
Didn't AVX-512 turn out to be a flop? E.g. https://en.wikipedia.org/wiki/AVX-512#Performance:

"On some processors AVX-512 instructions cause a frequency throttling even greater than its predecessors, causing a penalty for mixed workloads. The additional downclocking is triggered by the 512-bit width of vectors and depend on the nature of instructions being executed, and using the 128 or 256-bit part of AVX-512 (AVX-512VL) does not trigger it. As a result, gcc and clang default to prefer using the 256-bit vectors. [1]"

[1] https://stackoverflow.com/questions/56852812/simd-instructio...


The AVX-512 instruction set has never been a flop. It is a much better instruction set than AVX.

Most AVX-512 instructions come in three variants: with 512-bit, 256-bit, or 128-bit registers.
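To make the three widths concrete: in C, the merge-masked float add exists as `_mm_mask_add_ps`, `_mm256_mask_add_ps` (both also need AVX-512VL) and `_mm512_mask_add_ps`. Below is a plain-Python emulation of those semantics, not real SIMD code; `masked_add_ps` is a hypothetical helper, and it only models float32 lanes.

```python
from typing import List


def masked_add_ps(src: List[float], k: int, a: List[float], b: List[float],
                  width_bits: int) -> List[float]:
    """Emulate a merge-masked packed float32 add at one AVX-512 vector length.

    Hypothetical helper, not a real API: lane i gets a[i] + b[i] if bit i of
    mask k is set, otherwise it keeps src[i] (merge masking). The argument
    order mirrors the _mm*_mask_add_ps intrinsics (src, k, a, b).
    """
    if width_bits not in (128, 256, 512):
        raise ValueError("AVX-512 vector lengths are 128, 256 or 512 bits")
    lanes = width_bits // 32  # float32 lanes: 4, 8 or 16
    if not (len(src) == len(a) == len(b) == lanes):
        raise ValueError(f"a {width_bits}-bit vector holds {lanes} float32 lanes")
    return [a[i] + b[i] if (k >> i) & 1 else src[i] for i in range(lanes)]


# Same operation at the narrowest width; only the lane count changes with width.
print(masked_add_ps([0.0] * 4, 0b0101, [1.0] * 4, [2.0] * 4, 128))
# → [3.0, 0.0, 3.0, 0.0]
```

The masking and the rest of the semantics are identical at every width, which is why the 128-bit and 256-bit forms give you the AVX-512 feature set without touching the 512-bit datapath.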

When using the 256-bit or the 128-bit AVX-512 instructions, there has never been any disadvantage versus using AVX.

The only problems have arisen when using the 512-bit AVX-512 instructions, especially on CPUs derived from Skylake Server, because of the way Intel implemented clock frequency control.

Using the 512-bit AVX-512 instructions requires more power than using the 256-bit ones, just as using 4 cores requires more power than using 2. In both cases, whether the operation width or the number of active cores is doubled, the clock frequency is reduced.

When a program has a large proportion of 512-bit instructions, the throughput is higher despite the lower clock frequency.
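A rough way to see this trade-off: the 512-bit version halves the cycles spent in the vector part, but the whole core runs at a lower clock while it is active, so the net result depends on how much of the run is vector code. This is a toy model under stated assumptions; `speedup_512_vs_256` is a hypothetical helper, the 3.0 GHz / 2.6 GHz clocks are made-up placeholders rather than figures for any real CPU, and transition costs are ignored.

```python
def speedup_512_vs_256(vector_cycles_256: float, other_cycles: float,
                       freq_256_ghz: float, freq_512_ghz: float) -> float:
    """Speedup of a 512-bit build over a 256-bit build of the same program.

    Toy model (assumptions, not measurements):
    - the vector part takes vector_cycles_256 cycles with 256-bit vectors and
      half as many with 512-bit vectors (twice the lanes per instruction);
    - the rest of the program takes other_cycles in both builds;
    - in the 512-bit build the whole core runs at freq_512_ghz.
    """
    time_256 = (vector_cycles_256 + other_cycles) / freq_256_ghz
    time_512 = (vector_cycles_256 / 2 + other_cycles) / freq_512_ghz
    return time_256 / time_512


# Mostly vector code: the lower clock is more than paid for.
print(round(speedup_512_vs_256(90, 10, 3.0, 2.6), 2))  # → 1.58
# Mostly other code: the lower clock dominates and the 512-bit build loses.
print(round(speedup_512_vs_256(10, 90, 3.0, 2.6), 2))  # → 0.91
```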

On the other hand, when a program uses only a few 512-bit instructions, execution stays slowed down for almost a second after the last 512-bit instruction, until the CPU decides to power down the upper half of the 512-bit units.

This whole problem arises because the Intel CPU tries to be too smart and decides on its own when to power down the unused units.

In the analogous case of using more cores, there is no such problem: when a core is no longer needed, it executes a HLT or MWAIT instruction that immediately powers it down, restoring the higher clock frequency.

If Intel had provided an instruction like "end of 512-bit instructions" that immediately powered down the upper halves of the execution units, sporadically using a few 512-bit instructions would not have caused any slowdown. It would have worked exactly like launching some extra execution threads, where the clock frequency is restored as soon as the extra threads finish or are suspended.
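The difference between the automatic power-down and the hypothetical explicit instruction can be sketched as a small simulation. This is a toy model under stated assumptions: `run_time_ms` is a hypothetical helper, the clocks are placeholders, the 1000 ms hold time simply stands in for the "almost a second" above, and `hold_ms=0` models the hypothetical "end of 512-bit instructions" instruction, which Intel CPUs do not provide.

```python
from typing import List, Tuple


def run_time_ms(segments: List[Tuple[bool, float]], hold_ms: float,
                f_high_ghz: float, f_low_ghz: float) -> float:
    """Total run time of a sequence of code segments under a hysteresis clock policy.

    Each segment is (uses_512_bit, megacycles); megacycles / GHz = milliseconds.
    Segments that use 512-bit instructions always run at f_low_ghz. Other
    segments run at f_low_ghz until hold_ms has elapsed since the last 512-bit
    segment ended, then at f_high_ghz. hold_ms = 0 models an explicit,
    immediate release (a hypothetical instruction, not a real one).
    """
    elapsed = 0.0
    low_until = 0.0  # time at which the clock is allowed to go back up
    for uses_512, mcycles in segments:
        if uses_512:
            elapsed += mcycles / f_low_ghz
            low_until = elapsed + hold_ms
            continue
        # Part of this segment that still runs inside the downclocked window.
        low_window_ms = max(0.0, low_until - elapsed)
        low_mcycles = min(mcycles, low_window_ms * f_low_ghz)
        elapsed += low_mcycles / f_low_ghz
        elapsed += (mcycles - low_mcycles) / f_high_ghz
    return elapsed


# A tiny 512-bit burst followed by ordinary work, repeated three times.
segments = [(True, 1.0), (False, 300.0)] * 3
print(round(run_time_ms(segments, 1000.0, 3.0, 2.5), 1))  # hysteresis → 361.2
print(round(run_time_ms(segments, 0.0, 3.0, 2.5), 1))     # explicit release → 301.2
```

With these made-up numbers, a few sporadic 512-bit instructions cost about 20% more run time under the hysteresis policy, while an immediate release would make them essentially free.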

Because Zen 4 has the same execution units as Zen 3, using AVX-512 on Zen 4 will not cause any slowdown that would not also have happened when using AVX on Zen 3.
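A toy model of why that holds, assuming, as stated above, that Zen 4 runs 512-bit operations on the same 256-bit-wide units (each 512-bit operation split into two 256-bit passes) and that the clock does not change. `elements_per_ns` is a hypothetical helper, and the `ops_per_cycle` and clock values are placeholders, not Zen specifications.

```python
def elements_per_ns(vector_bits: int, datapath_bits: int,
                    ops_per_cycle: float, freq_ghz: float,
                    element_bits: int = 32) -> float:
    """Elements processed per nanosecond by a stream of independent vector ops.

    Toy model: an op wider than the datapath is split into
    vector_bits // datapath_bits passes, so each pass still covers
    datapath_bits worth of elements. The clock is assumed not to change.
    """
    passes = max(1, vector_bits // datapath_bits)
    lanes = vector_bits // element_bits
    return ops_per_cycle / passes * lanes * freq_ghz


# 512-bit ops on 256-bit units match 256-bit ops on the same units...
print(elements_per_ns(512, 256, 2, 4.0), elements_per_ns(256, 256, 2, 4.0))  # → 64.0 64.0
# ...while a full 512-bit datapath would double the peak.
print(elements_per_ns(512, 512, 2, 4.0))  # → 128.0
```

In this model the 512-bit form costs nothing extra on split units, but it also does not raise peak throughput, which is the trade-off the next reply describes.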

Thank you Adrian (B) for explaining this so thoroughly! I'll keep your comment as a future reference!
AVX-512 was a bit of a flop initially, because of how Intel implemented it. AMD's solution doesn't provide quite as much peak throughput for highly-optimized code, but is a better way of providing the flexibility benefits of AVX-512 to the masses without the severe downclocking. There may still be plenty of situations where it would make sense to use 256-bit vectors with AVX-512 instructions, but on Zen 4 there won't be a strong reason to avoid 512-bit vectors where they are useful.
It was a flop because it was intended for a process node that Intel was delayed on for years. It had massive problems to the point of not really making sense when backported to older nodes.
I don't think it's accurate to say AVX-512 was backported. The original Skylake consumer CPUs released as the second generation of products on Intel's 14nm already had space reserved in the CPU core floorplan for the AVX-512 register file. That space didn't get used until the Skylake server CPUs shipped, still on 14nm several years later. AVX-512 support didn't arrive in the consumer desktop product line until Rocket Lake, which was backported to 14nm but was not remotely the beginning of the AVX-512 story.