by buildbot
744 days ago
AVX512 in a single cycle instead of 2 cycles is a big deal if the clock speed can be kept anywhere near 5GHz. The doubling of L1 cache bandwidth is also interesting! I guess it is probably needed to actually feed an AVX512-heavy instruction stream.
I expect that this will remain true for Zen 5 and the next Intel CPUs.
The only important differences in throughput between Intel and AMD were for the 512-bit load and store instructions from the L1 cache and for the 512-bit fused multiply-add instructions, where Intel had double the throughput in its more expensive server CPU models.
I interpret AMD's announcement as saying that Zen 5 now has double the transfer throughput between the 512-bit registers and the L1 cache, and also a doubled 512-bit FP multiplier, so it now matches Intel's AVX-512 throughput per clock cycle in all important instructions.
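
To put rough numbers on that, here is a back-of-the-envelope sketch of what "double throughput" means per core and per clock. The function names are made up for illustration, and the per-cycle counts (1 vs 2 FMAs, 1 vs 2 loads) are assumptions standing in for "single" and "double" throughput, not figures from AMD's or Intel's documentation.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int, fma_per_cycle: int) -> int:
    """Peak floating-point operations per cycle for one core.

    A fused multiply-add counts as 2 operations per lane.
    """
    lanes = vector_bits // element_bits
    return lanes * 2 * fma_per_cycle


def l1_load_bytes_per_cycle(vector_bits: int, loads_per_cycle: int) -> int:
    """Bytes per cycle that full-width vector loads can pull from the L1 cache."""
    return (vector_bits // 8) * loads_per_cycle


if __name__ == "__main__":
    # Assumed "single" vs "double" throughput: 1 or 2 operations per cycle.
    for per_cycle in (1, 2):
        fp64 = peak_flops_per_cycle(512, 64, per_cycle)
        load = l1_load_bytes_per_cycle(512, per_cycle)
        print(f"{per_cycle} x 512-bit per cycle: {fp64} FP64 FLOP/cycle, {load} B/cycle from L1")
```

Under those assumptions, going from one to two 512-bit FMAs per cycle takes a core from 16 to 32 FP64 FLOP per cycle, and each FMA can need up to two fresh operands from memory, which is why the L1 load bandwidth has to grow along with the multiplier.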