by Const-me
589 days ago
One FMA counts as two floating-point operations per lane: one multiplication and one addition. According to uops.info, Zen 4 cores can execute two 8-wide FMA instructions per cycle, or one 16-wide FMA instruction per cycle. See VFMADD132PS (YMM, YMM, YMM) and VFMADD132PS (ZMM, ZMM, ZMM) respectively; the throughput column is labelled TP. That's where the 32 FLOP/cycle number comes from: 2 instructions × 8 lanes × 2 FLOP = 32, or 1 instruction × 16 lanes × 2 FLOP = 32.

> doesn't have "native" support for AVX-512 but "mimics" it through 2x 256-bit FMA units

That's correct: AVX-512 doesn't deliver more FLOPs on that CPU. The throughput of 32-byte (YMM) FMA and 64-byte (ZMM) FMA is the same, 32 FLOP/cycle for FP32 numbers.
Right. This is where the discrepancy comes from. I counted FMA as a single FLOP.
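A small sketch of the arithmetic discussed above may make the factor-of-two discrepancy easier to see. It assumes only the Zen 4 figures quoted from uops.info (two YMM FMAs per cycle, or one ZMM FMA per cycle) and 4-byte FP32 lanes; the helper name `peak_flops_per_cycle` is hypothetical, chosen for illustration.

```python
# Peak FP32 FLOP/cycle per core, assuming the Zen 4 figures quoted above:
# two 256-bit (YMM) FMAs per cycle, or one 512-bit (ZMM) FMA per cycle.

FP32_BYTES = 4  # one FP32 lane is 4 bytes


def peak_flops_per_cycle(vector_bytes: int, fma_per_cycle: float,
                         flops_per_fma_lane: int = 2) -> float:
    """Hypothetical helper: FMA instructions/cycle x lanes x FLOP per lane.

    flops_per_fma_lane=2 counts the multiply and the add separately;
    flops_per_fma_lane=1 treats each FMA as a single FLOP.
    """
    lanes = vector_bytes // FP32_BYTES
    return fma_per_cycle * lanes * flops_per_fma_lane


ymm = peak_flops_per_cycle(32, 2.0)  # 2 * 8 * 2
zmm = peak_flops_per_cycle(64, 1.0)  # 1 * 16 * 2
ymm_fma_as_one = peak_flops_per_cycle(32, 2.0, flops_per_fma_lane=1)  # 2 * 8 * 1

print(ymm, zmm, ymm_fma_as_one)  # 32.0 32.0 16.0
```

Both vector widths land on the same 32 FLOP/cycle, and counting each FMA as one FLOP halves the figure to 16, which is exactly the discrepancy described in the reply.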