by ribit 1921 days ago
The Apple M1 can do four fused multiply-adds per cycle, each with a latency of 4 cycles. Interestingly enough, it seems that the latency of the vector FMA is even lower. Since each 128-bit vector holds four 32-bit floats, that's 4 × 4 = 16 float FMAs per cycle.
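To spell out the arithmetic, here is a rough back-of-the-envelope sketch. The four-per-cycle throughput and 4-cycle latency are the figures quoted above. The 128-bit vector width is the standard NEON register size. The 3.2 GHz clock is my own assumption, used only to turn the per-cycle number into a ballpark peak, not a measured result.

```python
# Back-of-the-envelope FMA throughput for one M1 performance core.
# Figures from the comment: 4 FMA instructions per cycle, 4-cycle latency.
FMA_INSTRUCTIONS_PER_CYCLE = 4
FMA_LATENCY_CYCLES = 4

# NEON vector registers are 128 bits wide; a single-precision float is 32 bits.
VECTOR_BITS = 128
FLOAT_BITS = 32

# ASSUMPTION: clock speed used only for the ballpark peak below.
CLOCK_GHZ = 3.2

lanes = VECTOR_BITS // FLOAT_BITS                       # floats per vector
float_fma_per_cycle = FMA_INSTRUCTIONS_PER_CYCLE * lanes

# One FMA counts as two floating-point operations (a multiply and an add).
flops_per_cycle = 2 * float_fma_per_cycle
peak_gflops = flops_per_cycle * CLOCK_GHZ

# To keep every unit busy, enough independent FMAs must be in flight to
# cover the latency: throughput x latency (Little's law).
independent_chains = FMA_INSTRUCTIONS_PER_CYCLE * FMA_LATENCY_CYCLES

print(f"float lanes per vector:      {lanes}")
print(f"float FMAs per cycle:        {float_fma_per_cycle}")
print(f"FLOPs per cycle:             {flops_per_cycle}")
print(f"peak at {CLOCK_GHZ} GHz (assumed):  {peak_gflops:.1f} GFLOP/s")
print(f"independent vector FMAs needed in flight: {independent_chains}")
```

The last number is the practical catch: a loop with a single accumulator is limited by the 4-cycle latency rather than by throughput, so getting near the peak would need on the order of 16 independent accumulator registers.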

Source: https://dougallj.github.io/applecpu/firestorm-simd.html