Hacker News
by ribit 1921 days ago
The Apple M1 can issue four fused multiply-adds per cycle, each with a latency of 4 cycles. Interestingly enough, the latency of the vector FMA seems to be even lower. Since each 128-bit vector FMA operates on four floats, that works out to 16 float FMAs per cycle.
Source:
https://dougallj.github.io/applecpu/firestorm-simd.html
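To make the arithmetic explicit, here is a rough sketch of how the 16 figure falls out. The function names are made up for illustration, and the 128-bit vector width and 32-bit float size are my assumptions about what the comment means rather than numbers read off the linked table. It also estimates how many independent FMA chains you would need in flight to actually hit that rate, using the 4-cycle latency quoted above.

```python
def peak_fma_per_cycle(pipes: int, vector_bits: int, element_bits: int) -> int:
    """Element-wise FMAs per cycle if every pipe issues one vector FMA."""
    lanes = vector_bits // element_bits  # floats per vector register
    return pipes * lanes


def independent_chains_needed(pipes: int, latency_cycles: int) -> int:
    """Independent vector accumulators needed to keep all pipes busy.

    Each pipe can start a new FMA every cycle, but a dependent FMA has to
    wait `latency_cycles` for its input, so throughput * latency chains
    must be in flight at once.
    """
    return pipes * latency_cycles


# Assumed figures: 4 FMA pipes, 128-bit vectors, 32-bit floats, 4-cycle latency.
fma = peak_fma_per_cycle(pipes=4, vector_bits=128, element_bits=32)
flops = 2 * fma  # one FMA counts as a multiply plus an add
chains = independent_chains_needed(pipes=4, latency_cycles=4)

print(f"{fma} float FMA/cycle, {flops} FLOP/cycle, {chains} independent vector chains")
```

If the vector FMA latency really is lower than 4 cycles, the number of chains needed drops accordingly, but the peak per-cycle figure stays the same.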