by mratsim
2432 days ago
FMA is not always faster. It has a high latency (5-6 cycles depending on the CPU), while Add and Mul have low latency. This means that to fully utilize FMA you need to unroll a loop more. Sometimes you just can't, and other times the unrolled loop needs more instructions and takes up more cache. In short, it's not always better. Also, as others said, FMA has better accuracy than a separate Add + Mul, because it rounds only once instead of twice.