by filiphorvat 536 days ago
Unrelated, do FPUs on modern CPUs use FMAs to both multiply and add or do they use mul/add-only units?
2 comments

I don't think there is a generally optimal design. There are pros and cons to using the same homogeneous FMA units for adds, multiplies, and FMAs, even at the cost of making adds slower (the design is simpler, and having all instructions share the same latency greatly simplifies scheduling). IIRC Intel went from 4-cycle FMA, add, and mul, to 4-cycle add and mul with 5-cycle FMA, and then to a dedicated 3-cycle add.

The optimal design depends a lot on the rest of the microarchitecture, the loads the core is being optimized for, the target frequency, the memory latency, etc.
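
To illustrate how one fused unit can cover all three operations, here is a minimal sketch. The helper names (`fma`, `add_via_fma`, `mul_via_fma`) are made up for this example, and the `fma` here is a software emulation using exact rationals, not a hardware instruction. It assumes finite inputs and ignores signed-zero corner cases, which real hardware has to handle separately.

```python
from fractions import Fraction

def fma(a, b, c):
    """Emulated fused multiply-add: compute a*b + c exactly, then round once.

    Assumption: finite float inputs only (Fraction rejects inf/nan).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # a single rounding to the nearest double

def add_via_fma(a, b):
    # An add is an FMA whose multiplier is 1: a*1 + b
    return fma(a, 1.0, b)

def mul_via_fma(a, b):
    # A multiply is an FMA whose addend is 0: a*b + 0
    return fma(a, b, 0.0)

print(add_via_fma(0.1, 0.2) == 0.1 + 0.2)
print(mul_via_fma(0.1, 0.3) == 0.1 * 0.3)
```

Both comparisons should hold because a plain add or multiply is also a single correctly rounded operation. The catch is that an add routed this way pays the full FMA latency, which is the trade-off mentioned above.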

They probably use the FMA units to do multiplies, as the extra add is basically free. Adds are cheaper.
Adds are cheaper only for fixed-point computations. Floating-point addition needs to denormalize one of its arguments (shift its significand so that both exponents match), perform an (integer) addition, and then normalize the result.
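
A rough sketch of those three steps, next to a multiply for comparison. This is a toy model, not how any particular FPU does it: values are unsigned `(significand, exponent)` pairs meaning `significand * 2**exponent`, the 8-bit `PRECISION` is an arbitrary choice (binary64 uses 53 bits), rounding is plain truncation, and all the names are hypothetical.

```python
PRECISION = 8  # toy significand width in bits (assumption for this sketch)

def normalize(sig, exp):
    """Shift until the significand has exactly PRECISION bits (truncating)."""
    if sig == 0:
        return 0, 0
    while sig >= (1 << PRECISION):        # too wide: shift right, drop low bits
        sig >>= 1
        exp += 1
    while sig < (1 << (PRECISION - 1)):   # leading zeros: shift left
        sig <<= 1
        exp -= 1
    return sig, exp

def toy_fp_add(a, b):
    (sa, ea), (sb, eb) = a, b
    if ea < eb:                           # make a the operand with the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb >>= (ea - eb)                      # 1. align (denormalize) the smaller operand
    total = sa + sb                       # 2. integer addition of the significands
    return normalize(total, ea)           # 3. normalize the result

def toy_fp_mul(a, b):
    (sa, ea), (sb, eb) = a, b
    # No alignment step: multiply significands, add exponents, normalize
    return normalize(sa * sb, ea + eb)

one = (128, -7)    # 128 * 2**-7 == 1.0
half = (128, -8)   # 128 * 2**-8 == 0.5
print(toy_fp_add(one, half))  # (192, -7), i.e. 1.5
```

The add has a chain of dependent steps (compare exponents, shift, add, shift again), while the multiply has no alignment step even though the significand multiplier itself is the larger piece of hardware.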

Usually FP adds take a cycle or two longer than FP multiplication.

Depends on what you mean by ‘cheaper’. Multiplies are still more gates. The adds are slower due to longer dependency chains, not because they cost more gates.