FMA isn't a safe optimization as it can give different results.
C++ compilers have flags to enable it globally. gcc and clang include the optimization in -Ofast.
Rust allows you to choose at a code level (but usually people don't know about it). Perhaps it should also have a global fast-math flag that would automatically optimize it. Pros and cons to that.
FMA is "safe" in that if it breaks your code, it was already broken. It can only make the results slightly more accurate, unlike for instance the rsqrt instruction which is less accurate. (and as such is not a safe optimization)
GCC emits FMA instructions at -O2 without -ffast-math.
Well, it could "break" your code in that it might make your code produce different results than a separate equivalent implementation that didn't use FMA.
I wasn't trying to imply it should be on by default. Often one does not care about the lower bits of the floats, but do want the speed. For some tasks it's very much the opposite. Being able to specify a global option with local override is a great combo.
FMA is not always faster, it has a high latency: 5-6 cycles depending on the CPU while Add and MUL have very low-latency.
This means that to fully utilizes FMA you need to unroll a loop more. Sometimes yyou just can't, and the other time you use more instructions, use more cache.
In short it's not always better.
Also as other said, FMA has better accuracy than separate Add + Mul
If you’re doing fiddly numerical work, this must definitely be optional, as swapping separate multiplication and addition for FMA (or vice versa) can compromise correctness. In some cases you need two different algorithms if FMA is present or absent.
Some algorithms guarantee that some arithmetic operation(s) applied to 2+ floating point inputs will result in a list of floating point outputs which when summed have exactly the correct result. This gets all screwed up if you mess with the order of operations or the rounding of intermediate results.
Fair, I think it would be very helpful for Rust if some expert actually knows of a specific example for which this is the case. I think there is an RFC about enabling floating-point contraction by default, that would "silently" be able to do some of these transformations depending on the optimization level.
The one very important thing that often get destroyed by compiler using associativity is the TwoSum Error free transform which is a vital composant of several algorithms that deal with numerical error (most notably the Kahan Summation).