> Unfortunately compilers can't emit FMA instructions from regular floating point math without specifying (an equivalent of) `-ffast-math`
... if you specify an ISO language standard (e.g. `-std=c99`). By default, in GNU mode, gcc will happily emit FMA at -O3 on targets that support it, because `-ffp-contract=fast` is the default there. From the man page:
> By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place. [...] [-fexcess-precision=standard] is enabled by default for C if a strict conformance option such as -std=c99 is used.
> Unfortunately compilers can't emit FMA instructions without specifying [-ffast-math]
I would argue that it would have been fortunate if FMA were disabled by default, and that it is unfortunate that it is not. Yes, FMA gives a performance boost and (usually) better accuracy... but it comes at the cost of bit-for-bit reproducibility. If we let the compiler decide automatically when to apply FMA, then a compiler upgrade or even unrelated code changes could lead to slightly different results, as rounding is performed at different points in the computation. For numerical codes in which small perturbations can yield wildly different solution paths, debugging can become a nightmare. Of course this is a subjective preference, and it strongly depends on the application domain. I agree that for some people defaulting to best performance at -O3 is the right choice.
I had no idea that gcc was in 'GNU mode' by default, and that specifying -std would turn that off. I always assumed it just had a default standard version that is (very) irregularly incremented.
> I would argue that it would have been fortunate if FMA was disabled by default.
I agree, and (outside of my earlier ignorance of GNU mode) it is almost everywhere.
My 'unfortunate' wasn't aimed at compilers per se, but rather at the (unavoidable) non-associative nature of floating-point arithmetic. I do wish it were easier to specify, at a per-file or per-function level, that emitting FMA, or performing algebraic and other optimizations that break bit-for-bit reproducibility, is OK.