by zokier 1256 days ago
Note that with gcc/clang you can control the automatic use of fma with compile flags (-ffp-contract=off). It is pretty crazy imho that gcc defaults to using fma.

> It is pretty crazy imho that gcc defaults to using fma

Yes! Different people can make different performance-vs-correctness trade-offs, but I also think reproducible-by-default would be better.

Fortunately, specifying a proper standard (e.g. -std=c99 or -std=c++11) implies -ffp-contract=off. I guess specifying such a standard is probably a good idea independently when we care about reproducibility.

Edit: Thinking about it, in the days of 80-bit x87 FPUs, strictly following the standard (specifically, always rounding to 64 bits after every operation) may have been prohibitively expensive. This may explain gcc's GNU mode defaulting to -ffast-math.

GCC doesn't default to non-conforming behaviour like -ffast-math -- that's the Intel compiler (or at least it defaults to a similar option). That's usually why people mistakenly think GCC vectorization is deficient if they don't use -funsafe-math-optimizations in particular.
Indeed GCC does not enable -ffast-math by default. Unfortunately, -ffast-math and -funsafe-math-optimizations (despite the name) are not the only options that prevent bit-for-bit-reproducible floating point. For example, -ffp-contract=fast is enabled by default [1], and it will lead to different floating-point roundings: Compare [2] which generates an FMA instruction, to [3] when -std=c99 is specified. As another example, -fexcess-precision=fast is also enabled by default. Similarly, [4] does intermediate calculations in the 80-bit x87 registers, while [5] has additional loads and stores to reduce the precision of intermediate results to 64 bits. In both examples, GCC generates code that does not conform to IEEE-754, unless -std=c99 is specified.

[1] From the man page:

    -ffp-contract=style
           -ffp-contract=off disables floating-point expression
           contraction.  -ffp-contract=fast enables floating-point
           expression contraction such as forming of fused multiply-
           add operations if the target has native support for them.
           -ffp-contract=on enables floating-point expression
           contraction if allowed by the language standard.  This is
           currently not implemented and treated equal to
           -ffp-contract=off.
           
           The default is -ffp-contract=fast.
[2] https://godbolt.org/z/GKb7G4nW9

[3] https://godbolt.org/z/KTnqcT6aW

[4] https://godbolt.org/z/4q31oEe14

[5] https://godbolt.org/z/qdf4hceca

> Edit: Thinking about it, in the days of 80-bit x87 FPUs, strictly following the standard (specifically, always rounding to 64 bits after every operation) may have been prohibitively expensive

afaik you could just set the precision of x87 to 32/64/80 bits and there would not be any extra cost to the operations
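
As a side note, C99 lets you ask how the compiler evaluates intermediates via FLT_EVAL_METHOD from <float.h>. A small sketch (the helper name is made up for illustration). Keep in mind this is a compile-time macro describing the compiler's evaluation model, so it cannot reflect an x87 control word changed at run time.

```c
#include <float.h>

/* Describe how floating-point intermediates are evaluated,
 * based on the C99 FLT_EVAL_METHOD macro. */
const char *eval_method_name(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}
```

On x86-64 builds that use SSE this typically reports case 0, and on 32-bit x87 builds typically case 2, but that depends on the target and flags.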

Why is it crazy? Some of us don't want to lose a factor of two on linear algebra (and also care about correctness). I remember testing for correctness against Kahan's tests after FMA became available in RS/6000.
If you do want the precision improvement of fma, then it makes far more sense to call fma explicitly instead of relying on the compiler to do a transformation that might not happen for any number of reasons. The key here is predictability: if it were actually guaranteed that expressions of the form (x*y) + z are always done with fma, then it'd be less crazy. But as it stands, you have no way of knowing whether fma is used in any particular expression without looking at the produced assembly.
As I implied, we're normally interested in FMA for speed, not numerical properties. I don't know in what circumstances GCC wouldn't use it when vectorizing, but I haven't seen them.
My take is that not using the higher-precision operation is crazy.
If you want higher precision, then long double exists for that purpose.