Hacker News new | ask | show | jobs
by guyomes 1256 days ago
Example 4 mentions that the result might be different with the same code. Here is an example that is particularly counter-intuitive.

Some CPU have the instruction FMA(a,b,c) = ab + c and it is guaranteed to be rounded to the nearest float. You might think that using FMA will lead to more accurate results, which is true most of the time.

However, assume that you want to compute a dot product between 2 orthogonal vectors, say (u,v) and (w,u) where w = -v. You will write:

p = uv + wu

Without FMA, that amounts to two products and an addition between two opposite numbers. This results in p = 0, which is the expected result.

With FMA, the compiler might optimize this code to:

p = FMA(u, v, wu)

That is one FMA and one product. Now the issue is that wu is rounded to the nearest float, say x, which is not exactly -vu. So the result will be the nearest float to uv + x, which is not zero!

So even for a simple formula like this, testing if two vectors are orthogonal would not necessary work by testing if the result is exactly zero. One recommended workaround in this case is to test if the dot product has an absolute value smaller than a small threshold.

3 comments

Note that with gcc/clang you can control the auto-use of fma with compile flags (-ffp-contract=off). It is pretty crazy imho that gcc defaults to using fma
> It is pretty crazy imho that gcc defaults to using fma

Yes! Different people can make different performance-vs-correctness trade-offs, but I also think reproducible-by-default would be better.

Fortunately, specifying a proper standard (e.g. -std=c99 or -std=c++11) implies -ffp-contract=off. I guess specifying such a standard is probably a good idea independently when we care about reproducibility.

Edit: Thinking about it, it the days of 80-bit x87 FPUs, strictly following the standard (specifically, always rounding to 64 bits after every operation) may have been prohibitively expensive. This may explain gcc's GNU mode defaulting to -ffast-math.

GCC doesn't default to non-conforming behaviour like -ffast-math -- that's Intel (at least a similar option). That's usually why people mistakenly think GCC vectorization is deficient if they don't use -funsafe-math-optimizations in particular.
Indeed GCC does not enable -ffast-math by default. Unfortunately, -ffast-math and -funsafe-math-optimizations (despite the name) are not the only options that prevent bit-for-bit-reproducible floating point. For example, -ffp-contract=fast is enabled by default [1], and it will lead to different floating-point roundings: Compare [2] which generates an FMA instruction, to [3] when -std=c99 is specified. As another example, -fexcess-precision=fast is also enabled by default. Similarly, [4] does intermediate calculations in the 80-bit x87 registers, while [5] has additional loads and stores to reduce the precision of intermediate results to 64 bits. In both examples, GCC generates code that does not conform to IEEE-754, unless -std=c99 is specified.

[1] From the man page:

    -ffp-contract=style
           -ffp-contract=off disables floating-point expression
           contraction.  -ffp-contract=fast enables floating-point
           expression contraction such as forming of fused multiply-
           add operations if the target has native support for them.
           -ffp-contract=on enables floating-point expression
           contraction if allowed by the language standard.  This is
           currently not implemented and treated equal to
           -ffp-contract=off.
           
           The default is -ffp-contract=fast.
[2] https://godbolt.org/z/GKb7G4nW9

[3] https://godbolt.org/z/KTnqcT6aW

[4] https://godbolt.org/z/4q31oEe14

[5] https://godbolt.org/z/qdf4hceca

> Edit: Thinking about it, it the days of 80-bit x87 FPUs, strictly following the standard (specifically, always rounding to 64 bits after every operation) may have been prohibitively expensive

afaik you could just set the precision of x87 to 32/64/80 bits and there would not be any extra cost to the operations

Why is it crazy? Some of us don't want to lose a factor of two on linear algebra (and also care about correctness). I remember testing for correctness against Kahan's tests after FMA became available in RS/6000.
If you do want the precision improvement of fma, then it makes far more sense to explicitly call fma instead of relying compiler on doing transformation that might not happen for any number of reasons. The key here is predictability, if it was actually guaranteed that expressions in the form of (x*y) + z are always done with fma, then it'd be less crazy. But now you have no way of knowing without looking at the produced assembly if fma is used or not in any particular expression.
As I implied, we're normally interested in FMA for speed, not numerical properties. I don't know in what circumstances GCC wouldn't use it when vectorizing, but I haven't seen them.
My take is not using the higher precision operation is crazy.
If you want higher precision then long double exists for that purpose
In general with reals with any source of error anywhere, this caution about equality is always correct. the odds of two reals being equal is zero.
I have an exception that proves the rule. I thought about responding to Julia's call, but decided this was too subtle. But here we go...

A central primitive in 2D computational geometry is the orientation problem; in this case deciding whether a point lies to the left or right of a line. In real arithmetic, the classic way to solve it is to set up the line equation (so the value is zero for points on the line), then evaluate that for the given point and test the sign.

The problem is of course that for points very near the line, roundoff error can give the wrong answer, it is in fact an example of cancellation. The problem has an exact answer, and can be solved with rational numbers, or in a related technique detecting when you're in the danger zone and upping the floating point precision just in those cases. (This technique is the basis of Jonathan Shewchuk's thesis).

However, in work I'm doing, I want to take a different approach. If the y coordinate of the point matches the y coordinate of one of the endpoints of the line, then you can tell orientation exactly by comparing the x coordinates. In other cases, either you're far enough away that you know you won't get the wrong answer due to roundoff, or you can subdivide the line at that y coordinate. Then you get an orientation result that is not necessarily exactly correct wrt the original line, but you can count on it being consistent, which is what you really care about.

So the ironic thing is that if you had a lint that said, "exact floating point equality is dangerous, you should use a within-epsilon test instead," it would break the reasoning outlined above, and you could no longer count on the orientations being consistent.

As I said, though, this is a very special case. Almost always, it is better to use a fuzzy test over exact equality, and I can also list times I've been bitten by that (especially in fastmath conditions, which are hard to avoid when you're doing GPU programming).

Yes, and this is not just a theoretical concern: There was an article here [1] in 2021 claiming that Apple M1's FMA implementation had "flaws". There was actually no such flaw. Instead, the author was caught off guard by the very phenomenon you are describing.

[1] https://news.ycombinator.com/item?id=27880461

Although funnily enough the Windows fma implementation is significantly flawed.