Hacker News new | ask | show | jobs
by evrimoztamur 383 days ago
Does that mean that a physics engine written with these operations will always compile to yield the same deterministic outcomes across different platforms (assuming they correctly implement (or able to do so) algebraic operations)?
3 comments

It's more like the opposite. These tell the compiler to assume for optimization purposes that floats are associative and so on (ie. algebraic), even when in reality they aren't. So the results may vary depending on what transformations the compiler performs – in particular, they may vary between optimized and non-optimized builds, which normally isn't allowed.
> These tell the compiler to assume for optimization purposes that floats are associative and so on (ie. algebraic), even when in reality they aren't.

I wonder if it is possible to add an additional constraint that guarantees the transformation has equal or fewer numerical rounding errors. E.g. for floating point doubles (0.2 + 0.1) - 0.1 results in 0.20000000000000004, so I would expect that transforming some (A + B) - B to just A would always reduce numerical error. OTOH, it's floating point maths, there's probably some kind of weird gotcha here as well.

Pretty sure that’s not possible. More accurate for some inputs will be less accurate for others. There’s a very tricky tension in float optimization that the most predictable operation structure is a fully skewed op tree, as in naive left-to-right summation, but this is the slowest and least accurate order of operations. Using a more balanced tree is faster and more accurate (great), but unfortunately which tree shape is fastest depends very much on hardware-specific factors like SIMD width (less great). And no tree shape is universally guaranteed to be fully accurate, although a full binary tree tends to have the best accuracy, but has bad base case performance, so the actual shape that tends to get used in high performance kernels is SIMD-width parallel in a loop up to some fixed size like 256 elements, then pairwise recursive reduction above that. The recursive reduction can also be threaded. Anyway, there’s no silver bullet here.
I think a restricted version might be possible to implement: only allow transformations if the transformed version has strictly fewer numerical rounding errors on some inputs. This will usually only mean canceling terms and collecting expressions like "x+x+x" into 3x.

In general, rules that allow fewer transformations are probably easier to understand and use. Trying to optimize everything is where you run into trouble.

Kahan summation is an example (also described in the top level article) of one such “gotcha”. It involves adding a term that - if floats were algebraic in this sense - would always be zero, so ffast-math often deletes it, but this actually completely removes the accuracy improvement of the algorithm
Under EForth with FP done in software:

2 f 1 f 1 f f+ f- f. 0.000 ok

PFE, I think reusing the GLIBC math library:

2e0 1e0 1e0 f+ f- f. 0.000000 ok

Every implementation of floating point can handle small to medium integers without rounding. So that example doesn't show anything we can learn from.
Buth for Forth you can have a totally working software floating point made from the people whose later would set the IEEE standard. Think about very small microcontrollers without FP in hardware, you can be sure that a software FP implementation will perfectly work under the standard thresolds.
Okay, I guess? I appreciate having a well-verified library when I can't have floating point hardware, but I'm never in that situation and I really don't see how it's relevant to this discussion. Any floating point implementation is going to have rounding problems, including IEEE. The example above is IEEE.
No, there is no guarantee which (if any) optimizations are applied, only that they may be applied. For example a fused multiply-add instruction may be emitted for a*b + c on platforms which support it, which is not cross-platform.
No, the result may depend on how the compiler reorders them, which could be different on different platforms.