As I understand it, it's more that the mathematical functions aren't specified by the standard.
There is no sqrt instruction. So you stard with a taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the taylor series? Where did you centre it (can't be around 0)? How many Newton's methods did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, tan you need to come up with a method of calculation, and that's in the standard library which usually comes from the compiler. You could make your own implementations but...
Floats are not associative: (a+b)+c != a+(b+c). So even something like reordering fundamental operations can cause divergence.
> There is no sqrt instruction. So you stard with a taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the taylor series? Where did you centre it (can't be around 0)? How many Newton's methods did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, tan you need to come up with a method of calculation, and that's in the standard library which usually comes from the compiler. You could make your own implementations but...
This is just plain wrong. IEEE 754 defines many common mathematical functions, sqrt and trig included, and recommends them to return correct values to the last ulp. Most common cpus have hardware instructions for those, although Intels implementation is notoriously bad.
Furthermore there are some high-quality FP libraries out there, like SLEEF and rlibm, and CORE-MATH project that aims to improve the standard libraries in use.
- sqrt actually can be a single instruction on x86_64 for example. Look at how musl implements it, it's just a single line inline asm statement, `__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));`: https://github.com/ifduyue/musl/blob/master/src/math/x86_64/... Of course, not every x64_64 impl must use that instruction, and not every architecture must match Intel's implementation. I've never looked into it, but wouldn't be surprised if even Intel and AMD have some differences.
- operator precedence is well-defined, and IEE754 compliant compilation modes respect the non-associativity. In most popular compilers, you need to pass specific flags to allow the compiler to change the associativity of operations (-ffast-math implies -funsafe-math-optimizations which implies -fassociative-math which actually breaks strict adherence to the program text).
A somewhat similar issue does arise with floating point environment though, as statements may be reordered, function arguments order of evaluation may differ, etc.
The fact that compilers respect the non-associativity of program text is a huge reason why compilers are very limited in how much auto-vectorization they will do for floating point. The classic example is a sum of squares, where it bars itself from even loop unrolling, never mind SIMD with FMADDs. To do any of that, you have to fiddle with compiler options that are often problematic when enabled globally, or __attribute__((optimize("...")) or __pragma(optimize("...")), or probably best of all, explicitly vectorize using intrinsics.
Ah... I see. The game is using a software implementation of floating point maths to avoid this issue :). Smart. That in combination with the unified compiler should get you out of most of the trouble.
Float determinism is one of those things that in theory floats should behave deterministically, but in practice it's a wild west. One big problem is that in C (and in many others), common operators are not explicitly mapped to specific fp operations, which gives compilers quite a lot of leeway on what they can do.
Another issue is that FP uses some amount of invisible global state, e.g. rounding modes. Iirc for example DirectX likes to change the flags behind your back.
There are lots of different ways floating point determinism can get lost. The obvious one, as you mentioned, is the 80 bit x87 unit. Lots of 32 bit compilers will compile to doing floating point math on the x87, which is slow, but the same compiler compiling in 64 bit mode will use SSE2 instructions.
Floating point arithmetic is not associative. That is, (a+b)+c != a+(b+c) and (a*b)*c != a*(b*\c).
Multiplying by the reciprocal is not equal to dividing by the value. That is, x*(1/y) != x/y. An optimizing compiler may, when you attempt to divide by a constant, will optimize that to multiplying the reciprocal instead, that is, if you have code that divides by the constant 3 it will multiply by 0.33333333333333333, because it's a lot faster.
FMA (fused multiply-add) instructions are more precise and therefore not equal to the same calculation without FMA. That is, a*b+c != a*b+c if one compiler will output FMA instructions and the other one does not. (this will be true even in the same compiler with different flags)
Special functions are fucky. sqrt, sin, cos etc might not always give equal values. Or even in the same compiler if minor alterations to the code are made. A compiler might use one algorithm to compute sin(x), but a different algorithm if it needs to compute both sin(x) and cos(x) at the same time.
Floating point rounding mode is a thing. Sometimes a plugin changes your floating point rounding mode. Sometimes this plugin will be inserted into your runtime without your knowledge, such as an antivirus program, or malware that hijacks your browser to give "better"/"customized" shopping/search recommendations. There was a bug writeup about a crash in Chrome several years back, but I can't find it.
Basically you should assume that floating point math is non-deterministic. If you think you need deterministic floating point math, try to reformulate the problem so that you don't need deterministic floating point math. If you really* need deterministic floating point math, understand that you're signing up for a lot of pain.
AFAIK, the first three things you listed would be deterministic across compilers unless you enabled -ffast-math (which isn't what this is talking about) precisely for that reason. I believe the same applies to intrinsics and rounding modes but not sure, especially since the website talks about GCC vs clang differences for the latter.
Constant folding of floats may depend on the compiler of compiler (e.g. 0.5 + x + 0.6 + 0.7), the compiler may be non-deterministic (std::unordered_map<float> constants), and it may depend on the CPU your compiler is running on.
There is no sqrt instruction. So you stard with a taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the taylor series? Where did you centre it (can't be around 0)? How many Newton's methods did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, tan you need to come up with a method of calculation, and that's in the standard library which usually comes from the compiler. You could make your own implementations but...
Floats are not associative: (a+b)+c != a+(b+c). So even something like reordering fundamental operations can cause divergence.