by owlbite
1051 days ago
Attempting to get bit-exact reproducible results across different hardware is a fool's errand (if you care in the least about performance). The nature of the beast is that as soon as you change the order of arithmetic, you can get a different result. Optimized code is going to give you different results on different hardware because you need to optimize things differently for each target. Threading, memory alignment, and/or different versions of the library software are likely to lead to different results even on the same machine unless the authors of the library go out of their way to promise repeatability. (If you want to get the same answer, run on a single thread, page-align everything you feed in, and never upgrade your system; alternatively, write a scalar loop in C, compile with -O0, and pray the compiler doesn't change the order of things on its next upgrade.)
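To make the "order of arithmetic" point concrete, here is a minimal Python sketch. The helper names (`sum_left_to_right`, `sum_in_chunks`) and the input values are made up for illustration; the chunked version only imitates how a threaded or vectorized reduction might combine partial sums, and is not taken from any particular library.

```python
def sum_left_to_right(values):
    """Plain scalar loop: add the values strictly in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_chunks(values, n_chunks):
    """Sum each chunk on its own, then add the partial sums.

    This mimics (as an assumption) what a reduction split across
    threads or SIMD lanes would do.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [
        sum_left_to_right(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sum_left_to_right(partials)


values = [1e16, 1.0, -1e16, 1.0]

sequential = sum_left_to_right(values)
chunked = sum_in_chunks(values, 2)
# Same numbers, largest magnitudes first (sorted() is stable).
reordered = sum_left_to_right(sorted(values, key=abs, reverse=True))

print(sequential)  # 1.0
print(chunked)     # 0.0
print(reordered)   # 2.0
```

Every individual addition here is correctly rounded; the three answers differ only because the same four numbers are combined in a different order.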
We did it for Wasm, which follows IEEE-754 semantics exactly for 32-bit and 64-bit floats. (The only nondeterminism is the exact bit pattern you get for NaNs in some circumstances.) Rounding is 100% well-specified. And CPUs have done that for decades. Even vector ISAs have learned that non-IEEE results are not what software wants; all vector ISAs are converging on IEEE-754.
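As a small illustration of what "well-specified rounding" means for a single operation, the sketch below prints the raw bit pattern of a few 64-bit results. The `bits` helper is a name invented here, and the assumption is a Python build whose `float` is an IEEE-754 binary64 (true of mainstream CPython builds).

```python
import math
import struct


def bits(x):
    """Return the big-endian IEEE-754 binary64 encoding of x as hex."""
    return struct.pack(">d", x).hex()


# Addition, division and square root are all required by IEEE-754
# to be correctly rounded, so each has exactly one permitted result.
print(bits(0.1 + 0.2))       # 3fd3333333333334
print(bits(1.0 / 3.0))       # 3fd5555555555555
print(bits(math.sqrt(2.0)))
```

Any conforming implementation should print the same hex strings for these operations; differences creep in only when the sequence of operations itself changes.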
> Optimized code is going to give you different results on different hardware due to the fact that you need to optimize things differently.
This is due to C/C++ (and to some extent Fortran) semantics. It is not hardware.
What do threads have to do with floating point precision?