Hacker News
by thebooktocome 1040 days ago
strictfp is no longer needed as of Java 17. Strict IEEE-754 semantics are now the default.

https://openjdk.org/jeps/306

Ironically, using your system’s math libraries will probably make replicating floating point results harder. Especially on macOS. Accelerate is a curse.

I disagree with your gloss of Kahan’s philosophy. His approach is more along the lines of “do not waste precision”. But this philosophy is not the complete truth; as close as I can state it briefly, my modification would be “do not waste precision that may be needed later”.


Attempting to get bit-exact reproducible results across different hardware is a fool's errand (if you care in the least about performance).

The nature of the beast is that as soon as you change the order of arithmetic you're going to get a different result. Optimized code is going to give you different results on different hardware because you need to optimize things differently. Threading, memory alignment and/or different versions of the library software are likely to lead to different results even on the same machine unless the authors of the library go out of their way to promise repeatability.

(If you want to get the same answer, run on a single thread, page align everything you feed in, and never upgrade your system; alternatively write a scalar loop in C, compile with -O0 and pray the compiler doesn't change the order of things on its next upgrade).
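
To make the "order of arithmetic" point concrete, here is a minimal Python sketch. The helper name `sum_in_order` and the input values are made up for illustration. It adds with an explicit loop because newer Python versions use a compensated algorithm inside the built-in `sum()` for floats, which would hide the effect.

```python
def sum_in_order(values):
    """Add values strictly left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two different orders (illustrative values).
absorb_first = sum_in_order([1e16, 1.0, -1e16])   # 1.0 is rounded away next to 1e16
cancel_first = sum_in_order([1e16, -1e16, 1.0])   # big terms cancel before 1.0 is added

# Regrouping a small sum also changes the last bit.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print(absorb_first, cancel_first)
    print(left_grouped == right_grouped)
```

Both orderings are valid IEEE-754 evaluations; they just round at different points, so any optimization that reorders the additions can change the bits.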

> Attempting to get bit-exact reproducible results across different hardware is a fool's errand (if you care in the least about performance).

We did it for Wasm, which follows IEEE-754 semantics exactly for 32-bit and 64-bit floats. (The only nondeterminism is the exact bit pattern you get for NaNs in some circumstances.) Rounding is 100% well-specified. And CPUs have done that for decades. Even vector ISAs have learned that non-IEEE results are not what software wants; all vector ISAs are converging on IEEE-754.

> Optimized code is going to give you different results on different hardware due to the fact that you need to optimize things differently.

This is due to C/C++ (and to some extent Fortran) semantics. It is not hardware.

What do threads have to do with floating point precision?

Oh, it's entirely possible to get bit-reproducible results. Just not in a performance portable fashion.

Different microarchitectures (e.g. how many vector instructions of what size need to be in flight for full occupancy), different numbers of cores (see threading discussion below) and often even differently aligned memory (does it need to be repacked or not for best performance?) will all require a different order of operations to obtain maximum throughput, which means different (but equally valid) results.

For threading in particular, if you want to get the same bit-exact answer, you end up constraining yourself to a particular ordering on reduction operations. This in turn either outright prevents techniques such as work-stealing or forces a very prescriptive reduction tree that itself constrains parallelism.
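
A rough single-threaded simulation of that reduction-ordering problem, as a sketch only. `chunked_sum` and its `n_workers` parameter are hypothetical names: each simulated worker sums one contiguous chunk, and the partial sums are then combined in chunk order.

```python
import math

def sequential_sum(values):
    """Plain left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_workers):
    """Simulate a parallel reduction: one contiguous chunk per worker,
    then combine the per-worker partial sums in chunk order."""
    if not values:
        return 0.0
    size = math.ceil(len(values) / n_workers)
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)

# Illustrative input chosen so that the partition visibly matters.
values = [1e16, 1.0, -1e16, 1.0]
one_worker = chunked_sum(values, 1)
two_workers = chunked_sum(values, 2)

if __name__ == "__main__":
    print(one_worker, two_workers)
```

The result depends on where the chunk boundaries fall, so making it reproducible means fixing the partition and the combine order independently of how many workers actually run, which is exactly the constraint on work-stealing described above.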

This is entirely driven by hardware and its impacts on performance of algorithms, and applies regardless of the language you're writing in if you want to obtain the best possible performance from a given chip.

I’m not the parent but I imagine they’re referring to e.g., some FFTs use different partitioning strategies in different threading environments, which breaks bit-perfect replication.

There’s also the weirdness that in C++ the floating point environment is thread-local, which can cause all sorts of chaos.

...or use fixed-point arithmetic. Which, if I understand correctly, is basically the go-to of modern multiplayer-enabled game engines.
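
For anyone unfamiliar with the idea, a minimal sketch of fixed-point arithmetic follows. The 16 fractional bits and the helper names are assumptions for illustration, not what any particular engine uses. Values are stored as scaled integers, and integer addition is associative, so the summation order stops mattering.

```python
FRACTION_BITS = 16            # assumed format: 16 fractional bits
SCALE = 1 << FRACTION_BITS

def to_fixed(x):
    """Convert a float to the nearest fixed-point integer (rounds once, at the boundary)."""
    return round(x * SCALE)

def to_float(f):
    """Convert a fixed-point integer back to a float for display."""
    return f / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point values, dropping the extra fractional bits."""
    return (a * b) >> FRACTION_BITS

a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
left_grouped = (a + b) + c
right_grouped = a + (b + c)    # identical, because these are integer additions

if __name__ == "__main__":
    print(left_grouped == right_grouped, to_float(left_grouped))
```

Python integers never overflow, so this sketch sidesteps the range limits that a fixed-width implementation in C or C++ would have to handle.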
The only reason you would want bit-reproducibility is because you haven't done the numerical analysis and have no clue how many digits of your "answer" to trust.

As far as I know, two sectors claim they need it: finance and climate.

"Do you want a better answer?"

"No, I want the same wrong answer that I got last Tuesday."

Science/Mathematics can't fix this.

> The only reason you would want bit-reproducibility is because you haven't done the numerical analysis and have no clue how many digits of your "answer" to trust.

I can confidently say that this is not the only good reason. Other reasons include:

- You want to compare different runs by hashing outputs (e.g. to find the first computation step where they diverged). Very useful for debugging, and also useful to determine whether you accurately reproduced a result (e.g. a customer problem).

- If your program has a single floating point comparison, there is no such thing as "enough significant digits" - with reasonable assumptions about the distribution of "unreproducibility", your logic is now divergent (and your output will jump between different values) with a certain probability. At that point we're no longer talking numerical analysis, it's straight up "divergent results".
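
Both points can be sketched in a few lines of Python. Everything here is illustrative: `step_digest`, `first_divergence`, and the two "runs" are invented for the example. Each step's outputs are hashed by their exact bit patterns, and the first step whose digests differ is reported.

```python
import hashlib
import struct

def step_digest(values):
    """Hash the exact bit patterns of a list of floats (little-endian doubles)."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()

def first_divergence(run_a, run_b):
    """Return the index of the first step whose digests differ, or None.
    Only the steps present in both runs are compared."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if step_digest(a) != step_digest(b):
            return step
    return None

# Two made-up runs that differ only in how step 1 grouped its additions.
run_a = [[0.5, 0.25], [(0.1 + 0.2) + 0.3], [1.0]]
run_b = [[0.5, 0.25], [0.1 + (0.2 + 0.3)], [1.0]]

diverged_at = first_divergence(run_a, run_b)

# A single comparison turns the one-bit difference into different control flow.
branch_a = run_a[1][0] > 0.6
branch_b = run_b[1][0] > 0.6

if __name__ == "__main__":
    print(diverged_at, branch_a, branch_b)
```

Hashing bit patterns rather than printed values also distinguishes `0.0` from `-0.0`, which is what you want when the goal is bit-exact comparison.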

There's also "cover your ass". At least I've heard tales of major aerospace companies keeping warehouses of old Sun hardware in case they need to demonstrate the simulations they ran back in the 90s were not fabricated...
I’ve yet to meet a customer that cares enough to pay for the necessary numerical analysis.
> It is not a goal to define any sort of "fast-fp" or "loose-fp" (c.f. JSR 84: Floating Point Extensions).

(Comic villain sitting in his fast-math lair) Foiled yet again!

I've long since decided that my write-up on floating point will be titled "Floating Point or: How I Learned to Start Worrying and Hate Fast-Math".
As a fellow practitioner of the demonic -ffast-math arts, let us cackle menacingly together. :)
Did you mean -fbroken-and-not-necessarily-fast-math? [1]

[1] But really, if -ffast-math does turn -funsafe-math-optimizations on, it should have been named similarly. There is a possibility of much safer -ffast-math with almost zero breakage (by assuming a subset of IEEE 754, like the fixed rounding mode). The current -ffast-math is so reckless [2].

[2] https://simonbyrne.github.io/notes/fastmath/#flushing_subnor...

I find that -ffast-math is not so bad, so long as I develop and test with it from the beginning. It's much like any of the other more aggressive optimizations in that sense.

Plus, comparing against strict math as I go tends to highlight where I might have been about to do something dodgy anyway.

I was unaware that Java finally got rid of strictfp (though happy that it finally did). It was added in Java 1.2, perhaps in response to this paper, though I don't know the entire timeline accurately.
though you imply the contrary, the 8087 is ieee-754-compliant; kahan's involvement in its design and in the standards process ensured that
Where did I imply that? The fault was with the JVM.

“The impetus for changing the default floating-point semantics of the platform in the late 1990's stemmed from a bad interaction between the original Java language and JVM semantics and some unfortunate peculiarities of the x87 floating-point co-processor instruction set of the popular x86 architecture.”

The x87 ISA was an absolute boondoggle which took 3 decades to recover from. It was so clearly not designed by CPU architects. What a disaster.
glad we've cleared that up