Hacker News new | ask | show | jobs
by jcranmer 499 days ago
It's worth noting that the C standard explicitly disclaims correct rounding for these IEEE 754 functions (C23§F.3¶20).

Also, there's a group of people who have been running tests on common libms, reporting their current accuracy states here: https://members.loria.fr/PZimmermann/papers/accuracy.pdf (that paper is updated ~monthly).

2 comments

From the referenced slides[1]:

The 2012 discovery of the Higgs boson at the ATLAS experiment in the LHC relied crucially on the ability to track charged particles with exquisite precision (10 microns over a 10m length) and high reliability (over 99% of roughly 1000 charged particles per collision correctly identified).

In an attempt to speed up the calculation, researchers found that merely changing the underlying math library (which should only affect at most the last bit) resulted in some collisions being missed or misidentified.

I was a user contributing to the LHC@Home BOINC project[2], where they ran into similar problems. They simulated beam stability, so iterated on the position of the simulated particles for millions of steps. As normal in BOINC each work unit is computed at least three times and if the results don't match the work unit is queued for additional runs.

They noticed that they got a lot of work units that failed the initial check compared to other BOINC projects. Digging into it they noticed that if a work unit was computed by the same CPU manufacturer, ie all Intel CPUs, then they passed as expected. But if the work unit had been processed by mixed CPUs, ie at least one run on Intel and one run on AMD, they very often disagreed.

That's when they discovered[3] this very issue about how the rounding of various floating point functions differed between vendors.

After switching to crlibm[4] for the elementary functions they used the mixed-vendor problem went away.

[1]: https://www.davidhbailey.com/dhbtalks/dhb-icerm-2020.pdf

[2]: https://en.wikipedia.org/wiki/LHC@home

[3]: https://accelconf.web.cern.ch/icap06/papers/MOM1MP01.pdf

[4]: https://ens-lyon.hal.science/ensl-01529804/file/crlibm.pdf

That Zimmermann paper is far more useful than the article. Notably LLVM libm is correctly rounded for most single precision ops.

Notable omission are crlibm/rlibm/core-math etc libs which claim to be more correct, but I suppose we can already be pretty confident about them.

CORE-MATH is working directly with LLVM devs to get their routines to the LLVM libm, so no additional column is really required.
I've found that LLVM is slightly worse than GCC on reproducibility once you get past the basic issues in libm. For example, LLVM will incorrectly round sqrt on ppc64el in some unusual circumstances:

https://github.com/J-Montgomery/rfloat/blob/8a58367db32807c8...

On the ppc64 thing, some ways to avoid it: https://godbolt.org/z/fx88rYz5v
...on -ffast-math. Of course you'll have arbitrary behavior of (at least) a couple ULPs on -ffast-math, but it'll be faster (hopefully)! That's, like, the flag's whole idea.
I didn't say it was a bug for exactly that reason, but as the comment explains this doesn't affect any platform or compiler combination except ppc64 LLVM, or even most usages of sqrt for that target.

LLVM has lots of small reproducibility issues like this that GCC doesn't, but also much better documentation around its limitations. The point of this library is to eliminate as many of those as possible without performance costs.

"doesn't affect other things" is in no way ever whatsoever a reason to do anything in any way related to thinking, believing, or hoping that it might, would, or should affect nothing, especially around compiler optimizations (fun sentence to write, and one I violate myself for some things, but trivially true regardless). I'd be curious to hear about actual issues though.

The ppc64 case looks like llvm very intentionally not using the existing square root instruction, instead emitting a sequence of manual operations that supposedly run faster. And it's entirely in its right to do so, and it should affect no correct code (not that it's even really possible to write "correct code" under -ffast-math).

Note that this isn't even a case where a single sqrt() call is non-reproducible, but only sequences of inlined sqrt calls within an unrolled loop and I agree with you.

Some broader context is probably warranted though. This originated out of a discussion with the authors of P3375 [0] about the actual performance costs of reproducibility. I suspected that existing compilers could already do it without a language change and no runtime cost using inline assembly magic. This library was the experiment to see if I was full of it.

There were only a few minor limitations it found. One was this issue, which happens "outside" what the library is attempting to do (though potentially still fixable as your godbolt link demonstrated). Another was that Clang and GCC have slightly different interpretations of "creative" register constraints. Clang's interpretation is closer to GCC's docs than GCC itself, but produces worse code.

Otherwise, this gives you reproducibility right up to the deeper compiler limitations like NaN propagation at essentially no performance cost. I wasn't able to find any "real" cases where it's not reproducible, only incredibly specific situations like this one across all 3 major compilers and even the minor ones I tried.

[0] https://isocpp.org/files/papers/P3375R2.html

The current correctly rounded libraries work well with 32-bit float, but only provably produce correct rounding with 64-bit float in some cases. It turns out it's rather difficult to prove that you will produce a correctly rounded result in all cases, even if you have an algorithm that works. That means that each of these libraries has only a sunset of operations.