by magicalhippo 1157 days ago
I recall LHC@Home had some issue with this[1]. They ran simulations of beams in the LHC using BOINC; each run could cover up to a million revolutions. The goal was to do parameter searches to find the best magnet settings before going live, so they could spend less time tuning the real thing.

Anyway, as with other BOINC projects, they sent each work unit (simulation run) to at least two (or was it three?) different computers and compared results, to ensure correctness. And they found that they got quite a lot of work units which disagreed and had to be sent to more computers for validation.
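
A rough sketch of that kind of replicate-and-compare check, for illustration only. This is not BOINC's actual validator API; the `validate` function, the quorum of 2, and the host names are all assumptions made up for the example:

```python
from collections import Counter

def validate(replies, quorum=2):
    """Return the agreed result if at least `quorum` hosts returned
    identical output, else None (meaning: send out more replicas).

    `replies` maps a (hypothetical) host name to the result it returned.
    """
    if not replies:
        return None
    counts = Counter(replies.values())
    result, votes = counts.most_common(1)[0]
    return result if votes >= quorum else None

# Made-up replies for a single work unit.
replies = {"host-1": "result-A", "host-2": "result-B"}
print(validate(replies))    # None: the two hosts disagree

replies["host-3"] = "result-A"
print(validate(replies))    # result-A: two hosts now agree
```

Comparing for exact equality is what makes a one-bit difference between hosts show up as a disagreement in the first place.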

After some digging and eliminating factors like overclocked CPUs, they found that usually, all Intel machines would agree and all AMDs would agree, but Intel and AMD would disagree. Like, a run that would hit the detector wall after 30 revolutions on Intels could go on for many thousands of revolutions on AMDs.

Further digging traced the discrepancy to the lower bits of the results of transcendental operations in the FPUs[2]. After switching to a software library for these operations, at the cost of a few % in speed, they got Intels and AMDs to agree.

So yeah, when you do a large number of iterated operations like this, even a single LSB of difference can lead to issues.
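
A toy sketch of the effect, not the actual LHC@Home tracking code. It assumes a chaotic logistic map as a stand-in for the beam simulation, and models the "other vendor" as a machine whose result is one LSB higher at every step. The names `step`, `run`, and `first_divergence` are made up for the example, and `math.nextafter` needs Python 3.9 or later:

```python
import math

def step(x, r=3.9):
    # Logistic map; r = 3.9 puts it in the chaotic regime.
    return r * x * (1.0 - x)

def run(x0, steps, nudge_lsb=False):
    """Iterate the map. With nudge_lsb=True, every result is moved up
    by one unit in the last place, mimicking a second machine whose
    arithmetic differs only in the lowest bit."""
    xs = []
    x = x0
    for _ in range(steps):
        x = step(x)
        if nudge_lsb:
            x = math.nextafter(x, math.inf)
        xs.append(x)
    return xs

def first_divergence(a, b, tol=1e-3):
    """Index of the first step where the two runs differ by more than tol."""
    for i, (p, q) in enumerate(zip(a, b)):
        if abs(p - q) > tol:
            return i
    return None

machine_a = run(0.3, 1000)
machine_b = run(0.3, 1000, nudge_lsb=True)

print("difference after step 1:", abs(machine_a[0] - machine_b[0]))
print("runs differ by more than 1e-3 from step:",
      first_divergence(machine_a, machine_b))
```

The two runs start out one LSB apart and end up on unrelated trajectories well within the 1000 steps, which is the same mechanism as one machine losing the beam after 30 revolutions while another keeps going.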

As an aside, LHC@Home was initially run almost like a hobby project by a few researchers connected to the LHC, without much official support. However, the data the project produced was AFAIK highly beneficial to the machine commissioning, and it later became a more official part of the High Luminosity upgrade.

[1]: https://cds.cern.ch/record/1463291/files/CERN-ATS-2012-159.p...

[2]: https://epaper.kek.jp/icap06/PAPERS/MOM1MP01.PDF

1 comments

I looked into this issue with rsqrt (and with rcp as well) between Intel and AMD processors in connection with CERN some years ago (2016). An unpublished report can be found at [1].

TL;DR: The same (very small) executables gave different results when run on Intel and AMD processors because the rsqrt and rcp instructions produced slightly different outputs on the two systems.

[1]: https://github.com/jeff-arnold/math_routines/blob/main/rsqrt....

Interesting results, thanks for sharing!