by reality_czech
3925 days ago
I don't understand the obsession with bitwise reproducibility. It seems like if your algorithm is numerically stable and valid, you can just compare multiple inexact test runs with some margin of error. Even if you hire nothing but guru-level IEEE floating point experts, a focus on bitwise reproducibility will close off a lot of opportunities to parallelize the code. A lot of machine learning tools like random forests and neural networks inherently inject randomness. Are we just going to throw up our hands and say we only use classical deterministic algorithms run in single-threaded mode, because we can't think of any way to compare multiple test runs except memcmp? Because that's what I'm hearing (maybe I'm missing something).

But parallel code is how one continues to benefit from Moore's law with respect to multiple cores and ever-increasing SIMD width and SIMD units. And that shift to parallel hardware happened at exactly the same time as the migration to mostly single-threaded, weakly-typed languages began.
For if your results aren't reproducible, there's no way to detect if your code has a race condition that is reducing the efficacy of your methods.
For an application like molecular dynamics, it's important to conserve overall energy. Any such inconsistency is the equivalent of setting off tiny little hand grenades in the simulation. Inconsistent summation obscures this without a great deal of work to sample many independent simulations. Compare and contrast to running things twice to sniff this out in deterministic code.
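To make the summation point concrete, here's a minimal Python sketch. The function name and the numbers are made up for illustration and don't come from any real MD code. It adds the same three contributions in two different orders, which is what a non-deterministic parallel reduction effectively does from run to run.

```python
def total_energy(contributions):
    """Naive left-to-right floating point sum, as a serial loop would do it."""
    total = 0.0
    for e in contributions:
        total += e
    return total

# Same three values, two different arrival orders (hypothetical numbers).
order_a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16 and lost
order_b = [1e16, -1e16, 1.0]   # the big terms cancel first, the 1.0 survives

print(total_energy(order_a))   # 0.0
print(total_energy(order_b))   # 1.0
```

Neither answer is a bug in the usual sense. The point is that if thread scheduling picks the order, the result can change between runs, and a real race condition can hide behind that noise.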
For machine learning, it can amount to anything from a harmless implicit regularizer to the AI equivalent of Gary Busey taking the wheel and driving you straight to crazy town.
I speak from experience in both cases. And speaking from experience, as long as your reductions are associative, you'll be fine. That can be achieved with fixed point atomics, or if you don't have them, reduction buffers, or finally, a deterministic reduction algorithm if you have neither. I've used all of the above and they have cost at most 2-3% more than the non-deterministic alternatives (usually much less).
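A rough sketch of the fixed point idea, in plain Python rather than actual GPU atomics. `SCALE`, `to_fixed`, and `fixed_point_sum` are hypothetical names, and the 32 fractional bits are an arbitrary choice for illustration. Each value is converted to an integer once, and integer addition is associative, so the order in which terms arrive no longer matters.

```python
import random

SCALE = 1 << 32  # assumed: 32 fractional bits; pick this to fit your value range


def to_fixed(x):
    """Round a float to the nearest multiple of 1/SCALE, stored as an integer."""
    return round(x * SCALE)


def fixed_point_sum(values):
    """Accumulate in integers, so the result is independent of summation order."""
    acc = 0
    for v in values:
        acc += to_fixed(v)  # exact integer add; stands in for an integer atomic add
    return acc / SCALE      # one conversion back to float at the very end


rng = random.Random(0)
values = [rng.uniform(-1000.0, 1000.0) for _ in range(10_000)]
reference = fixed_point_sum(values)

# Shuffle to imitate threads finishing in a different order on each run.
for seed in range(5):
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    assert fixed_point_sum(shuffled) == reference  # bitwise identical
```

Python integers never overflow, so this hides one real concern: with 64-bit integer atomics on actual hardware you have to choose the scale so that the largest possible sum still fits.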
Finally, I inject randomness like crazy, but I do so in a reproducible manner. Confusing determinism and randomness reminds me of people who don't understand the difference between precision and accuracy (TLDR: precision is easy, accuracy is tough).
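What I mean by reproducible randomness, as a minimal sketch using Python's standard `random` module (`noisy_samples` is a hypothetical helper, not from any particular framework): give every source of noise its own explicitly seeded generator instead of relying on hidden global state.

```python
import random


def noisy_samples(seed, n):
    """Draw n Gaussian samples from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; the global RNG is not touched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


run_1 = noisy_samples(seed=42, n=5)
run_2 = noisy_samples(seed=42, n=5)
other = noisy_samples(seed=43, n=5)

assert run_1 == run_2   # same seed: the "random" run replays exactly
assert run_1 != other   # different seed: a different but equally replayable run
```

In parallel code the same idea applies per worker: derive each worker's seed from a base seed plus the worker's index, so the stream each one sees doesn't depend on scheduling.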