Hacker News new | ask | show | jobs
by reality_czech 3925 days ago
I don't understand the obsession with bitwise reproducibility. It seems like if your algorithm is numerically stable and valid, you can just compare multiple inexact test runs with some margin of error. Even if you hire nothing but guru-level IEEE floating point experts, a focus on bitwise reproducibility will close off a lot of opportunities to parallelize the code.

A lot of machine learning tools like random forests and neural networks inherently inject randomness. Are we just going throw up our hands and say we only use classical deterministic algorithms run in single threaded mode, because we can't think of any way to compare multiple test runs except memcmp? Because that's what I'm hearing (maybe I'm missing something).

2 comments

Moore's Law w/r to clock frequency ran out over a decade ago. And that sucks, because it's trivial to make single-threaded code reproducible.

But parallel code is how one continues to benefit from Moore's law w/r to multiple cores and ever-increasing SIMD width and SIMD units. And it happened at exactly the same time as the migration to mostly single-threaded weakly-typed languages began.

For if your results aren't reproducible, there's no way to detect if your code has a race condition that is reducing the efficacy of your methods.

For an application like molecular dynamics, it's important to conserve overall energy. Any such inconsistency is the equivalent of setting off tiny little hand grenades in the simulation. Inconsistent summation obscures this without a great deal of work to sample many independent simulations. Compare and contrast to running things twice to sniff this out in deterministic code.

For machine learning, it can amount to anything from a harmless implicit regularizer to the AI equivalent of Gary Busey taking the wheel and driving you straight to crazy town.

I speak from experience in both cases. And speaking from experience, as long as your reductions are associative, you'll be fine. That can be achieved with fixed point atomics, or if you don't have them, reduction buffers, or finally, a deterministic reduction algorithm if you have neither. I've used all of the above and they have at most cost 2-3% more than the non-deterministic alternatives (usually much less).

Finally, I inject randomness like crazy, but I do so in a reproducible manner. Confusing determinism and randomness reminds me of people who don't understand the difference between precision and accuracy (TLDR: precision is easy, accuracy is tough).

I've seen race conditions that only happen 1 time in 1000. Or race conditions that only happen when the machine is under load. I agree that it's frustrating to have to debug non-deterministic code, but I feel like there must be a better way to compare test runs than memcmp on the resutls.

You are right that deterministic pseudo-random numbers can be used when "random" inputs are required. I overlooked that... thanks for pointing it out.

One way I addressed this exact situation: Compute a hash on the state after each iteration and save it. Now run again normally and save several iterations worth of state somewhere. When the hash diverges, you have hit the race condition.

Do this a few times until you've characterized what's happening and your brain figures out how and why by seeing the pattern this will (hopefully) reveal.

Dunno from obsession, but I like having a regression suite available, and bitwise reproducible means you can run the test suite as if it were a "diff", and no output means it passed.