Hacker News new | ask | show | jobs
by varelse 3925 days ago
Moore's Law w/r to clock frequency ran out over a decade ago. And that sucks, because it's trivial to make single-threaded code reproducible.

But parallel code is how one continues to benefit from Moore's law w/r to multiple cores and ever-increasing SIMD width and SIMD units. And it happened at exactly the same time as the migration to mostly single-threaded weakly-typed languages began.

For if your results aren't reproducible, there's no way to detect if your code has a race condition that is reducing the efficacy of your methods.

For an application like molecular dynamics, it's important to conserve overall energy. Any such inconsistency is the equivalent of setting off tiny little hand grenades in the simulation. Inconsistent summation obscures this without a great deal of work to sample many independent simulations. Compare and contrast to running things twice to sniff this out in deterministic code.

For machine learning, it can amount to anything from a harmless implicit regularizer to the AI equivalent of Gary Busey taking the wheel and driving you straight to crazy town.

I speak from experience in both cases. And speaking from experience, as long as your reductions are associative, you'll be fine. That can be achieved with fixed point atomics, or if you don't have them, reduction buffers, or finally, a deterministic reduction algorithm if you have neither. I've used all of the above and they have at most cost 2-3% more than the non-deterministic alternatives (usually much less).

Finally, I inject randomness like crazy, but I do so in a reproducible manner. Confusing determinism and randomness reminds me of people who don't understand the difference between precision and accuracy (TLDR: precision is easy, accuracy is tough).

1 comments

I've seen race conditions that only happen 1 time in 1000. Or race conditions that only happen when the machine is under load. I agree that it's frustrating to have to debug non-deterministic code, but I feel like there must be a better way to compare test runs than memcmp on the resutls.

You are right that deterministic pseudo-random numbers can be used when "random" inputs are required. I overlooked that... thanks for pointing it out.

One way I addressed this exact situation: Compute a hash on the state after each iteration and save it. Now run again normally and save several iterations worth of state somewhere. When the hash diverges, you have hit the race condition.

Do this a few times until you've characterized what's happening and your brain figures out how and why by seeing the pattern this will (hopefully) reveal.