|
|
|
|
|
by hansvm
1152 days ago
|
|
Determinism in the LSB is often a prerequisite for epsilon-determinism in the result. Algorithms with many possible solutions and any step that bifurcates on a floating point value should be treated with suspicion. Classic examples include most under-constrained randomized algorithms, like training a neural network. Rejection sampling is required to accurately produce some sorts of randomness, and that yields bifurcations in the initialization if you don't have LSB determinism. The complicated loss landscape then virtually guarantees you'll converge to a different minima. Even with a deterministic seed, algorithms guaranteed to converge, a principled way to ensure that concurrent computations yield bitwise identical results no matter the execution order, and most of your operations being bitwise identical, a few stray LSB issues in your inverse square roots or transcendentals will still nearly ensure that the final result isn't even approximately the same. As to why that latter thing matters, it varies, but at a minimum reproducibility makes lots of federated processes cheaper (and not just federated in the "folding at home" sense, but generally when some people are performing computations and other people are making actions based on them -- being able to explain credit scores or parole denials or whatever, or validating that several people you trust yield the same compiled binary). Bitwise reproducibility would be better, but even being approximately right would probably be good enough and isn't tenable without bitwise identical building blocks. |
|