Hacker News
by teebs 1256 days ago
It's likely because GPU calculations are non-deterministic, and small differences in floating-point numbers could lead to different outcomes (either in the way you described or somewhere deeper in the model).
1 comment

> GPU calculations are non-deterministic

TensorFlow is non-deterministic for some operations due to thread scheduling. PyTorch doesn’t have this issue.

Some of the underlying cuDNN algorithms have nondeterministic implementations, and that applies to PyTorch as well. See https://pytorch.org/docs/stable/notes/randomness.html
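
For anyone wondering why execution order would change a numeric result at all: as far as I understand it, the usual culprit is that floating-point addition is not associative, so a parallel reduction that adds the same values in a different order can round differently. Here is a minimal pure-Python sketch of that effect. No GPU is involved, and `sum_in_order` is a made-up helper for illustration, not anything from TensorFlow, PyTorch, or cuDNN.

```python
def sum_in_order(values):
    """Add floats strictly left to right, the way one fixed schedule would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, accumulated in two different orders.
# (Hypothetical values chosen so the rounding difference is obvious.)
order_a = [1e17, 1.0, -1e17]   # the 1.0 is absorbed by 1e17 and lost
order_b = [1e17, -1e17, 1.0]   # the large terms cancel first, so 1.0 survives

print(sum_in_order(order_a))
print(sum_in_order(order_b))

# The same thing happens with everyday values, just at a much smaller scale.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

The two orderings give different sums even though the inputs are identical. On a GPU the order can depend on how threads happen to be scheduled, and a tiny difference like this can then get amplified by later layers of a model.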