Hacker News
by TeMPOraL 529 days ago
There's extra randomness added accidentally in practice: inference is a massively parallelized set of matrix multiplications, and floating-point addition is not associative, so the randomness in execution order gets converted into a random FP rounding error. Even setting temperature to 0 doesn't guarantee repeatable results.
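To make the rounding point concrete, here is a minimal sketch in plain Python (IEEE 754 doubles, nothing GPU-specific; the values 0.1, 0.2, 0.3 are just illustrative). Swapping two operands of a single addition changes nothing, but regrouping a chain of additions does:

```python
# Floating-point addition is commutative but not associative.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

commutative = (a + b) == (b + a)   # True: operand order within one add is harmless
associative = left == right        # False: grouping changes the rounding

print(left, right, commutative, associative)
```

A parallel reduction effectively picks a grouping at run time, which is where the run-to-run difference comes from.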
1 comment

Only if the inference software doesn't guarantee a deterministic order of operations across concurrent threads, which is CS 101.
This sort of nondeterministic scheduling of non-associative floating-point ops essentially happens at the level of GPU firmware, so I would imagine that Nvidia is aware of it.
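A rough CPU-side sketch of the ordering argument, not a model of how any GPU library actually schedules work. The input list and the helper `sequential_sum` are made up for illustration, and each permutation stands in for one possible execution order:

```python
import itertools
import math

# Hypothetical inputs chosen so rounding is easy to see: near 1e16,
# adjacent doubles are 2 apart, so adding 1.0 can be rounded away.
values = [1e16, 1.0, -1e16, 1.0]

def sequential_sum(xs):
    """Add left to right, rounding after every step."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Unconstrained order: different schedules give different answers.
naive_results = {sequential_sum(p) for p in itertools.permutations(values)}

# Pinned order: the same answer on every run.
fixed_results = {sequential_sum(values) for _ in range(5)}

# math.fsum tracks exact partial sums, so its result ignores ordering.
fsum_results = {math.fsum(p) for p in itertools.permutations(values)}

print(sorted(naive_results))
print(fixed_results)
print(fsum_results)
```

Pinning the order makes the result repeatable, but repeatable is not the same as exact: the pinned order here lands on 1.0 while the true sum is 2.0. An order-independent summation such as `math.fsum` gets both, at extra cost.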