by jmalicki
62 days ago
Floating-point associativity differences can lead to non-determinism even at temperature 0 if the order of operations is non-deterministic. Anyone with reasonable experience in GPU computation who pays attention knows that even randomness in warp completion times can easily lead to non-determinism due to associativity differences. For instance:
https://www.twosigma.com/articles/a-workaround-for-non-deter...

It is very well known among practitioners that CUDA isn't strongly deterministic due to these factors. Differences in inference batch sizes compound these issues.

Edit: to be more specific, the non-determinism mostly comes from map-reduce style operations, where the map is deterministic, but the order in which items are sent to the reduce steps (or how elements are arranged in the tree for a tree reduce) can be non-deterministic.
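
To make the reduce-order point concrete, here is a minimal CPU-side sketch in Python. It is not CUDA, and the helper names `sequential_sum` and `tree_sum` are made up for illustration. It only assumes Python floats are IEEE 754 doubles, and shows that the same four numbers give three different sums depending on how the reduction is ordered or shaped:

```python
def sequential_sum(values):
    """Left-to-right reduction: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise (tree) reduction: adjacent elements are added level by level."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            # An odd element out is carried up to the next level unchanged.
            paired.append(values[-1])
        values = paired
    return values[0]


# The exact mathematical sum of these four numbers is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
# Same elements, different arrival order at the reduce step.
reordered = [1e16, -1e16, 1.0, 1.0]

print(sequential_sum(values))     # 1.0: the first 1.0 is absorbed by 1e16
print(sequential_sum(reordered))  # 2.0: the large terms cancel first
print(tree_sum(values))           # 0.0: both 1.0s are absorbed inside their pairs
```

On a GPU the order and tree shape are not chosen by hand like this. They can depend on scheduling and batch layout, which is where the run-to-run differences come from.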