by david-gpu
408 days ago
During my tenure at NVidia I met a guy who was working on special versions of the kernels that would make them deterministic. Otherwise, parallel floating point computations like these are not going to be perfectly deterministic, due to a combination of two factors. First, the order of some operations will be random due to all sorts of environmental conditions such as temperature variations. Second, floating point operations like addition are not ~~commutative~~ associative (thanks!!), which surprises people unfamiliar with how they work. That is before we even talk about the temperature setting on LLMs.
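
To make the point about operation order concrete, here is a minimal Python sketch. It is not the actual GPU kernels: `sequential_sum` and `pairwise_sum` are hypothetical helpers that stand in for two different reduction orders a parallel sum might end up using, and the input values are picked by hand to exaggerate the effect.

```python
import math

def sequential_sum(xs):
    """Add the values left to right, like a single thread would."""
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    """Add the values as a binary tree, like a parallel reduction might."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

# Hand-picked values: at magnitude 1e16 a double cannot represent +/- 1.
values = [1e16, 1.0, -1e16, 1.0]

seq = sequential_sum(values)   # one grouping of the additions
tree = pairwise_sum(values)    # a different grouping of the same additions
exact = math.fsum(values)      # correctly rounded reference sum

print(seq, tree, exact)
```

Same inputs, same mathematical sum, but each grouping rounds at different points, so the two orders disagree with each other and with the correctly rounded result. If the hardware scheduler decides the grouping at run time, the output can change from run to run.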
|
Maybe you meant associative? Floating point addition is commutative: a+b is always equal to b+a for any values of a and b. It is not associative, though: a+(b+c) is in general different from (a+b)+c. Think what happens if a is tiny and b and c are huge with opposite signs, for example.
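
A quick way to check this in Python, whose `float` is an IEEE 754 double on common platforms. The specific values are just an illustration chosen to make the rounding obvious: `a` is tiny relative to `b` and `c`, and `c` cancels `b`.

```python
a = 1.0      # tiny relative to b and c
b = 1e16     # huge
c = -1e16    # huge, opposite sign

# Commutative: swapping the operands does not change the result.
commutes = (a + b) == (b + a)

# Not associative: regrouping changes where the rounding happens.
left = (a + b) + c    # a is absorbed into b first, then b cancels
right = a + (b + c)   # b and c cancel exactly first, so a survives

print(commutes, left, right)
```

In the first grouping `a` is lost when it is added to `b`, because neighbouring doubles near 1e16 are 2 apart. In the second grouping `b + c` is exactly zero, so `a` comes through untouched.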