by david-gpu 408 days ago
During my tenure at NVidia I met a guy who was working on special versions of the kernels that would make them deterministic.

Otherwise, parallel floating point computations like these are not going to be perfectly deterministic, due to a combination of two factors. First, the order of some operations will be random due to all sorts of environmental conditions such as temperature variations. Second, floating point operations like addition are not ~~commutative~~ associative (thanks!!), which surprises people unfamiliar with how they work.
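To make the second factor concrete, here is a minimal Python sketch. It adds the same four numbers in two different groupings: a plain left-to-right loop, and a pairwise (tree) reduction of the kind a parallel sum might use. The names `sequential_sum` and `pairwise_sum` are made up for illustration and have nothing to do with any actual GPU kernel; the values are picked so that the rounding difference shows up in 64-bit floats.

```python
values = [1e16, 1.0, 1.0, 1.0]

def sequential_sum(xs):
    # Left-to-right accumulation: ((x0 + x1) + x2) + x3
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    # Tree-shaped reduction: (x0 + x1) + (x2 + x3)
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

sequential = sequential_sum(values)
pairwise = pairwise_sum(values)

print(sequential == pairwise)   # False
print(pairwise - sequential)    # 2.0
```

In the loop, each lone 1.0 is rounded away when added to 1e16. In the tree, two of the 1.0s are combined into 2.0 first, which is large enough to survive. Same inputs, different grouping, different answer.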

That is before we even talk about the temperature setting on LLMs.


> floating point operations like addition are not commutative

Maybe you meant associative? Floating point addition is commutative: a+b is always equal to b+a for any values of a and b. It is not associative, though: a+(b+c) is in general different from (a+b)+c. Think about what happens if a is tiny and b, c are huge, for example.

Sorry, yes, I meant associative. Thanks for the important correction.

To think that I used to do this for a living...

How is that any different? 1+(2+3) = 6

(1+2)+3 = 6

0.000001+(200000+300000) = 500000.000001

(0.000001+200000)+300000 = 500000.000001

You need to take it a step further, since e.g. 64-bit floats have a ton of mantissa bits.

Here's an example in python3.

    >>> "{:.2f}".format(1e16 + (1 + 1))
    '10000000000000002.00'
    >>> "{:.2f}".format((1e16 + 1) + 1)
    '10000000000000000.00'

Take b and c with opposite signs.
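A quick sketch of that case in Python, with a small `a` and huge `b`, `c` of opposite signs (the specific values are just an illustrative choice):

```python
a, b, c = 1.0, 1e16, -1e16

right_first = a + (b + c)   # b and c cancel exactly, so a survives
left_first = (a + b) + c    # a is rounded away inside a + b

print(right_first)  # 1.0
print(left_first)   # 0.0
```

Here the two groupings do not just differ in the last digit: one gives 1.0 and the other gives 0.0.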