| HN Mirror


by david-gpu 455 days ago

During my tenure at NVidia I met a guy who was working on special versions of the kernels that would make them deterministic.

Otherwise, parallel floating point computations like these are not going to be perfectly deterministic, due to a combination of two factors. First, the order of some operations will be random due to all sorts of environmental conditions such as temperature variations. Second, floating point operations like addition are not ~~commutative~~ associative (thanks!!), which surprises people unfamiliar with how they work.

That is before we even talk about the temperature setting on LLMs.
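As a rough illustration of the two factors above, here is a minimal Python sketch (the values are chosen for illustration and are not from the original comment). It adds the same four numbers in every possible order, standing in for a parallel reduction whose ordering is not fixed, and collects the distinct results.

```python
from itertools import permutations

# Illustrative values (an assumption, not taken from the thread):
# two huge terms that cancel each other, plus two small ones.
values = [1e16, -1e16, 1.0, 1.0]

def sequential_sum(xs):
    """Add left to right, the way a fixed-order serial loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Each permutation models one possible ordering of a nondeterministic reduction.
distinct = {sequential_sum(order) for order in permutations(values)}

# The mathematically exact sum is 2.0, yet several different results appear.
print(sorted(distinct))
```

If the order were pinned down, as in the deterministic kernels mentioned above, the result would be the same on every run, although not necessarily equal to the exact sum.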


enriquto 455 days ago

> floating point operations like addition are not commutative

Maybe you meant associative? Floating point addition is commutative: a+b is always equal to b+a for any values of a and b. It is not associative, though: a+(b+c) is in general different from (a+b)+c. Think about what happens if a is tiny and b, c are huge, for example.


david-gpu 455 days ago

Sorry, yes, I meant associative. Thanks for the important correction.

To think that I used to do this for a living...


simulator5g 455 days ago

How is that any different? 1+(2+3) = 6

(1+2)+3 = 6

0.000001+(200000+300000) = 500000.000001

(0.000001+200000)+300000 = 500000.000001


david-gpu 455 days ago

You need to take it a step further, since e.g. 64-bit floats have a ton of mantissa bits.

Here's an example in python3.

    >>> "{:.2f}".format(1e16 + (1 + 1))
    '10000000000000002.00'
    >>> "{:.2f}".format((1e16 + 1) + 1)
    '10000000000000000.00'


enriquto 454 days ago

Take b and c with opposite signs.
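A minimal Python sketch of that suggestion, assuming 64-bit floats and illustrative values not given in the thread: a is tiny relative to b and c, which are huge and have opposite signs.

```python
a = 1.0     # tiny compared to b and c
b = 1e20    # huge
c = -1e20   # huge, opposite sign to b

# b + c cancels exactly to 0.0 first, so a survives.
left = a + (b + c)

# a is absorbed when added to b (1.0 is far below the spacing between
# adjacent floats near 1e20), and then b cancels against c.
right = (a + b) + c

print(left)   # 1.0
print(right)  # 0.0
```

Swapping the operands inside any single addition, for example `b + a` instead of `a + b`, does not change either result; only the grouping does.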
