| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Tunabrain 1090 days ago
	GPUs are deterministic machines, even for floating point. The behavior in the linked article has to do with the use of atomic adds to reduce sums in parallel. Floating point addition is not associative, so the order in which addition occurs matters. When using atomic adds this way, you get slightly different results depending on the order in which threads arrive at the atomic add call. It's a simple race condition, although one which is usually deemed acceptable.

1 comments

n2d4 1090 days ago

I just edited my comment while you were writing your comment to add an explanation. The point here is that some primitives in eg. cudNN are non-deterministic. Whether you classify that as a race condition or not is a different question; but it's intended behaviour.

link

xyzzy_plugh 1090 days ago

Right but that's not an inherent GPU determinism issue. It's a software issue.

https://github.com/tensorflow/tensorflow/issues/3103#issueco... is correct that it's not necessary, it's a choice.

Your line of reasoning appears to be "GPUs are inherently non-deterministic don't be quick to judge someone's code" which as far as I can tell is dead wrong.

Admittedly there are some cases and instructions that may result in non-determinism but they are inherently necessary. The author should thinking carefully before introducing non-determinism. There are many scenarios where it is irrelevant, but ultimately the issue we are discussing here isn't the GPU's fault.

link

n2d4 1090 days ago

What I'm saying is "there are non-deterministic primitives", not "there are no deterministic primitives".

link

xyzzy_plugh 1090 days ago

Yes, and `gettimeofday` is a non-deterministic primitive. There is nothing special about GPUs here. If you write tests that fail sometimes because you used non-deterministic primitives like gettimeofday and someone files a bug we don't throw up our hands and say "this is not a bug but due to how CPUs work." We remove the non-deterministic bit.

There's no difference here. This isn't a GPU problem.

link

cpgxiii 1090 days ago

Except the issue is inextricably linked to GPUs. All of the work in practical DNNs exists because of the extreme parallel performance available from GPUs, and that performance is only possible with non-deterministic threading. You can't get reasonable training and inference time on existing hardware without it.

link

d0mine 1089 days ago

1000 threads can run in parallel. It doesn't prevent us to sum their results deterministically:

    results = ThreadPool(workers=1000).imap_unordered(calc, inputs)
    print(math.fsum(results))

Due to the magic of the fsum alg, the result is deterministic whatever order we get results in. https://docs.python.org/3/library/math.html#math.fsum

link

WanderPanda 1089 days ago

In my experience cuBLAS is deterministic, since matmul is the most intensive part I don‘t see other reasons for non-determinism other than sloppyness (at least when just a single GPU is involved)

link

DeathArrow 1089 days ago

If the hardware is deterministic, so are the results. You can't generate random numbers purely in software with deterministic hardware.

link

WithinReason 1089 days ago

The behaviour of atomic operations is definitely not deterministic. E.g. if you have a lot of atomic adds, every time you run the code you'll get a different result without a random number generator.

link