


	by aray 3225 days ago
	It looks like NVIDIA GPUs treat denormals as zero in single-precision floating-point math: http://developer.download.nvidia.com/assets/cuda/files/NVIDI... (sections 4.1 and 4.2)
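To make "treating denormals as zero" concrete, here is a minimal CPU-side Python sketch. It assumes IEEE 754 binary32 and uses `struct` to round Python floats to single precision. The helper names (`to_f32`, `is_denormal_f32`, `flush_to_zero`) are hypothetical. The sketch only emulates flush-to-zero (FTZ) behavior and says nothing about how the GPU hardware implements it.

```python
import math
import struct

FLT_MIN = 2.0 ** -126        # smallest normal float32, about 1.18e-38
FLT_TRUE_MIN = 2.0 ** -149   # smallest denormal float32, about 1.4e-45

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def is_denormal_f32(x: float) -> bool:
    """True if x is a nonzero value below the normal float32 range."""
    return x != 0.0 and abs(x) < FLT_MIN

def flush_to_zero(x: float) -> float:
    """Emulate FTZ: replace a denormal with a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal_f32(x) else x

tiny = to_f32(1e-40)                 # only representable as a float32 denormal
print(tiny, is_denormal_f32(tiny))   # nonzero, True
print(flush_to_zero(tiny))           # 0.0
print(flush_to_zero(to_f32(1e-37)))  # a normal value passes through unchanged
```

With gradual underflow, float32 reaches down to about 1.4e-45. With FTZ, everything below about 1.18e-38 collapses to zero.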


dbcurtis 3225 days ago

In the context of graphics processing that trade-off totally makes sense.

Thanks for doing the homework that I was too lazy to do :)

It seems to me that in the context of NN computations, using the lack of gradual underflow as a non-linear element is going to severely limit the dynamic range of the neurons. On the plus side, the non-linear element is a computational freebie. But in addition to limited dynamic range, it makes the NN ridiculously non-portable across hardware implementations.
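Here is a minimal Python sketch of the non-linear element described above. It assumes float32 with FTZ applied to both the inputs and the result of each op, emulated with `struct`. The function `scale` is a hypothetical stand-in for a linear unit y = w * x, not any framework's API. With gradual underflow the unit stays additive. It stops being additive once products that land just below the smallest normal get flushed:

```python
import math
import struct

FLT_MIN = 2.0 ** -126        # smallest normal float32
FLT_TRUE_MIN = 2.0 ** -149   # smallest denormal float32

def to_f32(x: float) -> float:
    """Round to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ftz(x: float) -> float:
    """Flush a float32 denormal to a zero of the same sign."""
    return math.copysign(0.0, x) if x != 0.0 and abs(x) < FLT_MIN else x

def scale(w: float, x: float, flush: bool) -> float:
    """Hypothetical 'linear' unit y = w * x in float32, optionally with FTZ."""
    if not flush:
        return to_f32(w * x)
    return ftz(to_f32(ftz(w) * ftz(x)))

w = 2.0 ** -100
x1 = x2 = 2.0 ** -27          # w * x1 = 2**-127, just below FLT_MIN

for flush in (False, True):
    separate = scale(w, x1, flush) + scale(w, x2, flush)
    combined = scale(w, x1 + x2, flush)
    print(f"flush={flush}: f(x1)+f(x2)={separate!r}, f(x1+x2)={combined!r}")

# Range given up below the normals when FTZ is on: a factor of 2**23.
print(math.log10(FLT_MIN / FLT_TRUE_MIN))
```

With `flush=False` both sides equal 2**-126. With `flush=True` each separate product is flushed to 0.0, while the combined product lands exactly on FLT_MIN and survives. The last line shows the roughly 6.9 decades of dynamic range that lie between the smallest denormal and the smallest normal float32.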


scott-gray 3225 days ago

Actually, if you read section 4.6 of that paper, you'll see that denormals are supported by default on sm_20 and above. But the same section shows that this can easily be disabled with the ftz flag.

I had to give Jakob custom GEMM kernels to do this research. Not sure why the denormal point was left out of the blog post, as it's pretty critical to the whole experiment.


scott-gray 3225 days ago

So a minor correction here. We did explore placing ftz on various instructions inside the matmul ops, but it turns out you don't need anything beyond what is already baked into TF by default. All TF GPU primitives are built with -nvcc_options=ftz=true. This means you get an implicit non-linearity after any non-matmul op (provided the values involved are near 1e-38, the bottom of the normal single-precision range). Matmul ops are called through cuBLAS and have denormals enabled.
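The split described here can be emulated in a minimal Python sketch: matmuls through cuBLAS keep denormals, while the other TF kernels flush them. The sketch makes three assumptions. Values are float32, rounded via `struct`. A zero bias-add stands in for the non-matmul op. The helpers `matmul_ieee` and `add_ftz` are hypothetical. None of this is TensorFlow or cuBLAS code.

```python
import math
import struct

FLT_MIN = 2.0 ** -126  # smallest normal float32

def to_f32(x: float) -> float:
    """Round to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ftz(x: float) -> float:
    """Flush a float32 denormal to a zero of the same sign."""
    return math.copysign(0.0, x) if x != 0.0 and abs(x) < FLT_MIN else x

def matmul_ieee(W, x):
    """Matrix-vector product in float32 with denormals kept (the cuBLAS side)."""
    out = []
    for row in W:
        acc = 0.0
        for w_ij, x_j in zip(row, x):
            acc = to_f32(acc + to_f32(w_ij * x_j))
        out.append(acc)
    return out

def add_ftz(y, b):
    """Elementwise add with denormal inputs and outputs flushed (an ftz=true kernel)."""
    return [ftz(to_f32(ftz(yi) + ftz(bi))) for yi, bi in zip(y, b)]

W = [[2.0 ** -100, 0.0], [0.0, 2.0 ** -100]]
b = [0.0, 0.0]
x = [2.0 ** -20, 2.0 ** -30]

y = matmul_ieee(W, x)   # [2**-120 (normal), 2**-130 (denormal)]
z = add_ftz(y, b)       # second entry is flushed to 0.0
print(y, z)
```

Under FTZ, even adding a zero bias (nominally the identity) acts as a threshold at about 1.18e-38 on whatever the matmul produced.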
