Hacker News
by znfi 3370 days ago
The Titan X(P) nominally supports 16-bit floats, but only at 1/64th the speed of 32-bit floats.

Source: https://en.wikipedia.org/wiki/Pascal_(microarchitecture)

Section 2.4 (Chips) says the Titan XP uses the GP102 chip, and section 3 (Performance) gives the throughput for 16-bit floats.

4 comments

They (including the 1080 Ti, which is basically a Titan) do support 4x faster INT8, though, so if you're comparing to a reduced-precision ternary net running on an FPGA, that seems relevant. (They mention using INT8 in some of the GPU benchmarks, but I'm not sure which graphs are supposed to represent that.)
GP100 supports FP16 FMAD. GP102 supports INT16 and INT8 MAD with 32-bit accumulation.
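
For anyone unsure what "INT8 MAD with 32-bit accumulation" means in practice, here is a rough Python emulation, not the hardware path itself. Four signed 8-bit values are packed into each 32-bit word, multiplied lane by lane, and summed into a 32-bit accumulator (CUDA exposes this kind of operation as the `__dp4a` intrinsic). The helper names `pack_int8x4` and `dp4a_emulated` are made up for illustration, and the wraparound on overflow is an assumption about accumulator behaviour.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def dp4a_emulated(a_word, b_word, acc):
    """Emulate a 4-way INT8 dot product added to a signed 32-bit accumulator."""
    a = struct.unpack("<4b", struct.pack("<I", a_word))
    b = struct.unpack("<4b", struct.pack("<I", b_word))
    total = acc + sum(x * y for x, y in zip(a, b))
    # Assumption: wrap to signed 32-bit on overflow.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total

a = pack_int8x4([127, -128, 5, -3])
b = pack_int8x4([127, -128, 2, 4])
result = dp4a_emulated(a, b, 10)
print(result)  # 32521
```

A single product such as 127 * 127 = 16129 already overflows INT8, which is why the sum is kept in a wider accumulator.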

Overall, not impressed with Stratix 10. It won't be cost effective, it's not much more power-efficient, and Volta will likely leapfrog it across the board within a year.

Wasn't this thing supposed to sample in late 2014? Back then it would have been a game changer at any price. Now a 1080 Ti at $700 beats it across the board in throughput/$. NVIDIA's confusing messaging about using consumer versus professional HW is about the only thing that might make it viable for deep learning. Although I note the absence of training perf numbers here, just (apparently) inference.

"GP100 however uses more flexible FP32 cores that are able to process one single-precision or two half-precision numbers in a two-element vector. Nvidia intends to address the calculation of algorithms related to deep learning with those." So 16-bit ops are more than twice as fast as 32-bit ops, if you pack them into the 32-bit cores two at a time and also use the dedicated 16-bit core.
They are likely faster, but there probably aren't as many 16-bit floating-point units.
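
To make the "two half-precision numbers in a two-element vector" part of the quote concrete, here is a small Python sketch of the packing idea only. Two FP16 values share one 32-bit word, and a multiply-add is applied to both lanes at once. The names `pack_half2`, `unpack_half2` and `half2_fma` are hypothetical, and the arithmetic is done in double precision and rounded to FP16 once at the end, which approximates but is not guaranteed to match a hardware FMA.

```python
import struct

def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (low lane first)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word):
    """Unpack a 32-bit word into its two FP16 lanes as Python floats."""
    return struct.unpack("<2e", struct.pack("<I", word))

def half2_fma(a, b, c):
    """Compute a * b + c on both lanes, rounding each result to FP16."""
    al, ah = unpack_half2(a)
    bl, bh = unpack_half2(b)
    cl, ch = unpack_half2(c)
    # struct.pack raises OverflowError if a result exceeds the FP16 range.
    return pack_half2(al * bl + cl, ah * bh + ch)

a = pack_half2(1.5, 2.0)
b = pack_half2(2.0, 0.5)
c = pack_half2(0.25, 1.0)
r = half2_fma(a, b, c)
print(unpack_half2(r))  # (3.25, 2.0)
```

One 32-bit operand carries two results per operation here, which is where a 2x rate over FP32 would come from. Whether any hardware goes beyond that is not something this sketch can show.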