by sp332 3375 days ago
"GP100 however uses more flexible FP32 cores that are able to process one single-precision or two half-precision numbers in a two-element vector. Nvidia intends to address the calculation of algorithms related to deep learning with those." So 16-bit ops are more than twice as fast as 32-bit ops, if you pack them into the 32-bit cores two at a time and also use the dedicated 16-bit core.
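To make the "two half-precision numbers in a two-element vector" part concrete, here is a rough software model of the packing: two FP16 values share one 32-bit word, and a single operation works on both lanes. This only illustrates the data layout, not how GP100 actually executes it, and the helper names (`pack_half2`, `unpack_half2`, `add_half2`) are made up for this sketch.

```python
import struct

def pack_half2(a, b):
    """Round two floats to FP16 and pack them into one 32-bit word (first value in the low 16 bits)."""
    return int.from_bytes(struct.pack("<2e", a, b), "little")

def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes, returned as Python floats."""
    return struct.unpack("<2e", word.to_bytes(4, "little"))

def add_half2(x, y):
    """Lane-wise add of two packed pairs; stands in for one two-element vector instruction."""
    a0, a1 = unpack_half2(x)
    b0, b1 = unpack_half2(y)
    return pack_half2(a0 + b0, a1 + b1)

x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 4.0)
print(hex(x))                          # 0xc0003e00
print(unpack_half2(add_half2(x, y)))   # (1.75, 2.0)
```

Each packed word takes the same 32 bits as one single-precision number, which is where the "two for one" throughput comes from. The cost is precision: FP16 has a 10-bit mantissa, so values that are not exactly representable get rounded when packed.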