by sp332
3375 days ago
"GP100 however uses more flexible FP32 cores that are able to process one single-precision or two half-precision numbers in a two-element vector. Nvidia intends to address the calculation of algorithms related to deep learning with those." So 16-bit ops can be more than twice as fast as 32-bit ops, if you pack them two at a time into the 32-bit cores and also use the dedicated 16-bit core.
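
To make the "two half-precision numbers in a two-element vector" part concrete, here is a rough sketch of the data layout only: two 16-bit floats sharing one 32-bit word, with an add applied to each lane. It is plain Python, not GPU code, and the function names (`pack_half2`, `unpack_half2`, `add_half2`) are made up for illustration. It says nothing about how GP100 actually schedules the work.

```python
import struct

def pack_half2(lo, hi):
    """Round two values to IEEE 754 half precision and pack them into one 32-bit word."""
    # "<2e" = two little-endian 16-bit floats (4 bytes); "<I" reads those bytes as one uint32.
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word):
    """Split a 32-bit word back into its (low, high) half-precision lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))

def add_half2(a, b):
    """Lane-wise add of two packed words: one 32-bit operand in, two 16-bit results out."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)

x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, -1.0)
print(hex(x))                       # the two halves side by side in one word
print(unpack_half2(add_half2(x, y)))
```

The point is just that a 32-bit-wide datapath carries two independent 16-bit values per operand, which is where the "two at a time" figure in the quote comes from.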