by Firadeoclus
1988 days ago
> The V100 only gets 14 TFLOPS because it lacks the dedicated TensorRT accelerator hardware.

V100 has both vec2 hfma (i.e. fp16 fused multiply-add runs at twice the fp32 rate), reaching ~30 TFLOPS, and tensor cores, which can achieve up to 4x that for matrix multiplications.
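A rough back-of-envelope sketch of where those ratios come from. It assumes NVIDIA's published V100 (PCIe) figures, which are not stated in the comment above: 80 SMs, 64 FP32 lanes and 8 tensor cores per SM, a boost clock of about 1.38 GHz, and each tensor core doing 64 FMAs (one 4x4x4 matrix multiply-add) per clock. Peak numbers only; real kernels land below these.

```python
# Back-of-envelope peak throughput for a V100 (PCIe).
# ASSUMPTION: the hardware figures below are NVIDIA's published specs,
# not taken from the comment itself.
SMS = 80                   # streaming multiprocessors
FP32_LANES_PER_SM = 64     # FP32 FMA lanes per SM
TENSOR_CORES_PER_SM = 8    # tensor cores per SM
BOOST_CLOCK_HZ = 1.38e9    # PCIe boost clock (~1380 MHz)

FLOPS_PER_FMA = 2          # a fused multiply-add counts as 2 FLOPs
HALF2_FACTOR = 2           # vec2 hfma: two fp16 FMAs per lane per clock
TENSOR_FMA_PER_CLOCK = 64  # one 4x4x4 matrix FMA per tensor core per clock


def tflops(flops_per_clock: float) -> float:
    """Convert FLOPs per clock (whole GPU) into TFLOPS at the boost clock."""
    return flops_per_clock * BOOST_CLOCK_HZ / 1e12


fp32 = tflops(SMS * FP32_LANES_PER_SM * FLOPS_PER_FMA)
fp16 = tflops(SMS * FP32_LANES_PER_SM * HALF2_FACTOR * FLOPS_PER_FMA)
tensor = tflops(SMS * TENSOR_CORES_PER_SM * TENSOR_FMA_PER_CLOCK * FLOPS_PER_FMA)

print(f"fp32 FMA:       {fp32:6.1f} TFLOPS")
print(f"fp16 half2 FMA: {fp16:6.1f} TFLOPS")
print(f"tensor cores:   {tensor:6.1f} TFLOPS ({tensor / fp16:.0f}x fp16)")
```

Under these assumptions it prints roughly 14.1, 28.3 and 113.0 TFLOPS: the ~14 TFLOPS fp32 figure, the ~2x for half2 FMA, and the 4x on top of that for tensor-core matrix math.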