Hacker News
by Firadeoclus 1988 days ago
> The V100 only gets 14 TFLOPS because it lacks the dedicated TensorRT accelerator hardware.

The V100 has both vec2 HFMA (i.e. packed fp16 multiply-add runs at twice the fp32 rate), giving ~30 TFLOPS, and tensor cores, which can achieve up to 4x that for matrix multiplications.
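
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The unit counts and clock are assumptions taken from the published SXM2 spec (5120 fp32 lanes, 640 tensor cores, ~1.53 GHz boost, 64 FMAs per tensor core per clock); `peak_tflops` is just a hypothetical helper name, and these are theoretical peaks, not measured numbers.

```python
# Back-of-the-envelope peak throughput for a V100.
# All constants are assumed from the published SXM2 spec sheet.
CUDA_CORES = 5120         # fp32 lanes (80 SMs x 64)
TENSOR_CORES = 640        # 80 SMs x 8
BOOST_CLOCK_HZ = 1.53e9   # assumed boost clock
FLOPS_PER_FMA = 2         # one multiply + one add


def peak_tflops(units, fma_per_unit_per_clock, clock_hz=BOOST_CLOCK_HZ):
    """Theoretical peak in TFLOPS for `units` each issuing N FMAs per clock."""
    return units * fma_per_unit_per_clock * FLOPS_PER_FMA * clock_hz / 1e12


fp32 = peak_tflops(CUDA_CORES, 1)        # one fp32 FMA per lane per clock
fp16_vec2 = peak_tflops(CUDA_CORES, 2)   # vec2 HFMA: two fp16 FMAs per lane
tensor = peak_tflops(TENSOR_CORES, 64)   # 4x4x4 matrix FMA per tensor core

print(f"fp32 {fp32:.1f}, fp16 vec2 {fp16_vec2:.1f}, tensor {tensor:.1f} TFLOPS")
```

Under those assumptions this comes out to roughly 15.7, 31.3 and 125.3 TFLOPS, i.e. fp16 at 2x fp32 and tensor cores at 4x the fp16 rate.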