| HN Mirror

	by Firadeoclus 1988 days ago
	> The V100 only gets 14 TFLOPS because it lacks the dedicated TensorRT accelerator hardware.

	The V100 has both vec2 HFMA (i.e. fp16 multiply-add runs at twice the fp32 rate), giving ~30 TFLOPS, and tensor cores, which can achieve up to 4x that for matrix multiplications.
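
For anyone who wants to check the arithmetic behind those numbers, here is a rough sketch. It assumes the SXM2 variant of the V100 with NVIDIA's published figures (80 SMs, 64 fp32 lanes and 8 tensor cores per SM, 64 FMAs per tensor core per clock, 1.53 GHz boost); the constant and function names are made up for illustration, and these are theoretical peaks, not measured throughput.

```python
# Back-of-the-envelope peak throughput for a V100 (SXM2 variant assumed).
# Hardware figures are NVIDIA's published specs; names are illustrative only.
SM_COUNT = 80                # streaming multiprocessors
FP32_LANES_PER_SM = 64       # fp32 FMA units per SM
TENSOR_CORES_PER_SM = 8      # tensor cores per SM
TENSOR_FMA_PER_CLOCK = 64    # one 4x4x4 matrix FMA per tensor core per clock
BOOST_CLOCK_HZ = 1.53e9      # boost clock of the SXM2 part


def peak_tflops(fma_per_clock, clock_hz=BOOST_CLOCK_HZ):
    """Convert FMAs per clock to peak TFLOPS (one FMA counts as 2 FLOPs)."""
    return fma_per_clock * 2 * clock_hz / 1e12


# Plain fp32: one FMA per lane per clock.
fp32 = peak_tflops(SM_COUNT * FP32_LANES_PER_SM)

# vec2 HFMA: each lane processes two packed fp16 values per clock.
fp16_vec2 = peak_tflops(SM_COUNT * FP32_LANES_PER_SM * 2)

# Tensor cores: fp16 matrix multiply-accumulate.
tensor = peak_tflops(SM_COUNT * TENSOR_CORES_PER_SM * TENSOR_FMA_PER_CLOCK)

print(f"fp32:        {fp32:.1f} TFLOPS")
print(f"fp16 (vec2): {fp16_vec2:.1f} TFLOPS")
print(f"tensor:      {tensor:.1f} TFLOPS ({tensor / fp16_vec2:.0f}x fp16 vec2)")
```

Under these assumptions the fp32 peak comes out around 15.7 TFLOPS, vec2 fp16 around 31 TFLOPS, and the tensor cores around 125 TFLOPS, i.e. 4x the vec2 fp16 rate. The PCIe card has a lower boost clock, which is where the 14 TFLOPS fp32 figure comes from.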