HN Mirror



	by TimothyFitz 3105 days ago
	Can you help me understand how the V100 can have 10x the TFLOPs of the P100 but only get a 2.5x speedup when training a neural net, according to NVIDIA's docs? https://devblogs.nvidia.com/parallelforall/inside-volta/ Do we need significant software changes to take advantage of the new power? Or are the TFLOPs somehow not directly comparable?

3 comments

paulsutter 3105 days ago

Most published numbers aren’t actually using the tensor cores. We’re using dlib, which is written in C++ and gives us more direct control, but surely TensorFlow will eventually do this too.


yzmtf2008 3105 days ago

https://en.wikipedia.org/wiki/Roofline_model


andars 3105 days ago

Probably memory bound.
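
To make the roofline argument concrete, here is a rough Python sketch. The peak and bandwidth figures are round, illustrative assumptions chosen to match the "10x TFLOPs" framing in the question, not official specs, and the names `attainable_tflops` and `speedup` are made up for this example. The model says attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved).

```python
def attainable_tflops(peak_tflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity is in FLOPs per byte; GB/s * FLOP/byte = GFLOP/s, hence / 1000.
    """
    memory_roof_tflops = bandwidth_gbs * intensity / 1000.0
    return min(peak_tflops, memory_roof_tflops)


# Illustrative, assumed numbers (not vendor specs): 10x peak, 1.25x bandwidth.
OLD_GPU = {"peak_tflops": 12.0, "bandwidth_gbs": 720.0}
NEW_GPU = {"peak_tflops": 120.0, "bandwidth_gbs": 900.0}


def speedup(intensity):
    """Ratio of attainable throughput, new GPU over old, at a given intensity."""
    new = attainable_tflops(intensity=intensity, **NEW_GPU)
    old = attainable_tflops(intensity=intensity, **OLD_GPU)
    return new / old


if __name__ == "__main__":
    for intensity in (10, 50, 1000):
        print(f"{intensity:>5} FLOP/byte -> {speedup(intensity):.2f}x")
```

With these assumed numbers, a kernel at 10 FLOP/byte is limited by bandwidth on both chips and only gains about 1.25x, one at 50 FLOP/byte gains about 3.75x, and only a kernel above the new chip's ridge point (roughly 133 FLOP/byte here) sees the full 10x. A whole training run mixes kernels across that range, so an overall figure well below the peak ratio is unsurprising.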
