Hacker News
by TimothyFitz 3105 days ago
Can you help me understand how the V100 can have 10x the TFLOPs of the P100 but only get a 2.5x speedup in training a neural net, according to NVIDIA's docs? https://devblogs.nvidia.com/parallelforall/inside-volta/

Do we need significant software changes to take advantage of the new power? Are the TFLOPs somehow not directly comparable?

3 comments

Most published numbers aren’t actually using the tensor cores. We’re using dlib (which is in C++), which gives us more direct control, but surely TensorFlow will eventually do this too.
Probably memory bound.
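
To make "memory bound" concrete, here is a rough roofline-style sketch: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (FLOPs done per byte moved). The peak and bandwidth figures below are illustrative assumptions chosen to mirror the 10x compute gap in the question, not quoted specs, and the function names are made up for this example.

```python
def attainable_tflops(peak_tflops, bandwidth_tb_s, flops_per_byte):
    """Roofline estimate: capped by compute or by memory traffic, whichever is lower."""
    # TB/s * FLOP/byte = TFLOP/s, so both terms share the same unit.
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)

# ASSUMED, illustrative numbers (not official specs): 10x more peak compute,
# but only a modest increase in memory bandwidth.
OLD_GPU = {"peak_tflops": 12.0, "bandwidth_tb_s": 0.72}
NEW_GPU = {"peak_tflops": 120.0, "bandwidth_tb_s": 0.90}

def speedup(flops_per_byte):
    """Ratio of attainable throughput, new GPU over old GPU, at a given intensity."""
    old = attainable_tflops(OLD_GPU["peak_tflops"], OLD_GPU["bandwidth_tb_s"], flops_per_byte)
    new = attainable_tflops(NEW_GPU["peak_tflops"], NEW_GPU["bandwidth_tb_s"], flops_per_byte)
    return new / old

# Low-intensity kernels only see the bandwidth ratio; the full 10x shows up
# only once a kernel does enough math per byte to saturate the new peak.
for intensity in (4, 16, 64, 256):
    print(f"{intensity:4d} FLOP/byte -> {speedup(intensity):.2f}x")
```

Under these assumptions, a workload that mixes low-intensity layers with large matrix multiplies would land somewhere between the bandwidth ratio and the compute ratio, which is one plausible way a 10x peak turns into a much smaller end-to-end gain.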