Hacker News
by dkobran 3036 days ago
Just to clarify, is this benchmark leveraging mixed-precision mode on the Volta V100? The major innovation of the Volta generation is mixed precision, which NVIDIA claims delivers a large performance increase over the Pascal generation (the P100, in the case of your benchmark).

Link to NVIDIA documentation on mixed-precision TensorCores: https://devblogs.nvidia.com/inside-volta/

Where "fp16" is specified, the V100 benchmarks use the code from https://github.com/tensorflow/benchmarks/tree/master/scripts... with the flag --use_fp16=true, which enables fp16 for some but not all tensors.
It's my understanding that fp16 (available on the previous-generation P100) and mixed precision (the major innovation of the V100) are different things, and that the speedup from TensorCores is entirely missing from this benchmark. Unlike the general-purpose P100, the TPU is a heavily optimized chip built for Deep Learning, hence its performance increase. However, the V100 is also heavily optimized for Deep Learning (arguably NVIDIA's first non-GPU chip). I'm in no position to defend NVIDIA here haha, but it seems like the benchmark misses the point if this is indeed the case.
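To make the fp16 vs. mixed-precision distinction concrete, here is a minimal numerical sketch in plain Python. It only illustrates the arithmetic idea (fp16 inputs with an fp16 accumulator versus fp16 inputs with an fp32 accumulator); it says nothing about how the benchmark scripts or the TensorCore hardware are actually implemented, and the function names are hypothetical, chosen for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product where products AND the running sum are rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + prod)
    return acc


def dot_mixed_precision(a, b):
    """Dot product with fp16 inputs but an fp32 running sum (mixed precision)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # inputs are rounded to fp16 first
        acc = to_fp32(acc + prod)       # accumulation is kept in fp32
    return acc


n = 4096
a = [1.0] * n
b = [1.0] * n

print(dot_fp16_accumulate(a, b))  # 2048.0: the fp16 sum stalls once it reaches 2048
print(dot_mixed_precision(a, b))  # 4096.0: the fp32 accumulator keeps counting
```

The fp16 accumulator stalls because fp16 has an 11-bit significand: above 2048 the spacing between representable values is 2, so 2048 + 1 rounds back to 2048. Keeping the accumulator in fp32 avoids that loss while the inputs stay in fp16.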
It was my understanding that the TensorFlow benchmarks do make use of TensorCores on the V100. We'll verify and update accordingly.