by p1esk
2604 days ago
Thank you for the detailed answer. I think your main point is that memory bandwidth would prevent the performance speedup. Are V100s memory bound when executing F16 ops on tensor cores? Second, do we really need dedicated FP32 cores for DL? Tensor cores accumulate in FP32 (is that what you meant when you said they did a significant amount of FP32 compute?), and recent papers indicate we’re moving towards 8-bit training [1]. Besides, do TPUs use dedicated FP32 hw? Finally, if memory bandwidth is indeed the bottleneck, perhaps all that die area from FP32 and especially FP64 cores could be used for a massive amount of cache. [1] https://arxiv.org/abs/1805.11046
For example, massively increasing the GPU last-level cache size would not do much to increase effective memory bandwidth on most workloads, because a cache only helps when you have temporal locality, and GPUs like to stream through many GB of data.
This is covered in Hennessy and Patterson if you're curious to learn more. I also talk about it some in the video I linked above.
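To make the temporal-locality point concrete, here is a toy sketch, not a model of any real GPU. It assumes an LRU replacement policy and counts addresses in whole cache lines; the names `lru_hit_rate` and `streaming` are made up for illustration. It streams repeatedly over a working set and measures the hit rate for a few cache sizes.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_lines):
    """Fraction of accesses served by an LRU cache holding `cache_lines` lines."""
    cache = OrderedDict()  # keys are line addresses, ordered oldest to newest
    hits = 0
    total = 0
    for addr in accesses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0


def streaming(n_lines, passes):
    """Sweep linearly over `n_lines` lines, `passes` times (a streaming pattern)."""
    for _ in range(passes):
        for addr in range(n_lines):
            yield addr


working_set = 10_000  # hypothetical working set size, in cache lines

small = lru_hit_rate(streaming(working_set, 4), cache_lines=100)
big = lru_hit_rate(streaming(working_set, 4), cache_lines=5_000)    # 50x larger
fits = lru_hit_rate(streaming(working_set, 4), cache_lines=10_000)  # holds everything

print(small, big, fits)
```

In this toy model the 50x larger cache gets the same hit rate as the small one: by the time the sweep comes back to a line, it has already been evicted. The cache only starts helping once it holds the entire working set, which is not realistic when the data being streamed is many GB.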
(Also, I doubt that getting rid of f64 support would be a significant die size win. I notice that the V100 has, in their marketing speak, twice as many fp32 cores as fp64 cores. What do you think are the chances that Nvidia decided a priori this is the optimal ratio? What if instead they are sharing resources between these functional units, at a ratio of two to one?)
As for whether you really need fp32 cores: I am not aware of any "widely deployed" GPU model today that does not do significant fp32 work. Perhaps there is research which suggests this isn't necessary! But that is different from what we were talking about here, which is whether Nvidia could somehow make a much better chip for the things people are doing today.
I don't want to speak to the question of whether TPUs have f32 hardware, because I'm afraid of saying something that might not be public. But I think the answer to your question can easily be found by some searching and is probably even in the public docs.