by Saurabh_29 2479 days ago
The main bottleneck is the data transfer speed between GPU memory and the SMs. Also, using Tensor Cores doesn't necessarily imply using half precision, as NVIDIA now supports single-precision operations on Tensor Cores too.
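
To put rough numbers on the memory-bottleneck point, here is a minimal roofline-style sketch in Python. It is only an illustration, not a measurement: the peak throughput and bandwidth figures are hypothetical placeholders rather than specs of any real GPU, the function names are made up for this example, and it assumes each matrix crosses the memory bus exactly once (perfect on-chip reuse).

```python
def matmul_arithmetic_intensity(m, n, k, bytes_per_element):
    """FLOPs per byte moved for C = A @ B (A is m x k, B is k x n).

    Assumes A and B are each read once and C is written once.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_memory_bound(intensity, peak_flops, peak_bandwidth):
    """Roofline test: memory-bound when intensity is below the machine balance."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs the chip can do per byte
    return intensity < machine_balance


# Hypothetical hardware figures, chosen only to make the arithmetic easy.
PEAK_FLOPS = 100e12      # 100 TFLOP/s of matrix-unit throughput (assumed)
PEAK_BANDWIDTH = 1e12    # 1 TB/s of memory bandwidth (assumed)

# Half-precision operands: 2 bytes per element.
small = matmul_arithmetic_intensity(64, 64, 64, bytes_per_element=2)
large = matmul_arithmetic_intensity(4096, 4096, 4096, bytes_per_element=2)

print(f"64^3 matmul:   {small:.1f} FLOP/byte, "
      f"memory-bound: {is_memory_bound(small, PEAK_FLOPS, PEAK_BANDWIDTH)}")
print(f"4096^3 matmul: {large:.1f} FLOP/byte, "
      f"memory-bound: {is_memory_bound(large, PEAK_FLOPS, PEAK_BANDWIDTH)}")
```

Under these assumed numbers the small matmul sits well below the machine balance of 100 FLOP/byte, so faster math units would not help it; only the large one has enough data reuse to keep them busy.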