Hacker News
by Saurabh_29 2479 days ago
The main bottleneck is the data transfer speed between GPU memory and the SMs. Also, using Tensor Cores doesn't necessarily imply using half precision, since NVIDIA now supports single-precision operation on Tensor Cores too.
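
To make the bandwidth point concrete, here is a rough roofline-style sketch. It is only an illustration: the function names are mine, the peak-throughput and bandwidth figures are made-up round numbers rather than the specs of any real GPU, and the traffic estimate assumes each matrix is moved exactly once.

```python
# Rough roofline estimate for an n x n matrix multiply C = A @ B.
# All hardware figures used with these helpers are hypothetical placeholders.

def gemm_intensity(n: int, bytes_per_elem: int) -> float:
    """FLOPs per byte moved, assuming A, B and C each cross the memory bus once."""
    flops = 2 * n**3                         # one multiply and one add per inner-product term
    bytes_moved = 3 * n * n * bytes_per_elem  # read A, read B, write C
    return flops / bytes_moved

def attainable_flops(n: int, bytes_per_elem: int,
                     peak_flops: float, bandwidth: float) -> float:
    """Throughput ceiling: the lower of the compute peak and what memory can feed."""
    return min(peak_flops, gemm_intensity(n, bytes_per_elem) * bandwidth)

def is_memory_bound(n: int, bytes_per_elem: int,
                    peak_flops: float, bandwidth: float) -> bool:
    """True when memory traffic, not arithmetic, caps the throughput."""
    return gemm_intensity(n, bytes_per_elem) * bandwidth < peak_flops

if __name__ == "__main__":
    PEAK = 100e12  # hypothetical peak, FLOP/s
    BW = 1e12      # hypothetical memory bandwidth, bytes/s
    for n in (128, 1024):
        for name, size in (("half", 2), ("single", 4)):
            bound = "memory" if is_memory_bound(n, size, PEAK, BW) else "compute"
            print(f"n={n} {name}: {bound}-bound")
```

Under these assumed numbers the small multiply is limited by memory traffic in either precision, which is the sense in which the transfer rate, not the arithmetic units, ends up being the bottleneck.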