by jhj 3621 days ago
It depends on the arithmetic intensity, i.e., the number of ops performed per byte loaded. Nvidia packs their GPUs with a lot of float32 (or float64) units because some problems (e.g., convolution, or more typical HPC problems like PDEs, which will probably be done in float64) have a high flop / byte ratio.

A problem that just calculates, say, Hamming distance, or does 1-2 integer bit ops per integer word loaded, will probably be memory bandwidth bound rather than limited by integer op throughput. More complicated operations (e.g., cryptographic hashing) that have a higher integer op / byte loaded ratio will be limited by the reduced throughput of the integer functional units rather than by memory bandwidth.
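
To make the distinction concrete, here is a rough roofline-style sketch. The peak numbers are made up for illustration and are not the specs of any real GPU, and the function names and the 64 ops / byte figure for hashing are likewise just assumptions. The only point is that a kernel is memory bound when its ops / byte ratio is below the device's peak ops / peak bandwidth ratio.

```python
# Roofline-style estimate. All hardware numbers here are hypothetical.

def attainable_ops_per_s(ops_per_byte, peak_ops_per_s, peak_bytes_per_s):
    """Upper bound on throughput: the lower of the compute roof and the
    bandwidth roof scaled by the kernel's ops / byte ratio."""
    return min(peak_ops_per_s, ops_per_byte * peak_bytes_per_s)


def limiting_resource(ops_per_byte, peak_ops_per_s, peak_bytes_per_s):
    """Return "memory" or "compute" depending on which roof is hit first."""
    ridge = peak_ops_per_s / peak_bytes_per_s  # ops / byte at the crossover
    return "compute" if ops_per_byte >= ridge else "memory"


# Hypothetical device: 2 Tiop/s integer throughput, 300 GB/s memory bandwidth.
PEAK_INT_OPS = 2.0e12
PEAK_BW = 3.0e11

# Hamming distance: roughly one XOR plus one popcount per 4-byte word loaded.
hamming = 2 / 4

# Assumed figure for a hash-like kernel: many integer ops per byte loaded.
hashing = 64.0

for name, intensity in [("hamming", hamming), ("hashing", hashing)]:
    print(name,
          limiting_resource(intensity, PEAK_INT_OPS, PEAK_BW),
          attainable_ops_per_s(intensity, PEAK_INT_OPS, PEAK_BW))
```

With these assumed numbers the crossover sits at about 6.7 ops / byte, so the Hamming distance kernel lands on the bandwidth side and the hash-like kernel on the compute side.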

For "deep learning", convolution is one of the few operations that tends to be compute rather than memory b/w bound. It's my understanding that Sgemm (float32 matrix multiplication) has been memory b/w limited for a while on Nvidia GPUs. Though, if you muck around with the architecture (as with Pascal), the balance between compute throughput, memory b/w, and on-chip resources (smem, register file memory) may shift.
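
A back-of-the-envelope sketch of why on-chip resources matter for matrix multiplication, under a deliberately simplified model: a square tile of A and a square tile of B are staged on chip, each tile step loads both tiles once and does tile^3 multiply-adds, and writes of C, caches, and register-level blocking are ignored. The device numbers and function names are hypothetical, and this does not confirm or refute the Sgemm claim above; it only shows how the flop / byte ratio scales with the tile size that fits on chip.

```python
# Simplified tiled float32 matrix multiply model. Hardware numbers are
# hypothetical; C writes, caches, and register blocking are ignored.

BYTES_PER_FLOAT32 = 4


def tiled_sgemm_intensity(tile):
    """Flops per byte loaded for one tile step with square tile x tile blocks."""
    bytes_loaded = 2 * tile * tile * BYTES_PER_FLOAT32  # one block of A, one of B
    flops = 2 * tile ** 3                               # tile^3 multiply-adds
    return flops / bytes_loaded                         # simplifies to tile / 4


def is_compute_bound(tile, peak_flops_per_s, peak_bytes_per_s):
    """True if the tile's flop / byte ratio reaches the device's crossover."""
    return tiled_sgemm_intensity(tile) >= peak_flops_per_s / peak_bytes_per_s


# Hypothetical device: 10 Tflop/s float32, 500 GB/s memory bandwidth,
# so the crossover is 20 flop / byte.
PEAK_FLOPS = 1.0e13
PEAK_BW = 5.0e11

for tile in (32, 64, 128):
    print(tile, tiled_sgemm_intensity(tile),
          is_compute_bound(tile, PEAK_FLOPS, PEAK_BW))
```

In this model the intensity is tile / 4, so whether the kernel ends up compute or memory b/w bound depends on how large a tile the smem and register file can hold relative to the device's flop / byte crossover.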