by guenthert 1480 days ago
SP is sixteen times the performance of DP here for no other reason than market segmentation. Nvidia might have started that, but that's no reason not to call AMD out for it.

fp32 uses much less silicon and power than fp64. I think both scale roughly quadratically with operand width (multiplier area grows with about the square of the mantissa width), so a 4x fp32 advantage comes for free.

I vaguely remember a consumer card having 1/4 the fp64 units of a similar data center card, so together with the 4x that would get you the 16x on paper.
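
Spelling out that arithmetic as a quick sketch. The function name and parameters are mine, and both inputs are the rough guesses from this comment (quadratic cost in operand width, 1/4 the fp64 units), not measured figures for any real card:

```python
def relative_throughput(width_ratio: float, unit_ratio: float) -> float:
    """Estimate the fp32:fp64 throughput ratio on paper.

    Assumes (as guessed above, not measured) that the cost of a
    floating-point unit scales quadratically with operand width, and
    that the card has 1/unit_ratio as many fp64 units as a comparable
    data center part.
    """
    per_unit_gain = width_ratio ** 2  # 64/32 = 2 wide -> 4x cheaper
    return per_unit_gain * unit_ratio


print(relative_throughput(width_ratio=64 / 32, unit_ratio=4))  # 16.0
```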

Memory bandwidth and register file size would suggest another 2x from moving less data. My working heuristic on these things is that compute is free, because I fail to saturate the memory bus anyway, but no doubt some applications do actually run into the fp64 slowdown in practice. Matrix multiply probably does.
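
A rough roofline-style sketch of why I'd expect matrix multiply to feel the fp64 penalty while streaming kernels mostly see the 2x from data size. The peak and bandwidth numbers are made up for a hypothetical card with a 16:1 fp32:fp64 ratio, and the matmul intensity assumes ideal blocking where each matrix crosses the bus once:

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline model: capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, bandwidth * intensity)


def matmul_intensity(n: int, bytes_per_element: int) -> float:
    """Flops per byte for an n x n matmul, assuming each of the three
    matrices is moved over the memory bus exactly once (ideal blocking)."""
    return (2 * n ** 3) / (3 * n * n * bytes_per_element)


def axpy_intensity(bytes_per_element: int) -> float:
    """y = a*x + y: 2 flops per element against 2 loads and 1 store."""
    return 2 / (3 * bytes_per_element)


# Hypothetical card; these numbers are illustrative, not from any datasheet.
PEAK_FP32 = 16e12   # flop/s
PEAK_FP64 = 1e12    # flop/s, 1/16 of fp32
BANDWIDTH = 500e9   # bytes/s

axpy32 = attainable_flops(PEAK_FP32, BANDWIDTH, axpy_intensity(4))
axpy64 = attainable_flops(PEAK_FP64, BANDWIDTH, axpy_intensity(8))
mm32 = attainable_flops(PEAK_FP32, BANDWIDTH, matmul_intensity(4096, 4))
mm64 = attainable_flops(PEAK_FP64, BANDWIDTH, matmul_intensity(4096, 8))

print("axpy   fp32/fp64:", axpy32 / axpy64)
print("matmul fp32/fp64:", mm32 / mm64)
```

With these assumed numbers the axpy case is memory-bound in both precisions, so the gap is only the 2x from moving half the bytes, while the 4096x4096 matmul is compute-bound in both and shows the full 16x.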