by JonChesterfield
1475 days ago
fp32 uses much less silicon and power than fp64. I think the scaling is roughly quadratic in bit width for both, so 4x performance is free. I vaguely remember a consumer card having 1/4 the fp64 units of a similar data center one, so that would get the 16x on paper. Memory bandwidth / register file size would suggest another 2x from moving less data. My working heuristic on these things is that compute is free because I fail to saturate the memory bus, but no doubt some applications do actually run into that slowdown in practice. Matrix multiply probably does.
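For anyone who wants the arithmetic above spelled out, here is a rough sketch. The function name and parameters are made up for illustration, and every number is just the guess from the comment (quadratic scaling with width, 1/4 the fp64 units), not a measured figure for any real card.

```python
def fp32_vs_fp64_factors(width_ratio=2, fp64_unit_fraction=0.25):
    """Back-of-envelope fp32-over-fp64 factors.

    width_ratio: fp64 bits / fp32 bits (64 / 32 = 2).
    fp64_unit_fraction: assumed ratio of fp64 units on the consumer card
        to those on the data center card (the "1/4" recalled above).
    """
    # Assumption: silicon and power per unit scale quadratically with width.
    per_unit = width_ratio ** 2
    # Assumption: the consumer card carries only a fraction of the fp64 units.
    unit_count = 1 / fp64_unit_fraction
    return {
        "per_unit": per_unit,                        # the "free" 4x
        "compute_on_paper": per_unit * unit_count,   # the 16x on paper
        "data_movement": width_ratio,                # 2x from half-size values
    }


factors = fp32_vs_fp64_factors()
print(factors)
```

The factors are kept separate on purpose: a kernel that is limited by the memory bus would mostly see the data movement factor, while the on-paper compute factor only matters for something that actually saturates the ALUs.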