| HN Mirror

Still true that fp64 throughput is lower for consumer GPUs - both NV and AMD. That’s kinda why I was curious about leading with that metric - outside of HPC and scientific applications, a lot of people don’t really need fp64, and the machine might normally have a much higher fp32 throughput.

> knowing you can drop to high precision when necessary without penalty is nice.

I guess I maybe don’t know why you’d ever have 1:1 fp32 and fp64 perf. Aren’t the fp64 multipliers (for example) basically 4x fp32 multipliers? I am under the possibly naive impression that if you have all the transistors for 1 fp64 core, that you’d end up with all the transistors you need for 2 or 4 fp32 cores. Maybe that’s not true today, but there does have to be at least 2x the transistors overall for 64-bit vs 32-bit, and lots of those should be shared or reusable, no? It doesn’t seem quite right to frame naturally higher 32-bit op throughput as a “penalty” on 64-bit ops. You’re asking the hardware to do more with 64, and it makes complete sense that given the exact same budget for bandwidth, energy, memory, compute, etc. that 32-bit ops would go faster, no? If the op throughput of fp64 and fp32 is the same, doesn’t that possibly imply that the fp32 ops are potentially being wasted / penalized, just for the sake of having matching numbers?