by no_time 1341 days ago
>While most GPUs support FP64, unless you pay for the really high-end scientific computing models, you're typically getting 1/32nd rate compared to FP32 performance.

I wonder if there is a hardware reason for this or if it's just market segmentation by NVIDIA.

2 comments

Mostly market segmentation. There is a software lock that caps FP64 throughput at a fixed fraction of FP32 throughput (at a given clock speed), and the fraction varies by card. For most consumer NVIDIA cards it is locked to 1/24 of FP32 speed to prevent use in professional settings that require FP64 performance. However, some cards, such as the Radeon VII, are only locked to 1/4 of FP32 speed (much faster).
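To put those ratios in perspective, here is a rough sketch of the arithmetic. The 12 TFLOPS FP32 figure is a made-up number for a hypothetical card, not a spec for any real product, and `fp64_throughput` is just an illustrative helper name.

```python
def fp64_throughput(fp32_tflops: float, ratio_denominator: int) -> float:
    """Peak FP64 rate for a card that runs FP64 at 1/ratio_denominator of its FP32 rate."""
    return fp32_tflops / ratio_denominator


# Hypothetical card with 12 TFLOPS of peak FP32 throughput (assumed value).
HYPOTHETICAL_FP32_TFLOPS = 12.0

for denom in (4, 24, 32):
    rate = fp64_throughput(HYPOTHETICAL_FP32_TFLOPS, denom)
    print(f"1/{denom} rate: {rate:.3f} TFLOPS FP64")
```

On the same FP32 budget, a 1/4 card ends up with six times the FP64 throughput of a 1/24 card.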
My naive guess is that most floating point code uses FP32, and FP64 units take at least double the die area. So optimize for FP32 and have some FP64 for the rare computations that need it.
These compute units are usually sliced: the same part of the die can perform either four FP32 multiplies or one FP64 multiply. The trick dates back at least to PA-RISC; from what I remember, it was HP who introduced the sliced ALU, capable of doing one large operation or several smaller ones on the same hardware.

I could be wrong about who did that first, but most FPUs are now built like that.
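A rough sketch of why "four small or one large" works out, using plain integers as a stand-in for mantissas. This is an assumption-laden illustration, not a model of any particular FPU: it ignores exponents, rounding, and the fact that real FP32 and FP64 significands are 24 and 53 bits rather than a clean 1:2 split. The names `narrow_mul`, `wide_mul`, and `packed_mul` are made up for this example.

```python
HALF_BITS = 32
MASK = (1 << HALF_BITS) - 1


def narrow_mul(a: int, b: int) -> int:
    """Stand-in for one narrow hardware multiplier: 32 x 32 -> 64 bits."""
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return a * b


def wide_mul(x: int, y: int) -> int:
    """One 64 x 64 -> 128-bit product built from four narrow multiplies."""
    x_hi, x_lo = x >> HALF_BITS, x & MASK
    y_hi, y_lo = y >> HALF_BITS, y & MASK
    high = narrow_mul(x_hi, y_hi) << (2 * HALF_BITS)
    cross = (narrow_mul(x_hi, y_lo) + narrow_mul(x_lo, y_hi)) << HALF_BITS
    low = narrow_mul(x_lo, y_lo)
    return high + cross + low


def packed_mul(xs, ys):
    """The same four multipliers used independently on four narrow pairs."""
    return [narrow_mul(a, b) for a, b in zip(xs, ys)]


x, y = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
print(wide_mul(x, y) == x * y)
print(packed_mul([1, 2, 3, 4], [5, 6, 7, 8]))
```

Either mode keeps all four narrow multipliers busy; the wide mode just adds the shift-and-add step that combines the partial products.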

On GPUs, they haven't been sliced like this for quite a long time, to save die area.
The slicing was introduced to save die area. Not slicing trades greater die area for slightly lower computation delay.