| HN Mirror



	by no_time 1341 days ago
	>While most GPUs support FP64, unless you pay for the really high-end scientific computing models, you're typically getting 1/32nd rate compared to FP32 performance. I wonder if there is a hardware reason for this or if it's just market segmentation by NVIDIA.

2 comments

alwayslikethis 1341 days ago

Mostly market segmentation. There is a software lock that caps FP64 throughput at a fixed ratio of FP32 performance, and the ratio varies by card. For most consumer NVIDIA cards it is locked to 1/24 of FP32 speed to prevent use in professional settings that require FP64 performance. However, some cards, such as the Radeon VII, are only locked to 1/4 of FP32 speed (much faster).
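To put the ratios mentioned in this thread in perspective, here is a rough sketch of the arithmetic. The 12 TFLOPS FP32 peak is a made-up round number for illustration, not the spec of any real card, and `fp64_tflops` is a hypothetical helper name.

```python
from fractions import Fraction

def fp64_tflops(fp32_tflops, ratio):
    """Peak FP64 throughput implied by an FP32 peak and an FP64:FP32 rate ratio."""
    return float(fp32_tflops * Fraction(ratio))

# Hypothetical card with a 12 TFLOPS FP32 peak (an assumption, not a real spec).
FP32_PEAK = 12

for label, ratio in [("1/32", Fraction(1, 32)),
                     ("1/24", Fraction(1, 24)),
                     ("1/4", Fraction(1, 4))]:
    print(f"{label} rate: {fp64_tflops(FP32_PEAK, ratio):.3f} TFLOPS FP64")
```

On the same hypothetical FP32 peak, a 1/4-rate card would deliver six times the FP64 throughput of a 1/24-rate card.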


Hextinium 1341 days ago

My naive guess is that most floating point code uses FP32, and an FP64 unit takes at least double the die area. So optimize for FP32 and have some FP64 for the rare equations that need it.


thesz 1341 days ago

These compute units are usually sliced: they can perform either four FP32 multiplies or one FP64 multiply on the same part of the die. This trick dates back at least to the development of PA-RISC; from what I remember, it was HP who introduced the sliced ALU, capable of doing one large or several smaller operations on the same hardware.

I could be wrong about who did it first, but most FPUs are now built like that.
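As a rough illustration of the "one wide or four narrow" idea, the sketch below assembles one wide product from four narrow partial products and also uses the same narrow multiplier on four independent pairs. It uses plain 32-bit and 64-bit integers for simplicity; a real FP multiplier works on mantissas (24 bits for FP32, 53 bits for FP64) and also handles exponents and rounding. All function names are hypothetical, and this does not model any particular GPU or CPU.

```python
MASK32 = (1 << 32) - 1

def mul32(a, b):
    """Stand-in for one narrow multiplier slice: 32-bit x 32-bit -> 64-bit."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    return a * b

def mul64_from_slices(x, y):
    """One 64x64-bit product built from four 32x32-bit partial products."""
    x_lo, x_hi = x & MASK32, x >> 32
    y_lo, y_hi = y & MASK32, y >> 32
    high = mul32(x_hi, y_hi)                       # weight 2**64
    cross = mul32(x_hi, y_lo) + mul32(x_lo, y_hi)  # weight 2**32
    low = mul32(x_lo, y_lo)                        # weight 2**0
    return (high << 64) + (cross << 32) + low

def four_mul32(pairs):
    """The same four slices used independently on four narrow operand pairs."""
    assert len(pairs) == 4
    return [mul32(a, b) for a, b in pairs]
```

Either mode performs exactly four narrow multiplications, which is the sense in which the hardware can be shared between the two widths.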


my123 1340 days ago

On GPUs, they haven't been sliced like this for quite a long time, in order to save die area.


thesz 1332 days ago

The slicing was introduced to save die area. Not slicing means accepting greater die area in exchange for a slightly smaller computation delay.
