by ATsch 1957 days ago
Hey, thanks for the detailed comment. I should note I didn't intend to create the impression that GPU and CPU architectures are completely interchangeable. That's obviously not true, because of the fixed-function hardware alone. The intent was more to give a rough idea of what "CUDA Core" actually means and roughly how those concepts map to what we know from CPUs.

> Yes, "CUDA Core" is (was?) analogous to "FP32 ALU" - (it may have transitioned to FP16 nowadays to double the count?)

It's still FP32 ALUs (NVIDIA disables FP16 in the driver for gaming cards). The doubling between Turing and Ampere is due to the combined int/fp units now being counted as CUDA cores. This also means that for int instructions, the expected performance is not actually increased, which is another reason the "core" term is unhelpful.

> But, single GPU "Compute Units" often have 2-4x that (up-to 64 FP32 ALU)

Yep, guess I should have left that AVX2048 joke in there for you. It just wasn't very important IMO.

> Also, for accuracy, vis-a-vis "independent parallel threads of execution like on CPUs", even the 64-core Threadripper has 128, not 64.

I did not consider SMT here because, while it is relevant for keeping the ALUs fed with instructions, it doesn't increase the raw peak compute available. Those 128 threads can still only issue 64 AVX instructions at any one time, which is what really matters here.
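
To put rough numbers on that, here is a back-of-the-envelope sketch. The function name, the 3.0 GHz clock and the two FMA units per core are my own illustrative assumptions, not measured figures for any specific chip. The only point is that the SMT thread count never enters the formula.

```python
def peak_flops(cores, simd_lanes, units_per_core, clock_hz, smt_threads=1):
    """Rough peak FP32 FLOP/s estimate (hypothetical helper, not a benchmark).

    smt_threads is accepted only to show that it does not affect the result:
    extra hardware threads share the same execution units.
    """
    # One fused multiply-add counts as 2 floating-point operations per lane.
    flops_per_lane_per_cycle = 2
    return cores * units_per_core * simd_lanes * flops_per_lane_per_cycle * clock_hz


# Assumed values: 64 cores, 8 FP32 lanes per 256-bit AVX register,
# 2 FMA units per core (assumption), 3.0 GHz clock (assumption).
without_smt = peak_flops(64, 8, 2, 3.0e9, smt_threads=1)
with_smt = peak_flops(64, 8, 2, 3.0e9, smt_threads=2)

print(f"peak without SMT: {without_smt / 1e12:.3f} TFLOP/s")
print(f"peak with SMT:    {with_smt / 1e12:.3f} TFLOP/s")
```

Both lines print the same figure. SMT helps you get closer to that peak in practice, but it doesn't move the peak itself.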