by jjoonathan 1954 days ago
SIMT is a much more convenient programming model than SIMD for wide compute, though. So much so that I think the marketing around CUDA cores isn't merely reasonable but for many applications actually strikes closer to the truth than counting ALUs or independent threads.

If you count ALUs, you see that the CPU has many of them, but you don't see how difficult it is to chunk up data to keep those fed.
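To make the chunking cost concrete, here is a minimal sketch of a SAXPY (`y = a*x + y`) loop hand-split into 4-wide pieces. It assumes a GCC- or Clang-compatible compiler for the `vector_size` extension; `v4f` and `saxpy_chunked` are made-up names for illustration. Real code would usually reach for intrinsics, wider vectors and alignment handling, which only adds to the bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* 4 x float vector type via the GCC/Clang vector extension
   (assumption: the compiler supports __attribute__((vector_size))). */
typedef float v4f __attribute__((vector_size(16)));

/* y[i] = a * x[i] + y[i], hand-chunked into 4-element pieces. */
void saxpy_chunked(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    v4f va = {a, a, a, a};              /* broadcast the scalar to every lane */

    /* Main loop: only whole 4-element chunks. */
    for (; i + 4 <= n; i += 4) {
        v4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* alignment-safe load */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;              /* element-wise across the 4 lanes */
        memcpy(y + i, &vy, sizeof vy);  /* alignment-safe store */
    }

    /* Tail: the leftover n % 4 elements fall back to scalar code. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Even in this tiny case the same arithmetic is written twice: once for the vector body and once for the remainder.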

If you count independent threads, you see that the GPU has few of them, but you don't see how it conceptually has many threads, which simplifies the programming of each thread while gracefully degrading only in proportion to how much branching you actually use.
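As a rough illustration of "degrading in proportion to branching", this toy C model pushes one 32-lane warp through an if/else under per-lane masks. It is a simplification, not how any particular GPU implements it (real hardware handles masking and reconvergence itself, and the details vary by generation); `simt_branch` and `WARP` are hypothetical names.

```c
#include <stdint.h>

#define WARP 32  /* lanes per warp in this toy model */

/* Toy SIMT execution of, per lane:
       if (x < 0) x = -x; else x = x * 2;
   One instruction stream drives all lanes; lanes whose predicate does
   not match the current path are masked off. Returns the number of
   branch paths the warp had to issue: 1 if uniform, 2 if divergent. */
int simt_branch(int x[WARP])
{
    uint32_t taken = 0;
    for (int lane = 0; lane < WARP; ++lane)
        if (x[lane] < 0)
            taken |= (uint32_t)1 << lane;     /* per-lane predicate mask */

    int passes = 0;
    if (taken != 0) {                         /* "then" path, masked */
        ++passes;
        for (int lane = 0; lane < WARP; ++lane)
            if ((taken >> lane) & 1u)
                x[lane] = -x[lane];
    }

    uint32_t not_taken = ~taken;              /* WARP == 32, so every bit is a lane */
    if (not_taken != 0) {                     /* "else" path, masked */
        ++passes;
        for (int lane = 0; lane < WARP; ++lane)
            if ((not_taken >> lane) & 1u)
                x[lane] = x[lane] * 2;
    }
    return passes;
}
```

If every lane agrees, only one path is issued; any disagreement costs the warp both paths, while each lane's code still reads like an ordinary scalar branch.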


I definitely agree with it being a useful model; after all, that is the way GPUs are presented to programmers. But when getting into detailed performance comparisons, especially with CPUs, that simplified model breaks down quickly and becomes a hindrance. Which is why I think it's unfortunate that NVidia (deliberately) uses the term "core", which leads people to believe they can make a direct comparison to CPUs. Comparisons that coincidentally lead people to believe GPUs are much more special than they actually are.

In detailed comparisons all simplified models break down. Tallying max theoretical compute is only appropriate if you're going to put in the effort to actually use it, which is exceedingly rare, even in compute kernels that have supposedly had lots of love and attention already paid to them. So the human factor has to be included in the model, and the human factor consistently de-rates SIMD more than it does SIMT.
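For reference, the usual peak-throughput tally, written out as a sketch. The symbols are my own labels, and it assumes the common convention that one FMA counts as two FLOPs:

$$
P_{\text{peak}} = N \cdot U \cdot L \cdot 2 \cdot f
$$

where $N$ is the number of cores, $U$ the FMA units per core, $L$ the FP32 lanes per unit, and $f$ the clock rate. A "CUDA core" count roughly corresponds to the whole $N \cdot U \cdot L$ product (about one FP32 FMA lane each), so the marketing number plugs in directly as $N_{\text{CUDA}} \cdot 2 \cdot f$. Counting a CPU by cores alone drops the $U \cdot L$ factor: a hypothetical core with two 8-wide FP32 FMA units would count as 16 lanes in the same units.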

I realize that under the covers this is more of a compiler/language thing than a compute model thing, but for whatever reason I just don't see much SIMT code targeting CPUs, so again, human factor.
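A minimal sketch of SIMT-style code on a CPU in plain C: the per-element "kernel" is written as scalar code with an ordinary branch, and `#pragma omp simd` (an OpenMP 4.0+ directive) asks the compiler to spread loop iterations across SIMD lanes. Whether it actually vectorizes depends on the compiler and flags (e.g. `-fopenmp-simd` on GCC/Clang); without them the pragma is ignored and the loop runs scalar. `kernel` and `launch` are hypothetical names.

```c
#include <stddef.h>

/* Per-"thread" kernel, CUDA-style: plain scalar code for one element,
   with a natural branch and no manual chunking or masking. */
static inline float kernel(float v)
{
    return v < 0.0f ? -v : v * 2.0f;
}

/* The "launch": one iteration per logical thread. The pragma asks the
   compiler to map consecutive iterations onto SIMD lanes, turning the
   branch into masked/blended lane operations. If OpenMP SIMD support
   is not enabled, the pragma is ignored and results are identical. */
void launch(size_t n, const float *in, float *out)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] = kernel(in[i]);
}
```

The chunking and tail handling from the hand-written version become the compiler's problem, which is the appeal of the model on either kind of hardware.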

My primary objection is not to the SIMT model, but to NVidia reusing a term with an established meaning ("core") for something completely different and incomparable. Other companies' terms like "execution units", "compute units" and "stream processors" (although perhaps less so the last) are much more truthful about the nature of GPUs without hindering the programming model at all.

From the standpoint of a programmer who doesn't want to suffer through constant DIY chunking and packing (very close to my personal definition of hell), a CUDA core looks a lot like a CPU core. From the point of view of someone not writing the code, a CUDA core looks like merely another FPU.
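For the "packing" half of that chore, a minimal sketch: converting a hypothetical array-of-structs particle list into struct-of-arrays form, so that consecutive values of one field sit next to each other and can fill a SIMD register in a single load. The types and `pack_soa` are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical record in the "natural" array-of-structs (AoS) layout. */
struct particle { float x, y, z, mass; };

/* Struct-of-arrays (SoA) layout: each field is contiguous, so four
   consecutive x values can go straight into one 4-wide register. */
struct particles_soa { float *x, *y, *z, *mass; };

/* The DIY packing step: transpose AoS into SoA before the vector code
   runs (real code often has to unpack the results again afterwards). */
void pack_soa(size_t n, const struct particle *aos, struct particles_soa soa)
{
    for (size_t i = 0; i < n; ++i) {
        soa.x[i]    = aos[i].x;
        soa.y[i]    = aos[i].y;
        soa.z[i]    = aos[i].z;
        soa.mass[i] = aos[i].mass;
    }
}
```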