


	by oddity 1477 days ago
	The difference is much more nuanced than this. A modern GPU can (and probably does) do most of what you've listed for a CPU. GPUs are a bit less likely to invest in speculative execution and branch prediction (they don't need them as much because of oversubscription), but that's increasingly true of high-efficiency CPU cores as well. The difference (category vs. category, not between specific microarchitectures) is mostly a matter of tuning for particular workloads. I'm increasingly souring on SIMD/SIMT being a useful distinction now that bleeding-edge CPUs are widening in the microarch and bleeding-edge GPUs are getting better at handling thread divergence in the microarch. There is a difference, certainly, but it's difficult to describe in a few bullet points. GPUs are more likely than CPUs to have exotic features for things like thread coordination and cache coherence, but there's nothing fundamentally stopping CPUs from adding (or wanting) those as well.


Lichtso 1477 days ago

> GPUs are getting better at handling thread divergence in the microarch

That is an interesting point. How does that work (especially with the dynamics of ray tracing)? Do they recombine underutilized wavefronts or something?


dragontamer 1477 days ago

I'm not aware of anything that improves thread divergence. NVidia's most recent GPUs are superscalar, which is a trick from CPU-land (multiple pipelines executing 2 or more instructions per clock tick). NVidia has an integer pipeline and a floating-point pipeline, and both can operate simultaneously (ex: in for(int i=0; i<100; i++) x *= blah; the "i++" is integer while the "x *= blah" is floating point, so both execute simultaneously).
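
A toy model of that loop may make the dual-issue point concrete. It is only a sketch: the function names and the rule of one instruction per pipeline per clock are assumptions made up for illustration, not a description of any real NVidia scheduler.

```python
from collections import Counter

def single_issue_cycles(instrs):
    """One instruction issues per clock, regardless of its type."""
    return len(instrs)

def dual_issue_cycles(instrs):
    """Assumed model: one integer and one floating-point instruction may issue
    in the same clock, so the busier pipeline sets the cycle count."""
    counts = Counter(instrs)
    return max(counts["int"], counts["fp"])

# Loop body from the comment: "x *= blah" (fp) and "i++" (int), 100 iterations.
# The compare and branch are ignored to keep the sketch small.
loop = ["fp", "int"] * 100

print(single_issue_cycles(loop))  # 200
print(dual_issue_cycles(loop))    # 100
```

Under these assumptions the integer increment is effectively free, since it rides along with the floating-point multiply.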

CPUs have extremely flexible pipelines: Intel's pipeline 0 and 1 basically can do anything, pipeline 5 can do most stuff but is missing division IIRC (and a few other things). Load/store are done on some other pipelines, etc. etc.

Apple's and AMD's CPU pipelines are more symmetrical and uniform.

NVidia GPUs are the only superscalar ones I can think of, aside from AMD GPUs' scalar vs vector split (which isn't really the "superscalar" operation I'm trying to describe).


TomVDB 1477 days ago

Starting with Volta, Nvidia GPUs have a forward progress guarantee, preventing lockups when there’s thread divergence.

That doesn’t improve the performance of a well-behaved, well-written compute shader. But avoiding hard hangs IMO deserves the label “improved thread divergence.”
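
The kind of hang being avoided can be sketched with a two-thread spinlock inside one warp. This is a deliberately simplified model, not real hardware behavior: the `simulate` function, the choice to run the spinning path first, and the alternating scheduler are all assumptions for illustration.

```python
def simulate(per_thread_pc, max_steps=100):
    """Return the step at which both threads finished, or None if the warp hangs."""
    lock_owner = 0         # thread 0 won the atomic; thread 1 has to spin
    done = [False, False]
    active = 1             # assumption: the hardware runs the spinning path first
    for step in range(max_steps):
        if all(done):
            return step
        if lock_owner == active:
            done[active] = True   # run the critical section, then release
            lock_owner = None
        elif lock_owner is None:
            lock_owner = active   # acquire the free lock
        # otherwise the active thread spins and makes no progress
        if per_thread_pc:
            # With a PC per thread, the scheduler can switch to another
            # unfinished thread instead of staying on the spinning path.
            waiting = [t for t in (0, 1) if t != active and not done[t]]
            if waiting:
                active = waiting[0]
        # Lockstep model: stay on the current path until it reconverges, which
        # never happens here because the lock holder is never scheduled.
    return None

print(simulate(per_thread_pc=False))  # None: the warp never finishes
print(simulate(per_thread_pc=True))   # 4
```

In the lockstep case the spinning thread starves the lock holder, which is the classic intra-warp deadlock; with per-thread scheduling the holder eventually runs and releases the lock.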


jjoonathan 1477 days ago

Aren't warps still 32 threads, even though the number of threads is skyrocketing, effectively making them proportionately finer-grained? Are things different in AMD land?


JonChesterfield 1477 days ago

Slightly, the older tech is 64 threads/lanes per warp/wavefront. Newer ones are 32 by default but 64 if desired.

Bigger differences are the instruction counter per thread since Volta on Nvidia (which I think is a terrible feature) and that forward progress guarantees are stronger on Nvidia (those are _really_ helpful but expensive).


TomVDB 1477 days ago

Nvidia GPUs have had 32 threads per warp right from the start of CUDA with the 8800 GTX.

> which I think is a terrible feature <> those are _really_ helpful but expensive

Guaranteed forward progress is a direct consequence of having an instruction counter per thread???

Or so I thought. How else would an SM be able to know the PC of a group of threads that wasn’t stuck?


dragontamer 1477 days ago

> Slightly, the older tech is 64 threads/lanes per warp/wavefront. Newer ones are 32 by default but 64 if desired.

AMD GCN was 64 threads/wavefront. NVidia always was 32 threads/warp.

AMD's newest consumer architectures, RDNA and RDNA2, are 32 threads/wavefront. However, GCN lives on in CDNA (the MI200 supercomputer chips), which keeps the 64 threads/wavefront architecture.
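
To see what the width difference means for a small dispatch, here is a minimal sketch that counts how many warps/wavefronts a given thread count needs and how many lanes sit idle. The widths are the ones quoted in this comment; the helper name and the 70-thread workload are made up for illustration.

```python
# Widths as quoted in the comment above; the thread's numbers, not a vendor spec.
WIDTHS = {
    "NVidia warp": 32,
    "AMD GCN/CDNA wavefront": 64,
    "AMD RDNA/RDNA2 wavefront": 32,
}

def groups_and_idle_lanes(threads, width):
    """Return (number of warps/wavefronts, lanes left idle in the last one)."""
    groups = -(-threads // width)   # ceiling division
    return groups, groups * width - threads

for name, width in WIDTHS.items():
    print(name, groups_and_idle_lanes(70, width))
# 32-wide: (3, 26)   64-wide: (2, 58)
```

For this hypothetical 70-thread workload, the 64-wide layout leaves more than twice as many lanes idle as the 32-wide one.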


djmips 1477 days ago

There is tech in late-model GPUs to keep threads that diverge the same way together in the same warp/wavefront.
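
Why regrouping helps can be sketched with a simple cost model. This says nothing about how any particular GPU implements it: `passes_needed`, `regroup`, and the rule that a warp runs every branch path taken by any of its lanes are assumptions for illustration.

```python
WARP_SIZE = 32  # lanes per warp, the NVidia figure quoted in this thread

def passes_needed(branch_taken, warp_size=WARP_SIZE):
    """Count branch-path passes when each warp runs every path any of its lanes takes."""
    total = 0
    for start in range(0, len(branch_taken), warp_size):
        warp = branch_taken[start:start + warp_size]
        total += len(set(warp))  # 1 pass if uniform, 2 if the warp diverges
    return total

def regroup(branch_taken):
    """Hypothetical regrouping step: put threads with the same outcome next to each other."""
    return sorted(branch_taken)

# 128 threads where every other thread takes the branch: worst case for lockstep execution.
outcomes = [i % 2 == 0 for i in range(128)]
before = passes_needed(outcomes)          # 4 warps x 2 paths = 8
after = passes_needed(regroup(outcomes))  # 4 uniform warps = 4
print(before, after)
```

In this model, regrouping halves the work because no warp has to execute both sides of the branch.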
