| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Lichtso 1478 days ago
	> GPUs are getting better at handling thread divergence in the microarch That is an interesting point, how does that work (especially with the dynamics of ray tracing)? Do they recombine under utilized wavefronts or something?

3 comments

dragontamer 1478 days ago

I'm not aware of anything that improves thread-divergence. NVidia's most recent GPUs have superscalar operations, which is a trick from CPU-land (multiple pipelines operating 2 or more instructions per clock tick). NVidia has an integer-pipeline and a floating-point pipeline, and both can operate simultaneously (ex: for(int i=0; i<100; i++) x *=blah; the "i++" is integer, while the "x *= blah" is floating point, so both operate simultaneously.

CPUs have extremely flexible pipelines: Intel's pipeline 0 and 1 basically can do anything, pipeline 5 can do most stuff but is missing division IIRC (and a few other things). Load/store are done on some other pipelines, etc. etc.

Apple's and AMD's CPU pipelines are more symmetrical and uniform.

NVidia GPUs are the only superscalar ones I can think of, aside from AMD GPU's scalar vs vector split (which isn't really the "superscalar" operation I'm trying to describe).

link

TomVDB 1477 days ago

Starting with Volta, Nvidia GPUs have forward progress guarantee, preventing lockups when there’s thread divergence.

That doesn’t improve the performance of a well behaved and well written compute shader. But avoiding hard hangs IMO deserves the label “improved thread divergence.”

link

jjoonathan 1477 days ago

Aren't warps still 32 threads, even though number of threads is skyrocketing, effectively making them proportionately finer granularity? Are things different in AMD land?

link

JonChesterfield 1477 days ago

Slightly, the older tech is 64 threads/lanes per warp/wavefront. Newer ones are 32 by default but 64 if desired.

Bigger differences are the instruction counter per thread since volta on nvidia (which I think is a terrible feature) and that forward progress guarantees are stronger on nvidia (those are _really_ helpful but expensive).

link

TomVDB 1477 days ago

Nvidia GPUs were 32 threads per warps eight from the start of CUDA with the 8800 GTX.

> which I think is a terrible feature <> those are _really_ helpful but expensive

Guaranteed forward progress is a direct consequence of having an instruction counter per thread???

Or so I thought. How else would an SM be able to know the PC of a group of threads that wasn’t stuck?

link

dragontamer 1477 days ago

> Slightly, the older tech is 64 threads/lanes per warp/wavefront. Newer ones are 32 by default but 64 if desired.

AMD GCN was 64 threads/wavefront. NVidia always was 32 threads/warp.

AMD's newest consumer cards RDNA and RDNA2 are 32 threads/wavefront. However, GCN lives on with CDNA (MI200 supercomputer chips), with 64 threads/wavefront architecture.

link

djmips 1477 days ago

There is tech in late model GPUs to keep all same divergent threads in the same warp/wavefront.

link