|
|
|
|
|
by dragontamer
1431 days ago
|
|
I'm not aware of anything that improves thread-divergence. NVidia's most recent GPUs have superscalar operations, which is a trick from CPU-land (multiple pipelines operating 2 or more instructions per clock tick). NVidia has an integer-pipeline and a floating-point pipeline, and both can operate simultaneously (ex: for(int i=0; i<100; i++) x *=blah; the "i++" is integer, while the "x *= blah" is floating point, so both operate simultaneously. CPUs have extremely flexible pipelines: Intel's pipeline 0 and 1 basically can do anything, pipeline 5 can do most stuff but is missing division IIRC (and a few other things). Load/store are done on some other pipelines, etc. etc. Apple's and AMD's CPU pipelines are more symmetrical and uniform. NVidia GPUs are the only superscalar ones I can think of, aside from AMD GPU's scalar vs vector split (which isn't really the "superscalar" operation I'm trying to describe). |
|
That doesn’t improve the performance of a well behaved and well written compute shader. But avoiding hard hangs IMO deserves the label “improved thread divergence.”