by jms55
452 days ago
Agreed, there are two different problems being described here.
1. Divergence of threads within a workgroup/SM/whatever
2. Dynamically scheduling new workloads (i.e. dispatches, draws, etc.) in response to the output of a previous workload
Raytracing is problem #1 (and has its own solutions, like shader execution reordering), while Raph is talking about problem #2.
|
The "solution" to Raytracing (ignoring hardware acceleration like shader reordering) is stream compaction and stream expansion.
If you are willing to have lots of loops inside of a shader (not always possible due to Windows's default 2-second GPU timeout), you can write while(hits_array is not empty) kind of code, allowing your 1024-wavegroup to keep looping over the hits array and efficiently process all of the rays, including the ones that would otherwise be handled recursively.
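A rough CPU-side sketch of that while(hits_array is not empty) loop, in Python rather than real shader code. The names `process_ray` and `trace_all`, and the (id, bounces_left) ray format, are made up for illustration. On a GPU each batch would be spread across the lanes of one workgroup, and the append into the next array would be an atomic or prefix-sum compaction.

```python
def process_ray(ray):
    """Hypothetical stand-in for shading one ray.

    Returns a list of follow-up rays: empty if the ray terminates,
    one entry if it bounces (more entries would be stream expansion).
    """
    ray_id, bounces_left = ray
    if bounces_left > 0:
        return [(ray_id, bounces_left - 1)]
    return []


def trace_all(rays, group_size=1024):
    """Process rays pass by pass instead of recursing.

    Returns (rays_processed, passes).
    """
    hits = list(rays)
    processed = 0
    passes = 0
    while hits:  # while(hits_array is not empty)
        next_hits = []
        for start in range(0, len(hits), group_size):
            batch = hits[start:start + group_size]  # one workgroup's worth
            for ray in batch:  # on a GPU: one lane per ray, same code path
                processed += 1
                # Compaction: only surviving rays are written, packed densely.
                next_hits.extend(process_ray(ray))
        hits = next_hits
        passes += 1
    return processed, passes
```

Because the next array is always packed, every pass starts with full batches until the very end, which is the whole point: the lanes stay busy instead of idling on rays that already finished.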
The important tidbit is that this technique generalizes. If you have 5 functions that need to be "called" after your current processing, then it becomes:
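Something like the following, again as a CPU-side Python sketch and not real shader code. The idea is one "next" array per function: instead of calling a function, each work item gets pushed onto that function's array, and the loop drains each array in turn. `run_queues`, `func0` and `func1` are hypothetical names, and only two functions are shown to keep it short; five works the same way.

```python
def run_queues(initial, funcs):
    """Drain one work array per function until all are empty.

    initial: list of (func_index, item) pairs to seed the arrays.
    funcs:   list of callables; each takes an item and returns a list of
             (func_index, item) pairs to be "called" next.
    Returns how many times each function ran.
    """
    queues = [[] for _ in funcs]
    for idx, item in initial:
        queues[idx].append(item)

    calls = [0] * len(funcs)
    while any(queues):  # some array still has work
        for i, func in enumerate(funcs):
            # Swap the array out so new work for func lands in a fresh one.
            batch, queues[i] = queues[i], []
            for item in batch:  # on a GPU: all lanes run the same func
                calls[i] += 1
                for next_idx, next_item in func(item):
                    queues[next_idx].append(next_item)
    return calls


def func0(n):
    # Common case: stays in func0, except multiples of 4 go to func1.
    if n <= 0:
        return []
    return [(1 if n % 4 == 0 else 0, n - 1)]


def func1(n):
    # Rare case: hands the item straight back to func0.
    return [(0, n)] if n > 0 else []
```

For example, `run_queues([(0, 5)], [func0, func1])` runs func0 six times and func1 once. Each inner loop only ever executes one function, so there is no divergence inside a batch; the cost is that rarely used arrays produce small, poorly filled batches.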
Now of course we can't grow "too far", because GPUs can't handle divergence very well. But for "small" numbers of next-arrays and "small" amounts of divergence (i.e. I'm assuming that func1 is the most common here, like 80%+, so that the buffers remain full), this technique works.
If you have more divergence than that, then you need to think more carefully about how to continue. Maybe GPUs are a bad fit (e.g. any HTTP server code will be awful on GPUs) and you're forced to use a CPU.