| HN Mirror

Its not as complicated as it sounds. Stream compaction solves execution divergence. The end. Instead of recursively searching the tree, you select the members of the tree with a child.

No, you can't do naïve recursion for this. GPUs just don't do that very well. But break it up with stream compaction, and everything is cake.

http://www.cse.chalmers.se/~uffe/streamcompaction.pdf

----------

Its not the memory-link latency that gets you here. Its branch divergence. Solve branch divergence, and then you're far faster than a CPU at traversing that BVH tree. Even without Raytracing Hardware. Even with lol 1000ns latency per node = node->next (GPUs turn out to be decent at latency hiding if you up that occupancy a bit... and just double-check on the compiler / assembly language stuff to ensure that the access was rearranged to a sane location).