by dragontamer 1923 days ago
> (and generally anything with pointer-chasing)

You mean like... going through a BVH tree to find which AABB (axis-aligned bounding box) collides with a ray? :-) I'm pretty sure it's been demonstrated that GPUs are fastest at that.

Yeah, I know that linked lists take a latency hit. But even with that big hit, O(1) operations vs O(n) adds up. Don't avoid linked-lists, trees, or graphs just because you're trapped thinking about cache-locality or whatever.

A win in asymptotic complexity (especially O(1) vs O(n)) is utterly huge. On the one hand, it's common for beginners to overestimate how much this matters. But on the other hand... it's an asymptotic win. You gotta give it a shot.
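To make the O(1) vs O(n) point concrete, here's a minimal sketch in plain Python. `Node`, `insert_after`, `array_insert_shifts`, and `to_list` are hypothetical names made up for illustration, and the "shift count" is a simple model of what a contiguous array has to do on insert, not a benchmark.

```python
class Node:
    """Singly linked list node (hypothetical helper for illustration)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    """O(1): rewire two pointers, no matter how long the list is."""
    node.next = Node(value, node.next)
    return node.next

def array_insert_shifts(length, index):
    """Elements a contiguous array must move to open a slot at `index`: O(n)."""
    return length - index

def to_list(head):
    """Walk the chain and collect the values (this walk is the pointer chasing)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

head = Node(1, Node(2, Node(4)))
insert_after(head.next, 3)  # splice 3 in after the node holding 2
```

The catch is that `insert_after` is only O(1) once you already hold the node. Getting there is the latency hit mentioned above.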

Arrays win in many cases (and in more cases on GPUs, because GPUs handle pointer chasing worse than they handle array access). Still, there are plenty of situations where the linked list / tree / graph is simply unavoidable. Be it an octree, a linked list, or... BVH-tree traversals in raytracing.

1 comments

Naive tree traversal on a GPU actually has pretty bad performance, due to execution divergence. It takes a lot of application-specific reframing of the problem to make working with BVH trees efficient: https://developer.nvidia.com/blog/thinking-parallel-part-ii-...

It's not as complicated as it sounds. Stream compaction solves execution divergence. The end. Instead of recursively searching the tree, you select the members of the tree with a child.

No, you can't do naïve recursion for this. GPUs just don't do that very well. But break it up with stream compaction, and everything is cake.

http://www.cse.chalmers.se/~uffe/streamcompaction.pdf
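
Roughly, the idea looks like this. It's a CPU-side Python sketch of the data flow only, not GPU code and not the algorithm from the linked paper. Assumptions: the "BVH" is a toy 1D tree of half-open intervals standing in for AABBs, the queries are points standing in for rays, and every name (`exclusive_scan`, `compact`, `traverse`, `LO`, `HI`, `LEFT`, `RIGHT`) is hypothetical. Each list comprehension marks the spot where a GPU would run one thread per work item.

```python
from itertools import accumulate

def exclusive_scan(flags):
    """Exclusive prefix sum: the output slot for each surviving item."""
    return [0] + list(accumulate(flags))[:-1]

def compact(items, flags):
    """Stream compaction: pack flagged items into a dense array (scan + scatter)."""
    if not items:
        return []
    offsets = exclusive_scan(flags)
    out = [None] * (offsets[-1] + flags[-1])
    for item, flag, slot in zip(items, flags, offsets):  # one thread per item on a GPU
        if flag:
            out[slot] = item
    return out

# Toy 1D tree: node i covers [LO[i], HI[i]); LEFT/RIGHT are child indices, -1 marks a leaf.
LO    = [0, 0, 4,  0,  2,  4,  6]
HI    = [8, 4, 8,  2,  4,  6,  8]
LEFT  = [1, 3, 5, -1, -1, -1, -1]
RIGHT = [2, 4, 6, -1, -1, -1, -1]

def traverse(queries):
    """Return sorted (query_id, leaf_id) hits using one pass per tree level."""
    frontier = [(q, 0) for q in range(len(queries))]  # every query starts at the root
    hits = []
    while frontier:
        # Uniform overlap test for every (query, node) pair: no recursion, no stack.
        hit = [LO[n] <= queries[q] < HI[n] for q, n in frontier]
        leaf_flags = [int(h and LEFT[n] < 0) for h, (q, n) in zip(hit, frontier)]
        inner_flags = [int(h and LEFT[n] >= 0) for h, (q, n) in zip(hit, frontier)]
        # Hit leaves are results; hit nodes "with a child" are kept for the next pass.
        hits += compact(frontier, leaf_flags)
        inner = compact(frontier, inner_flags)
        # Each survivor emits both children, so the next frontier is dense again.
        frontier = [(q, c) for q, n in inner for c in (LEFT[n], RIGHT[n])]
    return sorted(hits)
```

Calling `traverse([1.0, 5.0, 7.5, 9.0])` on the toy tree gives `[(0, 3), (1, 5), (2, 6)]`; the last query misses the root and is dropped in the first pass. The point is that every pass does the same work on a densely packed array, so no lane sits idle waiting on a neighbour that went deeper.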

----------

It's not the memory-link latency that gets you here. It's branch divergence. Solve branch divergence, and then you're far faster than a CPU at traversing that BVH tree. Even without raytracing hardware. Even with lol 1000ns latency per node = node->next (GPUs turn out to be decent at latency hiding if you up that occupancy a bit... and just double-check the compiler output / assembly to make sure the access was rearranged to a sane location).
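
A back-of-the-envelope model of why divergence hurts, as a Python sketch. It assumes a simplified lockstep model where a warp keeps issuing until its slowest lane finishes and finished lanes sit masked off; `warp_utilization` and the step counts are made up for illustration, not measurements.

```python
def warp_utilization(steps_per_lane):
    """Fraction of issued lane-slots doing useful work under lockstep execution."""
    busy = sum(steps_per_lane)
    # The warp runs until its slowest lane is done; idle lanes still occupy slots.
    issued = max(steps_per_lane) * len(steps_per_lane)
    return busy / issued if issued else 1.0

divergent = [1] * 31 + [32]  # 31 rays exit after one node, one ray walks 32 nodes
uniform = [2] * 32           # every lane does the same amount of work

print(warp_utilization(divergent))  # 63/1024, about 6% of the slots are useful
print(warp_utilization(uniform))    # 1.0
```

In this model, compacting the live work between passes is what moves you from the first case toward the second.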