HN Mirror

	by exDM69 2830 days ago
	It is more like a texture unit than a shader core. Tree traversal is a pointer chasing problem, where the CPU/shader core executes a few instructions, then starts a memory load and then sits idle for tens or hundreds of clock cycles waiting for memory. Cache prefetching can help but is usually not a good fit for tree traversal where there is very little computation per node. It is all about memory latency hiding and not really about computation.
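
To make the pointer-chasing pattern concrete, here is a minimal Python sketch of a stack-based BVH traversal. The `Node` layout, the function names, and the tiny three-node tree are all assumptions for illustration, not anything taken from real hardware or a specific renderer. The thing to notice is how little arithmetic (`hits_aabb`) sits between one dependent node fetch and the next.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    lo: tuple                     # AABB min corner (x, y, z)
    hi: tuple                     # AABB max corner (x, y, z)
    left: Optional[int] = None    # index of left child, None for a leaf
    right: Optional[int] = None   # index of right child, None for a leaf


def hits_aabb(origin, inv_dir, lo, hi):
    """Slab test: a few multiplies and compares, the only work done per node."""
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t0, t1 = (l - o) * inv, (h - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax


def traverse(nodes, origin, direction):
    """Return (indices of leaves whose box the ray hits, number of node fetches)."""
    inv_dir = tuple(1.0 / d for d in direction)  # sketch assumes no zero components
    stack, leaves, fetches = [0], [], 0
    while stack:
        idx = stack.pop()
        node = nodes[idx]         # dependent load: the address is only known now
        fetches += 1
        if not hits_aabb(origin, inv_dir, node.lo, node.hi):
            continue
        if node.left is None:
            leaves.append(idx)
        else:
            stack.extend((node.left, node.right))
    return leaves, fetches


# Hypothetical tree: one root box split into two leaf boxes along x.
nodes = [
    Node((0, 0, 0), (4, 2, 2), left=1, right=2),
    Node((0, 0, 0), (2, 2, 2)),
    Node((2, 0, 0), (4, 2, 2)),
]
leaves, fetches = traverse(nodes, (-1.0, 1.0, 1.0), (1.0, 0.1, 0.1))
```

On real hardware every `nodes[idx]` would be a memory load that can take tens or hundreds of cycles, while the slab test between loads is only a dozen or so arithmetic operations.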

kbwt 2830 days ago

But GPU cores are already king at latency hiding. They can run hundreds of threads doing pointer chasing, switching between them round-robin as the memory reads complete.
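
A back-of-the-envelope model of that round-robin argument, as a hedged Python sketch. It assumes switching is free, memory bandwidth is unlimited, and every thread alternates a fixed number of compute cycles with a fixed memory latency. The function names and the 4-cycle / 400-cycle figures are illustrative assumptions, not measurements.

```python
def alu_utilization(threads, compute_cycles, latency_cycles):
    """Fraction of cycles the ALU is busy under ideal round-robin switching.

    Each thread does compute_cycles of work, then waits latency_cycles for
    its next node. Switching is assumed to cost nothing.
    """
    return min(1.0, threads * compute_cycles / (compute_cycles + latency_cycles))


def threads_to_saturate(compute_cycles, latency_cycles):
    """Smallest thread count that keeps the ALU busy on every cycle."""
    total = compute_cycles + latency_cycles
    return -(-total // compute_cycles)  # ceiling division


# Illustrative numbers only: 4 cycles of intersection work per 400-cycle fetch.
single = alu_utilization(1, 4, 400)
needed = threads_to_saturate(4, 400)
```

With these assumed numbers a single thread keeps the ALU busy about 1% of the time, and roughly a hundred threads in flight are needed to hide the latency completely, which is the "hundreds of threads" regime described above.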

exDM69 2830 days ago

The switching isn't free. Waking up a thread to do just a few computation cycles (a few ray-AABB intersections) and then going back to sleep while waiting for the next node to be fetched from memory is super inefficient.

If there were significant computation needed per node, this wouldn't be an issue.
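
The amortization argument can be written as a one-line model. This is only a sketch: `switch_cycles` is a hypothetical per-wakeup overhead, and its real value on any given GPU is exactly what is disputed in the replies (zero according to one, about a cycle on Maxwell according to another's recollection).

```python
def issue_efficiency(compute_cycles, switch_cycles):
    """Fraction of issue slots spent on useful work when enough threads are
    in flight to keep the core busy and each wakeup costs switch_cycles."""
    return compute_cycles / (compute_cycles + switch_cycles)


# Assumed, illustrative values: a 1-cycle switch against 4 vs. 400 cycles of work.
small_work = issue_efficiency(4, 1)
large_work = issue_efficiency(400, 1)
free_switch = issue_efficiency(4, 0)
```

Under this model a fixed switch cost hurts only when the work per wakeup is tiny, and it vanishes entirely if the switch really costs zero cycles.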

kbwt 2830 days ago

> The switching isn't free.

It absolutely is, on current GPUs. Think of it like a larger-scale version of SMT (Intel's hyperthreading). GPUs are able to do this because they execute instructions in-order and do not need to track thousands of instructions per thread.

david-gpu 2830 days ago

It's more complex than that. Switching warps thrashes your caches. There is definitely a cost associated with it.

kbwt 2830 days ago

Well, yeah. If you are memory bandwidth-constrained it's a bad idea to go off-chip.

But for ray-tracing, what does it really matter? We are already assuming that you will wait a full memory fetch cycle to get the next node's child AABBs and child indices. The warps will do their intersection test on the data they just read and fire off the next read. Each thread's hot context should fit in under a cache line, since it's basically just a single ray to keep track of.
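
A rough size check for the "fits in under a cache line" claim, as a Python sketch. The field layout below (origin, direction, a t-interval, and the index of the next node) is an assumed minimal per-ray context, and the 64-byte and 128-byte line sizes are assumptions about typical hardware rather than figures from this thread.

```python
import struct

# Hypothetical per-ray hot state, packed without padding:
#   origin (3 x float32), direction (3 x float32),
#   t_min and t_max (2 x float32), next node index (uint32)
RAY_CONTEXT = struct.Struct("<3f3f2fI")


def fits_in_cache_line(context_bytes, line_bytes):
    """True if one ray's hot state fits inside a single cache line."""
    return context_bytes <= line_bytes


context_bytes = RAY_CONTEXT.size                     # 36 bytes for this layout
fits_64 = fits_in_cache_line(context_bytes, 64)      # assumed 64-byte line
fits_128 = fits_in_cache_line(context_bytes, 128)    # assumed 128-byte line
```

This counts only the ray itself. A per-thread traversal stack, if the traversal uses one, would be extra state on top of it.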

namibj 2830 days ago

IIRC it costs you a cycle to switch warps on Maxwell, but I'm not completely sure.
