by david-gpu 2838 days ago
It's more complex than that. Switching warps thrashes your caches. There is definitely a cost associated with it.

Well, yeah. If you are memory-bandwidth-constrained, it's a bad idea to go off-chip.

But for ray-tracing, what does it really matter? We are already assuming that you will wait a full memory fetch cycle to get the next node's child AABBs and child indices. The warps will do their intersection test on the data they just read and fire off the next read. Each thread's hot context should fit in under a cache line, since it's basically just a single ray to keep track of.
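
To put a rough number on the "under a cache line" claim, here is a minimal C++ sketch of what that per-ray hot context might look like, plus the usual slab test against one child AABB. Everything here is my own illustration: `RayContext`, its fields, `hits_aabb`, and the 64-byte line size are assumptions, not any particular GPU's or API's layout. It also assumes a traversal that keeps no per-ray stack; a per-thread traversal stack would add to the footprint.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical per-ray state; names and layout are illustrative only.
struct RayContext {
    float origin[3];
    float dir[3];
    float t_min;             // start of the valid ray interval
    float t_max;             // end of the valid interval (closest hit so far)
    std::uint32_t node;      // index of the BVH node to fetch next
    std::uint32_t hit_prim;  // closest primitive found so far
};

// Assumed line size. Real hardware varies, and larger lines only help here.
constexpr std::size_t kCacheLineBytes = 64;
static_assert(sizeof(RayContext) <= kCacheLineBytes,
              "per-ray context should fit in one cache line");

// Standard slab test: clip the ray interval against each axis of the box.
inline bool hits_aabb(const RayContext& r, const float lo[3], const float hi[3]) {
    float t0 = r.t_min;
    float t1 = r.t_max;
    for (int axis = 0; axis < 3; ++axis) {
        const float inv = 1.0f / r.dir[axis];
        float t_near = (lo[axis] - r.origin[axis]) * inv;
        float t_far = (hi[axis] - r.origin[axis]) * inv;
        if (t_near > t_far) std::swap(t_near, t_far);
        t0 = std::max(t0, t_near);
        t1 = std::min(t1, t_far);
    }
    return t0 <= t1;  // a non-empty interval means the ray overlaps the box
}
```

With 4-byte floats that is 8 floats plus two 32-bit indices, so 40 bytes per ray. In a traversal loop you would call `hits_aabb` once per child box after the node fetch returns, then write the chosen child index into `node` and issue the next read.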