Hacker News
by frogblast 1183 days ago
I strongly suspect the reason Nvidia trees look so shallow is that Nsight simply isn't showing the actual tree structure, probably because Nvidia considers it proprietary. It appears to just list all the leaves of a tree in one big flat list. But there is definitely a tree in there.
4 comments

At work I inherited a raytracer codebase with a severe memory bloat problem on terrains. The size of the terrain BLASes is precisely what one would expect from a bog-standard BVH with a branching factor of 2, so I'm sure you're right.

This is on Turing. Nvidia would've been motivated to de-risk the introduction of RTX by making boring choices. You may well see different results on later archs.
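For anyone who wants to run the same sanity check, here is a rough sketch of the arithmetic. It assumes a full binary BVH, where L leaves imply 2L - 1 nodes. The byte sizes are illustrative guesses (a 32-byte node holding one AABB plus two child indices, and 36 bytes for an uncompressed triangle), not Nvidia's actual layout, and both function names are made up for this example.

```python
def binary_bvh_node_count(num_primitives: int, prims_per_leaf: int = 1) -> int:
    """Total nodes (inner + leaf) in a full binary BVH over the primitives."""
    if num_primitives <= 0:
        return 0
    # Ceiling division: number of leaves needed to hold every primitive.
    leaves = -(-num_primitives // prims_per_leaf)
    # A full binary tree with L leaves has L - 1 inner nodes, so 2L - 1 in total.
    return 2 * leaves - 1


def estimate_bvh_bytes(
    num_primitives: int,
    prims_per_leaf: int = 1,
    node_bytes: int = 32,  # assumption: 6 floats of bounds + 2 x 4-byte indices
    prim_bytes: int = 36,  # assumption: 3 vertices x 3 floats x 4 bytes
) -> int:
    """Back-of-envelope size of a binary BVH plus its triangle data."""
    nodes = binary_bvh_node_count(num_primitives, prims_per_leaf)
    return nodes * node_bytes + num_primitives * prim_bytes


if __name__ == "__main__":
    tris = 1_000_000
    print(binary_bvh_node_count(tris), estimate_bvh_bytes(tris))
```

If the BLAS size a driver reports grows linearly with triangle count at roughly this rate, that is consistent with a plain binary tree, though it doesn't prove one.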

Perhaps the rest of it isn't a tree at all but some other optimized data structure, like a spatial hash or a spatially sorted layout?
Since cache hit ratios are so central to fast GPU code, the tree structure doesn't have to be exotic. The secret sauce is likely how it performs in the caches.
I'm very curious to see it unrolled down to its actual structure.
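To make the "sort" idea concrete: one well-known way to get a cache-friendly layout is to order primitives along a Morton (Z-order) curve, so that things close together in space tend to sit close together in memory. The sketch below is only an illustration of that general technique. Nothing here is known to be what Nvidia does, and `morton3` and `sort_by_morton` are names made up for this example.

```python
def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def sort_by_morton(points, bits: int = 10):
    """Return 3D points sorted along a Z-order curve over their bounding box."""
    if not points:
        return []
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    scale = (1 << bits) - 1  # largest quantized coordinate

    def key(p):
        q = []
        for a in range(3):
            extent = hi[a] - lo[a]
            # Normalize to [0, 1]; a degenerate axis collapses to 0.
            t = (p[a] - lo[a]) / extent if extent > 0 else 0.0
            q.append(int(t * scale))
        return morton3(q[0], q[1], q[2], bits)

    return sorted(points, key=key)


if __name__ == "__main__":
    pts = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(sort_by_morton(pts))
```

A sorted list like this can still be turned into a tree afterwards, so a sorted layout and a BVH aren't mutually exclusive.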