by contravariant 1649 days ago
It gets even weirder when you realize it's not just running your code for each pixel; it's running the exact same instructions in parallel for large square blocks of pixels, which makes branching incredibly expensive.

Only as expensive as the slowest pixel in the batch :D
That's not exactly true: it can be slower than the slowest individual pixel. The GPU isn't just running the same code for each pixel in parallel across many cores; a single core* actually runs many pixels at once, and therefore has to keep the same program counter for all of them. If two pixels diverge, the core has to alternate between the different PCs, toggling each lane on or off depending on which pixels are currently executing.

That means if you had a shader like:

    if (pixelIndex % 2 != 0) {
        longFunctionA();
    } else {
        longFunctionB();
    }
It would actually take roughly twice as long to run as a shader where every pixel calls the same function (assuming the two functions cost about the same). Each core is executing a batch of pixels (a warp) that is evenly split between two completely different sections of code, so it has to alternate between them until both finish.
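
To put rough numbers on that, here's a toy cost model in Python. It is not real GPU code, just a sketch of the lockstep behaviour described above. The warp width of 32, the 100-cycle costs, and the `warp_cycles` helper are all made up for illustration, and it assumes the hardware simply runs each distinct path one after another with the other lanes masked off.

```python
# Toy cost model of one warp running in lockstep (a sketch, not real GPU code).
WARP_SIZE = 32  # assumption: Nvidia-style warp width


def warp_cycles(lane_paths, path_cost):
    """Return (total cycles, lane utilization) for one warp.

    lane_paths: the branch each lane (pixel) takes, e.g. "A" or "B".
    path_cost:  hypothetical cycles each branch needs for a single pixel.
    """
    total = 0
    busy_lane_cycles = 0
    # The warp has one program counter, so each distinct path runs in turn.
    for path in sorted(set(lane_paths)):
        # Lanes on this path are switched on; every other lane sits idle.
        active = sum(1 for p in lane_paths if p == path)
        total += path_cost[path]
        busy_lane_cycles += active * path_cost[path]
    utilization = busy_lane_cycles / (len(lane_paths) * total)
    return total, utilization


# Hypothetical costs standing in for longFunctionA / longFunctionB.
cost = {"A": 100, "B": 100}

uniform = ["A"] * WARP_SIZE                                   # every pixel takes A
diverged = ["A" if i % 2 else "B" for i in range(WARP_SIZE)]  # odd/even split

print(warp_cycles(uniform, cost))   # (100, 1.0)
print(warp_cycles(diverged, cost))  # (200, 0.5)
```

In this model the slowest individual pixel costs 100 cycles either way, but the diverged warp takes 200 because it pays for both paths, with half the lanes idle the whole time.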

* "Core" might not be exactly the right term; Nvidia calls them SMs, and other GPU vendors have different names.