
That's not exactly true: it can be slower than the slowest individual pixel. The GPU isn't just running the same code for each pixel in parallel across many cores. A single core* actually runs a batch of pixels at once in lockstep, so all of those pixels have to share the same program counter. If two pixels diverge, the core has to alternate between the different PCs, toggling each lane on or off depending on which pixels are currently executing.

That means if you had a shader like:

    if (pixelIndex % 2) {
        longFunctionA();
    } else {
        longFunctionB();
    }

It would take roughly twice as long to run as having every pixel call the same function (assuming the two functions cost about the same). Each core is executing a batch of pixels (a warp) that is evenly split between two completely different sections of code, so it has to run one section with half the lanes masked off, then the other, until both finish.
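To put rough numbers on that, here's a toy cost model in Python. It's only a sketch, not how any real GPU schedules work: the warp size of 32, the 100-cycle branch costs, and the `warp_cycles` helper are all made up for illustration. The one idea it encodes is that a warp pays for every branch that at least one of its lanes takes.

```python
WARP_SIZE = 32  # assumption: Nvidia-style warp width; other vendors differ

# Hypothetical costs, in cycles, for longFunctionA and longFunctionB.
COST = {"A": 100, "B": 100}


def warp_cycles(lane_branches, branch_cost):
    """Cycles one warp spends, given the branch each lane wants to take.

    Lanes share a program counter, so every distinct branch taken by any
    lane runs once for the whole warp, with the other lanes masked off.
    """
    taken = set(lane_branches)
    return sum(branch_cost[b] for b in taken)


# Every pixel in the warp calls the same function.
uniform = ["A"] * WARP_SIZE

# The pixelIndex % 2 shader: odd lanes take A, even lanes take B.
diverged = ["A" if lane % 2 else "B" for lane in range(WARP_SIZE)]

print(warp_cycles(uniform, COST))   # 100
print(warp_cycles(diverged, COST))  # 200
```

Note that in this model the cost depends only on which branches are taken, not on how many lanes take them: a single odd pixel in the warp would be enough to pay for both functions.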

* Core might not be exactly the right term. Nvidia calls them SMs (streaming multiprocessors), and other GPU vendors have different names.