Hacker News
by Sohcahtoa82 259 days ago
I imagine the answer is "Higher quality" or "Better customization". You can get extremely precise control over the render pipeline on a CPU since you can calculate pixels however you want.

But... with today's world of pixel shaders (really, a world that's existed for 10+ years now), I'd be surprised if there's actually any benefit to be had these days. With a proper pixel shader, I doubt there's anything you could do on a CPU that you couldn't do on a GPU, and the GPU would be massively parallel and do it much faster.


You give my understanding in your last sentence there. I don't think there's any "higher quality" graphics which could be rendered on a CPU that couldn't be rendered on a GPU. Since they are equivalent in what they can compute, the only difference would be speed, which is what GPUs are designed for.

But to play devil's advocate against myself, I have heard that programming for GPUs can be harder for many things. So maybe usability and developer-friendliness are what is meant by CPUs being better?

GPUs are TERRIBLE at executing code with tons of branches.

Basically, GPUs execute instructions in lockstep groups of threads. Each group executes the same instruction at the same time. If there's a conditional, and only some of the threads in a group have a state that satisfies the condition, then the group is split and the two paths are executed serially rather than in parallel. The threads following the "true" path execute while the threads that need to take the "false" path sit idle. Once the "true" threads complete, they sit idle while the "false" threads execute. Only once both paths have completed do the threads reconverge and continue.
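To put rough numbers on that, here's a toy cost model in Python. It is not real GPU code, and the function name `run_warp` and the per-path instruction counts are made up for illustration. It only counts how many instruction slots a lockstep group spends on an if/else, assuming each path that at least one thread needs costs the whole group that path's full length.

```python
def run_warp(conditions, true_cost, false_cost):
    """Count instruction slots one lockstep group spends on an if/else.

    conditions: one boolean per thread in the group.
    true_cost / false_cost: hypothetical instruction counts of each path.
    """
    slots = 0
    if any(conditions):
        # Threads whose condition holds run; the others are masked off (idle).
        slots += true_cost
    if not all(conditions):
        # Now the remaining threads run while the first set waits.
        slots += false_cost
    return slots


WARP_SIZE = 32  # group width on NVIDIA hardware; other GPUs use other widths

uniform = [True] * WARP_SIZE                        # every thread agrees
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]  # threads disagree

print(run_warp(uniform, true_cost=10, false_cost=10))    # 10
print(run_warp(divergent, true_cost=10, false_cost=10))  # 20
```

Note that a single disagreeing thread is enough to make the whole group pay for both paths in this model.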

They're designed this way because it greatly simplifies the hardware. You don't need huge branch predictors or out-of-order execution engines, and it allows you to create a processor with thousands of cores (the RTX 5090 has over 21,000 CUDA cores!) without needing thousands of instruction decoders, which would be necessary to allow each core to do its own thing.

There ARE ways to work around this. For example, it can sometimes be faster to compute BOTH sides of a branch in every thread, and then use the condition only to select which result to keep. Then each thread merely needs to do a conditional assignment, so any stall lasts only an instruction or two.
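A minimal sketch of that "compute both, then select" pattern, written as plain Python for one thread's worth of work. The functions `branchy` and `branchless` and the arithmetic inside them are hypothetical. The conditional expression at the end stands in for the select instruction a GPU compiler would typically emit; whether a given compiler actually does so is an assumption, not a guarantee.

```python
def branchy(x):
    # Divergent version: a thread runs one path or the other,
    # so a mixed group has to execute both paths back to back.
    if x >= 0:
        return x * 2.0
    else:
        return x * 0.5


def branchless(x):
    # Every thread computes both results with identical instructions.
    if_true = x * 2.0
    if_false = x * 0.5
    # The condition is used only to pick a result (a stand-in for a select).
    return if_true if x >= 0 else if_false


inputs = [-4.0, -1.0, 0.0, 3.0]
print([branchless(x) for x in inputs])  # [-2.0, -0.5, 0.0, 6.0]
```

This only pays off when both sides are cheap and have no side effects; if one side is expensive, every thread now pays for it even when no thread needed it.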

Of course, it's worth noting that this non-optimal behavior is only an issue with divergent branches. If every thread decides the "if" is true, there's no performance penalty.