Hacker News
by zdfjkhiuj 3073 days ago
>The reason was data dependencies that I couldn't see, even in the assembly.

I don't understand why that would matter. Aren't GPUs in-order? I don't know the low-level architecture of GPUs at all.

4 comments

An easy explanation is that you can think of GPUs as being massively hyperthreaded. When one thread hits a data stall, another thread picks up and uses the ALU resources until it stalls too, and so on through many, many threads before cycling back to the original. But data stalls are very long, and if the other threads don't have enough ALU work to do before they stall as well, you end up back on the first thread waiting for data anyway.
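To put rough numbers on that, here is a toy simulation of the "massively hyperthreaded" picture. It is only a sketch: the function name `alu_utilization`, the single shared ALU, and the fixed cycle counts are all assumptions made for illustration, not a model of any real GPU.

```python
def alu_utilization(num_threads, alu_cycles, stall_cycles, total_cycles=100_000):
    """Toy model (hypothetical): one ALU shared by num_threads threads.

    Each thread repeats: alu_cycles of ALU work, then a data stall of
    stall_cycles during which it cannot issue. The ALU keeps running the
    current thread until it stalls, then moves on to the next ready thread.
    Returns the fraction of cycles in which the ALU was busy.
    """
    ready_at = [0] * num_threads            # cycle at which each thread's data arrives
    remaining = [alu_cycles] * num_threads  # ALU work left before the next stall
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        # Look for a ready thread, starting with the current one.
        for offset in range(num_threads):
            t = (current + offset) % num_threads
            if ready_at[t] <= cycle:
                break
        else:
            continue  # every thread is stalled: the ALU idles this cycle
        current = t
        busy += 1
        remaining[t] -= 1
        if remaining[t] == 0:
            # This thread now waits on memory; hand the ALU to the next thread.
            remaining[t] = alu_cycles
            ready_at[t] = cycle + 1 + stall_cycles
            current = (t + 1) % num_threads
    return busy / total_cycles


few_threads = alu_utilization(num_threads=4, alu_cycles=10, stall_cycles=400)
many_threads = alu_utilization(num_threads=64, alu_cycles=10, stall_cycles=400)
```

In this model the ALU stays fully busy only once `num_threads * alu_cycles` covers `alu_cycles + stall_cycles`. With 4 threads and 10 cycles of work per 400-cycle stall, it sits idle roughly 90% of the time.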

If you want to understand low-level GPU architecture, https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-... is a great intro.

They are parallel; a data dependency cannot be pipelined as easily.

So no, they are not in-order.

Radeons do not have speculative execution, but that does not make them in-order.

Radeon compute units are in-order, though, in the usual sense of the word, and I'd love to hear about it if there really is an out-of-order GPU. That would be rather surprising.

One thing that they do have to deal with data dependencies is that load (and texture fetch etc.) instructions don't block. Instead, there's a separate instruction for waiting on the result of a previous load.
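A small sketch of why the split between "issue the load" and "wait for the load" matters. The instruction names (`load`, `wait`, `alu`) and the 100-cycle latency are made up for illustration and do not correspond to a real ISA.

```python
def run_in_order(program, load_latency=100):
    """Cycle count for one in-order thread under a toy cost model.

    program is a list of tuples (hypothetical instruction set):
      ("load", tag)  issues a load in 1 cycle and does NOT block
      ("wait", tag)  blocks until the load with that tag has returned
      ("alu",)       1 cycle of ALU work that does not need the loaded value
    """
    cycle = 0
    arrives_at = {}  # tag -> cycle at which the loaded data is available
    for instr in program:
        op = instr[0]
        if op == "load":
            cycle += 1
            arrives_at[instr[1]] = cycle + load_latency
        elif op == "wait":
            # Only the explicit wait can stall the thread.
            cycle = max(cycle, arrives_at[instr[1]])
        elif op == "alu":
            cycle += 1
        else:
            raise ValueError(f"unknown op: {op}")
    return cycle


independent_work = [("alu",)] * 60

# Waiting right after the load exposes the full latency.
wait_early = [("load", "a"), ("wait", "a")] + independent_work
# Deferring the wait lets the independent ALU work overlap with the load.
wait_late = [("load", "a")] + independent_work + [("wait", "a")]

early_cycles = run_in_order(wait_early)  # 161 in this toy model
late_cycles = run_in_order(wait_late)    # 101 in this toy model
```

The thread is still strictly in-order in both cases. The only difference is how far the compiler managed to push the wait away from the load, which is exactly what a hidden data dependency prevents.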

My understanding:

GPUs are (generally) in-order within each thread, but they are pipelined. The pipeline is filled with instructions that are ready to execute from across many threads. If all threads have an unmet dependency (previous instruction or memory access), the pipeline will stall.
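A minimal sketch of that pipeline picture, assuming one issue slot per cycle and a worst case where every instruction depends on the previous instruction of the same thread. The function name and the fixed `dep_latency` are assumptions for illustration only.

```python
def issue_rate(num_threads, dep_latency, total_cycles=10_000):
    """Fraction of cycles in which the pipeline issues an instruction.

    Toy model (hypothetical): each thread is in-order, and each of its
    instructions needs the result of its previous one, which becomes
    available dep_latency cycles after that instruction was issued.
    One instruction may be issued per cycle, from any ready thread.
    """
    ready_at = [0] * num_threads  # earliest cycle each thread may issue again
    issued = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + dep_latency
                issued += 1
                break
        # If no thread was ready, every thread has an unmet dependency
        # and the pipeline stalls for this cycle.
    return issued / total_cycles


one_thread = issue_rate(num_threads=1, dep_latency=4)     # 0.25
two_threads = issue_rate(num_threads=2, dep_latency=4)    # 0.5
eight_threads = issue_rate(num_threads=8, dep_latency=4)  # 1.0
```

In this model the issue rate is `min(1, num_threads / dep_latency)`: enough ready threads fill the pipeline, and too few leave bubbles, with no out-of-order execution involved.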

GPU compilers prefer to inline everything, and they try to reuse partial results if they can, so it’s easy to get out of order dependencies in places you might not expect.