by Jasper_
784 days ago
No, that effectively syncs all warps in a thread group. This implementation isn't doing any synchronization; it's independently doing PC/decode for different instructions and just assuming they won't diverge. That's... a baffling combination of decisions: why do independent PC/decode if they're never going to diverge? It reads as a very basic failure to understand the fundamental value of a GPU. And this isn't a secret GPU architecture thing. Here's a slide deck from 2009 going over the actual high-level architecture of a GPU. Notice how fetch/decode are shared between threads. https://engineering.purdue.edu/~smidkiff/ece563/slides/GPU.p...
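To make the shared fetch/decode point concrete, here's a rough toy model in Python. Everything in it is made up for illustration (the two-opcode ISA, the `Warp` class, `run_shared`, `run_independent`); it isn't the implementation being discussed or any real GPU. It only counts how often fetch/decode runs when one PC drives all the lanes versus when each thread gets its own.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA: each instruction is (opcode, immediate).
PROGRAM = [("add", 3), ("mul", 2), ("add", 1)]


@dataclass
class Warp:
    lanes: int
    pc: int = 0          # one program counter for the whole warp
    fetches: int = 0     # how many times fetch/decode ran
    regs: list = field(default_factory=list)

    def __post_init__(self):
        # Each lane starts with its lane id as its data.
        self.regs = list(range(self.lanes))

    def step(self, program):
        # Fetch/decode happens once per warp, not once per lane.
        op, imm = program[self.pc]
        self.fetches += 1
        # Only the execute stage is replicated across lanes.
        for lane in range(self.lanes):
            if op == "add":
                self.regs[lane] += imm
            elif op == "mul":
                self.regs[lane] *= imm
        self.pc += 1


def run_shared(lanes, program):
    """All lanes share one PC and one fetch/decode per instruction."""
    warp = Warp(lanes)
    while warp.pc < len(program):
        warp.step(program)
    return warp.regs, warp.fetches


def run_independent(lanes, program):
    """Every thread gets a private PC/decode (modelled as a 1-lane warp)."""
    regs, fetches = [], 0
    for lane in range(lanes):
        solo = Warp(1)
        solo.regs = [lane]
        while solo.pc < len(program):
            solo.step(program)
        regs.append(solo.regs[0])
        fetches += solo.fetches
    return regs, fetches
```

With 4 lanes and this 3-instruction program, both versions produce the same register values, but the shared version does 3 fetch/decodes and the independent one does 12. If the threads never diverge, the extra front-end work buys nothing.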