by reroute22
951 days ago
Sorta, yeah! Also, your "128 CUDA cores" of the Skylake variety run at higher frequencies and work off much bigger caches, so they are faster (in a serial manner)... until they are slower, because the GPU's latency-hiding mechanism (with occupancy) hides load latencies very well, while the CPU just stalls the pipeline on every cache miss for ungodly amounts of time... until they are faster again, when the shader program uses a lot of registers, GPU occupancy drops to the floor, and latency hiding stops hiding that well. But core counts - yes, more or less.
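To put rough numbers on the register/occupancy point, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the per-SM limits (`REGS_PER_SM`, `MAX_WARPS_PER_SM`), the 400-cycle load latency, and the 20 cycles of independent work per warp are made-up round figures, and the function names are hypothetical. Real GPUs also allocate registers in coarser chunks and have other occupancy limits (shared memory, block size), so treat this as a toy model, not how any particular chip behaves.

```python
# Toy model: how per-thread register use limits resident warps on one SM,
# and how that affects the ability to cover a long load latency.
# All constants are ASSUMED illustrative values; real limits vary by GPU.
REGS_PER_SM = 65536      # assumed number of 32-bit registers per SM
MAX_WARPS_PER_SM = 64    # assumed cap on resident warps per SM
WARP_SIZE = 32           # threads per warp


def resident_warps(regs_per_thread: int) -> int:
    """Warps that fit on one SM when registers are the only limit."""
    regs_per_warp = regs_per_thread * WARP_SIZE
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)


def occupancy(regs_per_thread: int) -> float:
    """Resident warps as a fraction of the assumed per-SM maximum."""
    return resident_warps(regs_per_thread) / MAX_WARPS_PER_SM


def hidden_fraction(regs_per_thread: int,
                    load_latency: int = 400,
                    work_per_warp: int = 20) -> float:
    """Rough fraction of one load's latency covered by other warps.

    While one warp waits `load_latency` cycles, each of the other resident
    warps is assumed to have `work_per_warp` cycles of useful work.
    """
    other_warps = resident_warps(regs_per_thread) - 1
    return min(1.0, other_warps * work_per_warp / load_latency)


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(regs, resident_warps(regs), occupancy(regs), hidden_fraction(regs))
```

Under these assumed numbers, a light kernel (32 registers per thread) keeps the SM full and the load latency is fully covered, while a register-heavy one (255 per thread) leaves only a handful of warps resident, and most of the latency shows up as stall time.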
|
> ...until they are slower, because GPU's latency hiding mechanism (with occupancy) hides load latencies very well, while CPU just stalls the pipeline on every cache miss for ungodly amounts of time...
Is the GPU latency hiding mechanism equivalent to SMT/Hyperthreading, but with more threads per physical core? Or is there more machinery?
Also, how similar are GPU "streaming multiprocessors"/cores to CPU cores at the microarchitectural level? Are they out-of-order? Do they do register renaming?