by petermcneeley
524 days ago
In actual implementation they behave very much like very wide SIMD on a CPU core.
Each warp corresponds to a separate HW thread, since each warp can execute different instructions. The mapping is close enough that translating GPU code to the CPU is relatively easy and the result is performant.
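
To make the mapping concrete, here is a rough sketch of one warp emulated as a wide SIMD register on the CPU. All names (`Warp`, `Mask`, `warp_exec`, `collatz_step`, `lane_ids`) are made up for illustration, the 32-lane width is an assumption matching NVIDIA's warp size, and a real translator would emit vector instructions rather than a scalar loop over lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 32 lanes per warp, one mask bit per lane.
constexpr std::size_t kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;  // one register, one value per lane
using Mask = std::uint32_t;               // bit i set = lane i is active

// One "instruction" issued to the whole warp: every active lane applies
// the same operation to its own element, inactive lanes are left alone.
template <typename Op>
void warp_exec(Warp& regs, Mask active, Op op) {
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if (active & (Mask{1} << lane)) {
            regs[lane] = op(regs[lane]);
        }
    }
}

// Emulates the per-thread kernel body
//     x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
// The branch diverges across lanes, so both sides are executed,
// each under its own lane mask.
void collatz_step(Warp& x) {
    Mask even = 0;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if (x[lane] % 2 == 0) {
            even |= Mask{1} << lane;
        }
    }
    warp_exec(x, even, [](int v) { return v / 2; });
    warp_exec(x, ~even, [](int v) { return 3 * v + 1; });
}

// Helper: fill each lane with its own index, like threadIdx.x in a kernel.
Warp lane_ids() {
    Warp w{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        w[lane] = static_cast<int>(lane);
    }
    return w;
}
```

Calling `collatz_step` on `lane_ids()` runs both branch sides once for the whole warp. Several warps would then just be several such `Warp` values, each driven by its own CPU thread, which is the warp-to-HW-thread mapping described above.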