by tbenst
2014 days ago
SIMD and, e.g., Nvidia warps are not the same. Idk about Apple's GPU, but for example there is no GPU equivalent of the SQRTPD instruction (square root of packed double-precision values). Also, when there is branch divergence across threads, CPUs still do a much better job than GPUs. I'm curious how unified memory may change the ratio of FLOPs to memory accesses at which it makes sense to shift a job from the CPU (better for a low ratio) to the GPU (better for a high ratio).
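To make the FLOPs-per-memory-access question concrete, here is a rough roofline-style sketch. Everything in it is an assumption for illustration: the function names `est_time` and `faster_device` are hypothetical, and the peak-FLOP, bandwidth, and copy-link numbers are placeholders, not measurements of any real CPU or GPU. It only models compute time, memory time, and an optional host-to-device copy, and it ignores launch overhead and branch divergence.

```python
def est_time(flops, n_bytes, peak_flops, mem_bw, copy_bw=None):
    """Roofline-style time estimate in seconds.

    The kernel is bound by the slower of compute and memory traffic.
    copy_bw=None means no host-to-device copy is needed (unified memory).
    """
    kernel = max(flops / peak_flops, n_bytes / mem_bw)
    copy = n_bytes / copy_bw if copy_bw else 0.0
    return kernel + copy


def faster_device(intensity, n_bytes=1e8, unified=False):
    """Return 'cpu' or 'gpu' for a job doing `intensity` FLOPs per byte touched."""
    flops = intensity * n_bytes
    # Placeholder hardware numbers, chosen only for illustration.
    cpu = est_time(flops, n_bytes, peak_flops=1e11, mem_bw=1e11)
    gpu = est_time(flops, n_bytes, peak_flops=1e12, mem_bw=4e11,
                   copy_bw=None if unified else 1e10)
    return "gpu" if gpu < cpu else "cpu"


if __name__ == "__main__":
    # Compare the discrete-memory and unified-memory cases at a few intensities.
    for intensity in (1, 5, 50):
        print(intensity, faster_device(intensity),
              faster_device(intensity, unified=True))
```

With these made-up numbers, dropping the copy cost moves the crossover to a much lower FLOPs-per-byte ratio. A real comparison would also need launch latency and the divergence penalty mentioned above, which this toy model leaves out.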