Hacker News
by tbenst 2014 days ago
SIMD and, e.g., Nvidia warps are not the same thing. I don't know about Apple's GPU, but for example there is no GPU equivalent of the SQRTPD instruction (square root of packed double-precision values). Also, when there is branch divergence across threads, CPUs still do a much better job than GPUs.
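
To make the divergence point concrete, here is a rough toy model in Python, not a measurement of any real hardware. It assumes a 32-lane warp that runs in lockstep and pays for every branch path that at least one lane takes. The cycle costs (100 and 10) and the function names are made up for illustration.

```python
WARP_SIZE = 32  # lanes per Nvidia warp


def warp_cycles(takes_if, cost_if, cost_else):
    """Cycles for one lockstep warp: each path taken by any lane is run in turn."""
    cycles = 0
    if any(takes_if):
        cycles += cost_if      # some lane needs the 'if' path
    if not all(takes_if):
        cycles += cost_else    # some lane needs the 'else' path
    return cycles


def lane_utilization(takes_if, cost_if, cost_else):
    """Fraction of lane-cycles spent on work the lane actually needed."""
    useful = sum(cost_if if t else cost_else for t in takes_if)
    total = len(takes_if) * warp_cycles(takes_if, cost_if, cost_else)
    return useful / total


# Hypothetical costs: the 'if' path takes 100 cycles, the 'else' path 10.
uniform = [False] * WARP_SIZE                   # every lane takes 'else'
divergent = [True] + [False] * (WARP_SIZE - 1)  # one lane takes 'if'

print(lane_utilization(uniform, 100, 10))
print(lane_utilization(divergent, 100, 10))
```

In this model a single stray lane makes the whole warp wait out both paths, while independent CPU threads would each pay only for the path they take.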

I'm curious how unified memory may change the ratio of FLOPs to memory accesses at which it makes sense to shift a job from the CPU (better at a low ratio) to the GPU (better at a high ratio).
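
A back-of-the-envelope sketch of that break-even, in Python. It assumes a roofline-style bound (time is the larger of compute time and memory time) plus an optional copy over a link for a discrete GPU. Every number below is a made-up placeholder rather than a spec for any real chip, and the model ignores launch overhead and bandwidth sharing.

```python
# Hypothetical peak rates, chosen only to make the arithmetic easy to follow.
CPU_FLOPS, CPU_BW = 100e9, 50e9     # FLOP/s, bytes/s
GPU_FLOPS, GPU_BW = 1000e9, 500e9   # FLOP/s, bytes/s
LINK_BW = 16e9                      # bytes/s over a discrete-GPU link


def run_time(flops, nbytes, peak_flops, mem_bw, link_bw=None):
    """Seconds for one job: optional copy over a link, then a roofline-style bound."""
    copy = nbytes / link_bw if link_bw else 0.0
    return copy + max(flops / peak_flops, nbytes / mem_bw)


def gpu_wins(intensity, link_bw=None, nbytes=1e9):
    """True if the GPU finishes first at this FLOPs-per-byte ratio."""
    flops = intensity * nbytes
    cpu = run_time(flops, nbytes, CPU_FLOPS, CPU_BW)
    gpu = run_time(flops, nbytes, GPU_FLOPS, GPU_BW, link_bw)
    return gpu < cpu


for intensity in (1, 4, 10):
    print(intensity, gpu_wins(intensity, LINK_BW), gpu_wins(intensity))
```

With these placeholder numbers the discrete GPU only pulls ahead at roughly 7 FLOPs per byte, because the copy dominates below that. Dropping the copy term, which is the unified-memory case in this sketch, moves the crossover much lower.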