Hacker News
by incrudible 564 days ago
You actually do not have vastly more cores. To a first approximation, a CPU core is equivalent to a streaming multiprocessor (SM, in NVIDIA parlance) on the GPU. A 14900K has 24 cores (of two kinds) and a similarly big RTX 3060 has 28 SMs. The GPU effectively trades all the deep pipelining and branch prediction for much wider SIMD. That makes it massively slower for any code that involves branching and massively faster for any code that is data parallel.
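To make the branching point concrete, here is a rough toy model in Python of one contributing effect: when lanes in a wide SIMD group disagree on a branch, both sides get executed one after the other with the inactive lanes masked off. The function name `warp_passes` is made up for illustration, and the 32-lane width is the NVIDIA warp size; this is a sketch of the idea, not a performance model of any real GPU.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_passes(lane_takes_branch):
    """Count how many passes one warp needs to get through an if/else.

    Toy model (an assumption, not a hardware spec): if every lane agrees,
    only one side of the branch runs; if lanes disagree, both sides run
    back to back with the non-participating lanes masked off.
    """
    assert len(lane_takes_branch) == WARP_SIZE
    sides_taken = set(lane_takes_branch)  # {True}, {False}, or {True, False}
    return len(sides_taken)


# All lanes take the same side: one pass.
uniform = [True] * WARP_SIZE
# Even lanes take the branch, odd lanes do not: both sides must run.
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]

print(warp_passes(uniform))    # 1
print(warp_passes(divergent))  # 2
```

With nested or data-dependent branches the number of serialized paths can grow further, which is roughly why branchy code maps poorly onto wide SIMD while uniform, data-parallel code maps well.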
1 comment

It's probably more accurate to multiply the SM count by 4. Each SM is split into 4 partitions that can each execute different instructions while sharing one L1 cache.
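For what it's worth, the arithmetic with the numbers quoted in this thread looks like the sketch below. The variable names are just for illustration, and treating one SM partition as "one core" is the rough equivalence being argued here, not an official definition.

```python
# Figures as quoted in the thread.
cpu_cores = 24          # 14900K (two kinds of core)
gpu_sms = 28            # 3060
partitions_per_sm = 4   # each partition can issue its own instructions

# Count each SM partition as one "core-like" unit.
gpu_partitions = gpu_sms * partitions_per_sm
ratio = gpu_partitions / cpu_cores

print(gpu_partitions)    # 112
print(round(ratio, 1))   # 4.7
```

So even with the factor of 4, the GPU comes out at a handful of times more "cores" than the CPU under this counting, not orders of magnitude more.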