Hacker News new | ask | show | jobs
by dragontamer 2164 days ago
> It is true that there are tasks where threading matters, but still require a CPU rather than a GPU. I wonder however if these tasks do need full SSE/AVX etc. Couldn't these extensions be removed of the CPU cores and instead have the necessary work performed by the GPU?

SSE/AVX shares an L1 cache that's damn near instantaneous to access for the CPU core. Total L1 bandwidth is on the scale of TB/s.

PCIe -> GPU takes 1-microsecond to 10-microseconds per access, and operates only at 50GB/s (or 1/20th the speed of L1 bandwidths).

------------

Case in point: Memset is very commonly AVX'd to clear out L1 cache and initialize ~1kb to 32kb of data to 0 as quickly as possible.

There's no way for "memset" to move from CPU to GPU unless you feel like obliterating the entire point of L1, L2, and L3 cache. If you moved a "memset" to GPU, it'd operate only at 15GB/s (the speed of PCIe 3.0 x16 lanes), far, far slower than L1 cache AVX-loads/stores.

SIMD units, like SSE and AVX, are highly "local" and have huge advantages.