CPU SIMD is less about competing with GPUs and more about latency. GPUs will always offer more FLOPS and memory bandwidth per dollar; they are purpose-built FLOPS and memory-bandwidth machines. Case in point: the NVIDIA RTX 2070 Super delivers about 8 TFLOPS for $400, a tiny fraction of what POWER10 will cost. If POWER10 is priced anything like POWER9, we're looking at well over $2000 for the bigger chips and $1000 for a reasonable multi-socket motherboard. And holy moly: a 602 mm² die at 7 nm is going to be EXPENSIVE.

EDIT: I'm only getting ~2 TFLOPS for the hypothetical 60-core SMT4 POWER10 at 4 GHz. That's nowhere close to GPU-level FLOPS.

However, the CPU-GPU link is slow compared with the CPU's L1 cache (or even DDR4/DDR5 main memory). A CPU can "win the race" by using its on-chip SIMD units and finishing the task in less time than the ~5 microseconds it would take just to communicate with the GPU.

----------

That being said: POWER10 also implements PCIe 5.0, which means it will be one of the fastest processors for communicating with future GPUs.
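Rough sketch of the arithmetic behind the ~2 TFLOPS estimate and the latency argument, using only the numbers in the comment (60 cores, 4 GHz, ~2 TFLOPS, ~5 µs). Note the assumption: the FLOPs-per-cycle-per-core figure is back-derived from the ~2 TFLOPS estimate, not taken from IBM documentation, and real code will run well below theoretical peak.

```python
# Back-of-the-envelope numbers for the CPU-vs-GPU latency argument.
# ASSUMPTION: FLOPs/cycle/core is derived from the ~2 TFLOPS estimate,
# not from any official POWER10 spec sheet.

def peak_flops(cores: int, clock_hz: float, flops_per_cycle: float) -> float:
    """Theoretical peak = cores x clock x FLOPs retired per cycle per core."""
    return cores * clock_hz * flops_per_cycle


def flops_per_cycle_needed(target_flops: float, cores: int, clock_hz: float) -> float:
    """Per-core FLOPs/cycle required to hit a given chip-wide peak."""
    return target_flops / (cores * clock_hz)


def cycles_in(seconds: float, clock_hz: float) -> float:
    """How many clock cycles elapse in a given wall-clock interval."""
    return seconds * clock_hz


CORES = 60
CLOCK_HZ = 4e9
GPU_ROUND_TRIP_S = 5e-6  # the ~5 us CPU<->GPU communication cost from the comment

if __name__ == "__main__":
    fpc = flops_per_cycle_needed(2e12, CORES, CLOCK_HZ)
    print(f"FLOPs/cycle/core for ~2 TFLOPS: {fpc:.2f}")

    cycles = cycles_in(GPU_ROUND_TRIP_S, CLOCK_HZ)
    print(f"cycles per core in 5 us: {cycles:,.0f}")

    # Upper bound on chip-wide work the CPU could finish (at peak)
    # in the time it takes just to talk to the GPU.
    work = peak_flops(CORES, CLOCK_HZ, fpc) * GPU_ROUND_TRIP_S
    print(f"chip-wide FLOPs in 5 us: {work:.2e}")
```

So each core gets about 20,000 cycles in those 5 µs, and the whole chip could, at peak, chew through on the order of 10 million FLOPs before a GPU would even have received the data. Small kernels below that size are where on-chip SIMD wins; memory bandwidth and real-world efficiency will shrink that window further.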
A64FX (in Fugaku, the current #1 machine on all popular supercomputing benchmarks) has shown that CPUs can compete with top-shelf GPUs on memory bandwidth and floating-point energy efficiency.