Why are we not also talking about memory bandwidth? Personal opinion: this is the key. The latest Phi had about 100 GB/s in 2017. The contemporary Nvidia GTX 1080: 320 GB/s.

When CPUs actually come with bandwidth and a decent vector unit, such as the A64FX, lo and behold, they lead the Top500 supercomputer list, also beating out GPUs of the day.

Why have we not been getting bandwidth in CPUs? Is it because SPECint benchmarks do not use much? Or because there is too much branch-heavy code, so we think hundreds of cores are helpful?

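Existing machines are ridiculously imbalanced: the ratio of compute to memory bandwidth is hundreds of times higher than the roughly 1:1 still seen in the 90s. Hence matmul as a way of using (or wasting) the extra compute, since it is one of the few kernels that does enough arithmetic per byte moved.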
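To make the imbalance concrete, here is a rough roofline-style sketch. The peak and bandwidth figures are made-up round numbers for a hypothetical CPU, not measurements of any real chip, and the function names are mine. It assumes 8-byte doubles and ideal data traffic (every operand touched exactly once).

```python
def attainable_flops(peak_flops, bandwidth_bytes, intensity):
    """Roofline bound: the lower of the compute limit and the bandwidth limit."""
    return min(peak_flops, bandwidth_bytes * intensity)


def triad_intensity(bytes_per_elem=8):
    # STREAM-style triad a[i] = b[i] + s * c[i]:
    # 2 flops per element, 2 loads + 1 store per element.
    return 2 / (3 * bytes_per_elem)


def matmul_intensity(n, bytes_per_elem=8):
    # n x n matmul: 2*n^3 flops. Ideal traffic reads A and B once
    # and writes C once, i.e. 3*n^2 elements (assumes perfect caching).
    return (2 * n**3) / (3 * n**2 * bytes_per_elem)


# Hypothetical machine, illustrative numbers only.
PEAK = 3.0e12  # flop/s
BW = 100e9     # bytes/s

if __name__ == "__main__":
    print(f"machine balance: {PEAK / BW:.0f} flops per byte")
    kernels = [
        ("triad", triad_intensity()),
        ("matmul n=4096", matmul_intensity(4096)),
    ]
    for name, ai in kernels:
        frac = attainable_flops(PEAK, BW, ai) / PEAK
        print(f"{name}: {ai:.3f} flop/byte, at most {frac:.2%} of peak")
```

Under these assumptions a streaming kernel like triad is capped well below 1% of peak, while a large matmul can in principle reach it. That is the whole reason matmul ends up as the default showcase workload.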

The AMD MI300A looks like a very interesting development: >5 TB/s shared by 24 CPU cores plus the GPU compute units.