Hacker News new | ask | show | jobs
by semi-extrinsic 3707 days ago
This. Anything with global interactions (i.e. low flops per byte transferred from memory to core) is poorly suited for GPUs.

There is a hierarchy of HPC-type workloads called "Colella's seven dwarves" that ranks different workloads in terms of being CPU bound or memory bandwidth bound. See also the "roofline model". Both of these heuristics are made to reason about CPUs, but are also effective for thinking about GPUs.