Hacker News
by jcranmer 3127 days ago
Once you get above a few cores, the only feasible way to keep them all busy is data parallelism. And once you need to exploit data parallelism, the programming model changes, and data bandwidth (both the interconnect between CPUs and the path to/from memory) becomes the limiting factor rather than compute throughput.
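
To put rough numbers on the bandwidth point, here is a minimal roofline-style sketch. The machine figures (`PEAK_GFLOPS`, `MEM_BW_GBS`) and the function name are made up for illustration and don't describe any real chip; the bound itself is just the smaller of peak compute and bandwidth times flops-per-byte.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline bound: the lower of peak compute and bandwidth * intensity."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)

# Hypothetical machine, numbers chosen only for illustration.
PEAK_GFLOPS = 1000.0   # aggregate compute across all cores
MEM_BW_GBS = 100.0     # memory bandwidth in GB/s

# A streaming kernel such as a[i] = b[i] + s * c[i] on doubles does
# 2 flops per element while moving 24 bytes (two 8-byte loads, one store).
triad_intensity = 2 / 24

triad_bound = attainable_gflops(PEAK_GFLOPS, MEM_BW_GBS, triad_intensity)
print(f"streaming kernel bound: {triad_bound:.1f} GFLOP/s of {PEAK_GFLOPS:.0f} peak")
```

Under these assumed numbers the streaming kernel tops out at under 1% of peak, and adding more cores doesn't move that ceiling at all; only more bandwidth or more flops per byte does.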

The only way to fit a large number of cores on a die is to slim down the cores by removing space-hungry features (e.g., out-of-order execution). This makes the cores weak on single-threaded code, which means you're firmly in the HPC market and not suitable for personal use. It also pretty much requires developing different programming models, at which point the value-add compared to, say, a GPU seems hard to find. Note that one of the key points of the GPU model is that it oversubscribes the processors with work and swaps threads in and out while they wait for memory accesses to complete, something like 16-way SMT.
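
A back-of-the-envelope sketch of why that oversubscription helps, assuming a toy model where every thread alternates a fixed burst of compute with a fixed memory stall. The function name and the cycle counts are hypothetical, picked for illustration rather than measured on any hardware.

```python
import math

def threads_to_hide_latency(mem_latency_cycles, compute_cycles_per_burst):
    """Threads per core needed so the core never idles in the toy model.

    While one thread is stalled for mem_latency_cycles, the other threads
    must together supply at least that many cycles of compute.
    """
    others = math.ceil(mem_latency_cycles / compute_cycles_per_burst)
    return 1 + others

# Hypothetical figures: 300-cycle memory stall, 20 cycles of work per burst.
needed = threads_to_hide_latency(300, 20)
print(f"threads per core to stay busy: {needed}")

# Fraction of time a core does useful work with only n resident threads.
def utilization(n_threads, mem_latency_cycles, compute_cycles_per_burst):
    period = mem_latency_cycles + compute_cycles_per_burst
    return min(1.0, n_threads * compute_cycles_per_burst / period)
```

With those assumed numbers a single thread keeps the core busy only about 6% of the time, and it takes 16 resident threads to cover the stall completely.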

I'll note that Intel did build something like this (the Xeon Phi), but they appear to be dropping it.