by ajross 2992 days ago
Extra threads mean that there's more likely to be something else available to schedule when one thread's pipelines are stalled due to a fetch from DRAM or MMIO, basically. There are sharply diminishing returns beyond two threads, but for some workloads (big, mostly RAM-resident data sets with poor locality of reference, such as consumer databases) it's worth it. It's unclear to me that 4-way multithreading is going to help any of the benchmarks in this test.
3 comments

Diminishing returns always depend on your workload. We would normally say the same about caches: there is little point in having enormous caches on desktop and typical x86 server processors, yet IBM's mainframe CPUs have tons of L3 and even more L4 cache, as well as dedicated cores (in the form of secondary processors) for offloading all kinds of tasks from the main CPUs, each with its own cache architecture.
In the specific case of POWER8 and POWER9, the cores are seriously overprovisioned with execution resources, and you really need at least 2 threads running in order to make full use of them.
Could be. But the point was more that there's a very thin regime between "waiting for DRAM latency too often" (where more threads can help) and "bound by DRAM bandwidth" (where they won't). The DRAM isn't nearly as parallel as the cores are and saturates really fast.
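To put rough numbers on that thin regime, here is a toy model, not a measurement. It assumes each thread keeps one cache-line miss in flight at a time, and every figure in it (100 ns latency, 20 ns of compute per miss, a 2 GB/s bandwidth ceiling) is a made-up value chosen only so the crossover is easy to see.

```python
import math

def per_thread_demand(line_bytes=64, latency_ns=100.0, compute_ns=20.0):
    """Bandwidth one thread can pull with a single miss in flight.

    Bytes per nanosecond is numerically the same as GB/s.
    """
    return line_bytes / (latency_ns + compute_ns)

def achieved_bandwidth(threads, peak_gb_s=2.0, **kwargs):
    """Aggregate bandwidth: scales with thread count until DRAM saturates."""
    return min(threads * per_thread_demand(**kwargs), peak_gb_s)

def threads_to_saturate(peak_gb_s=2.0, **kwargs):
    """Smallest thread count at which the hypothetical DRAM limit is hit."""
    return math.ceil(peak_gb_s / per_thread_demand(**kwargs))

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "threads:", round(achieved_bandwidth(n), 3), "GB/s")
    print("saturates at", threads_to_saturate(), "threads")
```

With these assumed numbers the window where extra threads help closes after a handful of threads; past that point more threads just queue on the same saturated memory system.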
Knights Corner was another case where you normally needed multiple threads.
In that case it was even more serious. Like the PPUs of the Cell processors, each core runs one instruction every other cycle. If you have only one thread, you effectively have half the throughput.
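The "every other cycle" rule can be sketched as a tiny simulation. This is an illustrative model only: the round-robin picker, the `min_gap` parameter, and the function name are assumptions for the example, not a description of the real issue logic.

```python
def simulate_ipc(threads, cycles=1000, min_gap=2):
    """Instructions per cycle for a core that issues at most one
    instruction per cycle, where a given thread may issue only once
    every `min_gap` cycles (hypothetical round-robin selection)."""
    last_issue = [-min_gap] * threads  # every thread is ready at cycle 0
    issued = 0
    next_thread = 0
    for cycle in range(cycles):
        for offset in range(threads):
            t = (next_thread + offset) % threads
            if cycle - last_issue[t] >= min_gap:
                last_issue[t] = cycle
                issued += 1
                next_thread = (t + 1) % threads
                break  # only one issue slot per cycle
    return issued / cycles

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "thread(s):", simulate_ipc(n), "IPC")
```

In this model a single thread tops out at half the core's issue rate, a second thread fills the idle cycles, and threads beyond two add nothing to peak issue rate (they would only help by covering stalls, which the sketch does not model).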
Some of the models even have 8-way SMT!