by asfgionio 2992 days ago
POWER doesn't have that many more threads. The top-of-the-line is 22-core 88-thread, compared to 32-core 64-thread from AMD and 28-core 56-thread from Intel.

I have no idea what having that many threads per core means for performance.

Extra threads mean there's more likely to be something else available to schedule when one thread's pipeline is stalled on a fetch from DRAM or MMIO, basically. There are very much diminishing returns beyond two, but for some workloads (big, mostly RAM-resident data sets with poor locality of reference -- consumer databases, say) it's worth it. It's unclear to me that 4-way multithreading is going to help any of the benchmarks in this test.
All diminishing returns really depend on your workload. We normally would say that for caches, with no point having enormous caches on desktop and typical x86 server processors, but IBM's mainframe CPUs have tons of L3 and even more tons of L4 caches, as well as dedicated cores (in the form of secondary processors) for offloading all kinds of tasks from the main CPUs, each with its own cache architecture.
In the specific case of POWER8 and POWER9, the cores are seriously overprovisioned with execution resources, and you really need at least 2 threads running to make full use of them.
Could be. But the point was more that there's a very thin regime between "waiting for DRAM latency too often" (where more threads can help) and "bound by DRAM bandwidth" (where they won't). The DRAM isn't nearly as parallel as the cores are and saturates really fast.
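To put rough shape on the two regimes, here is a toy model in Python. It is a sketch, not a measurement: the function names, the perfect-overlap assumption, and every number in the usage lines are made up for illustration and do not describe any real POWER, AMD, or Intel part.

```python
def latency_bound_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles the core does useful work if only latency matters.

    Assumes each thread alternates compute_cycles of work with
    stall_cycles waiting on DRAM, and that stalls of different
    threads overlap perfectly (an optimistic simplification).
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


def achieved_utilization(threads, compute_cycles, stall_cycles,
                         bytes_per_miss, bandwidth_bytes_per_cycle):
    """Latency-bound utilization, capped by this core's share of DRAM bandwidth."""
    latency_bound = latency_bound_utilization(threads, compute_cycles, stall_cycles)
    # Every compute_cycles of work costs one miss of bytes_per_miss bytes,
    # so bandwidth alone permits at most this fraction of busy cycles.
    bandwidth_bound = bandwidth_bytes_per_cycle * compute_cycles / bytes_per_miss
    return min(latency_bound, bandwidth_bound)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per miss, 180-cycle miss,
    # 64-byte lines, 0.8 bytes/cycle of DRAM bandwidth per core.
    for n in (1, 2, 4, 8):
        print(n, achieved_utilization(n, 20, 180, 64, 0.8))
```

With those invented numbers, going from 1 to 2 threads doubles utilization, but 4 and 8 threads give the same answer because the bandwidth cap is already hit. That is the thin regime: extra threads only pay off in the window between "stalled on latency" and "out of bandwidth".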
Knights Corner was another case where you normally needed multiple threads.
In that case it was even more serious. Like the PPUs of the Cell processors, each core runs one instruction every other cycle. If you have only one thread, you effectively have half the throughput.
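The arithmetic behind "half the throughput" is simple enough to write down. This is only a sketch of the claim as stated above; the function name and the `issue_interval` parameter are mine, and real cores have more constraints than this.

```python
def issue_slot_utilization(threads, issue_interval=2):
    """Fraction of the core's issue cycles that can be filled.

    Assumes each thread may issue at most once every issue_interval
    cycles and always has an instruction ready when its turn comes.
    """
    if threads < 0 or issue_interval < 1:
        raise ValueError("threads must be >= 0 and issue_interval >= 1")
    return min(1.0, threads / issue_interval)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, issue_slot_utilization(n))
```

Under that assumption one thread fills half the issue cycles and two threads fill all of them; any further threads only help by covering stalls.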
Some of the models even have 8-way SMT!
Offhand, I would guess that it's good for I/O-bound tasks, where you can have lots of threads waiting for input that don't need CPU time. A busy database maybe.
You don't need CPU threads for threads that don't need CPU...
The 8-core version only has 8 (each) ALUs, LSUs, and vector units. https://en.wikipedia.org/wiki/POWER9#Core If each core has 4 threads "running" on it, some of them are not going to be executing.
Still, the kernel won't schedule threads that are waiting on I/O, I believe.
Ok that's an interesting point. So they'd have to be waiting on a fetch instruction for the time not to be totally wasted?
Blocked tasks are not scheduled until unblocked.
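You can see this from user space with a quick Python sketch. It blocks threads on a `threading.Event` as a stand-in for waiting on I/O (an assumption for convenience; a blocking `read` would behave similarly), then compares process CPU time with wall-clock time. The function name and defaults are hypothetical.

```python
import threading
import time


def cpu_used_while_blocked(n_threads=8, wait_seconds=0.5):
    """Return (cpu_seconds, wall_seconds) spent while n_threads sit blocked."""
    release = threading.Event()
    # Each worker does nothing but block until the event is set.
    workers = [threading.Thread(target=release.wait) for _ in range(n_threads)]

    cpu_start = time.process_time()   # CPU time of the whole process
    wall_start = time.perf_counter()  # elapsed real time
    for w in workers:
        w.start()
    time.sleep(wait_seconds)          # main thread is blocked too
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start

    release.set()                     # wake the workers so they can exit
    for w in workers:
        w.join()
    return cpu, wall


if __name__ == "__main__":
    cpu, wall = cpu_used_while_blocked()
    print(f"cpu={cpu:.4f}s wall={wall:.4f}s")
```

The CPU figure should come out as a small fraction of the wall-clock figure: the blocked threads are off the run queue, so they never occupy a hardware thread in the first place. SMT only helps with threads that are runnable but stalled inside the core.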