by gnufx 1673 days ago
Er, no. I do that stuff (well, I'm not clever enough for C++ generally, and it would be OpenMP rather than plain pthreads) on the sort of nodes that Sierra uses. However, they mostly use the GPUs, for which POWER9 has particular support. And I can tell you there isn't currently any GEMV or FFT running on this system, and they don't run "all the time" even on our HPC nodes.

While it isn't necessarily clear what peak performance means, MKL or OpenBLAS, for instance, only reaches ~90% of serial peak on large DGEMM; ESSL is similar. I haven't measured GEMV (ultimately memory-bound), but I got ~75% of hand-optimized DGEMM performance on Haswell with pure C, and I'd expect similar on POWER if I measured. Those claimed orders of magnitude are themselves orders of magnitude off, even for, say, reference BLAS. I don't know why I need Python, but the software clearly exists -- all those things and more (like vectorized libm). You can even compile assorted x86 intrinsics on POWER; I don't know how well they perform relative to the equivalent x86 hardware, but I think you're typically better off with an optimizing compiler anyway.
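For anyone who wants to check a "% of peak" figure themselves, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the timing and clock numbers in the example are hypothetical, not measurements. The 16 flops/cycle figure is the usual double-precision count for one Haswell core (two FMA units, 4 doubles each, 2 flops per FMA). The clock you plug in is exactly where "what peak means" gets fuzzy, since it depends on what frequency the core actually sustains.

```python
def dgemm_flops(m, n, k):
    """Flop count for C = A*B, A being m-by-k and B k-by-n:
    one multiply and one add per inner-product step."""
    return 2.0 * m * n * k


def peak_gflops(ghz, flops_per_cycle, cores=1):
    """Theoretical peak in GFLOP/s for the given clock and core count."""
    return ghz * flops_per_cycle * cores


def fraction_of_peak(m, n, k, seconds, ghz, flops_per_cycle, cores=1):
    """Achieved GFLOP/s divided by theoretical peak GFLOP/s."""
    achieved = dgemm_flops(m, n, k) / seconds / 1e9
    return achieved / peak_gflops(ghz, flops_per_cycle, cores)


# Hypothetical numbers: a 4000^3 DGEMM taking 3.0 s on one core
# assumed to run at 3.0 GHz with 16 DP flops/cycle.
frac = fraction_of_peak(4000, 4000, 4000, seconds=3.0,
                        ghz=3.0, flops_per_cycle=16)
print(f"{frac:.0%} of serial peak")  # prints "89% of serial peak"
```

Swap in your own timing, clock and flops/cycle for the machine in question. For POWER9 you'd need the right per-core figure, which I haven't filled in here.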

I've packaged a lot of HPC/research software, which is almost all available for ppc64le; the only things missing are dmtcp, proot, and libxsmm (if libsmm isn't good enough).

1 comments

You start with BLAS being a factor of 2 off, then go to PETSc, which is another couple of factors off, and then to the actual app the scientist wrote, which may use all of the above and the kitchen sink. Every one of the pieces it uses is a couple of factors off, and so your scientist's app ends up at 0.01% of peak.
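To put rough numbers on how that compounding works, here is a small sketch. It assumes the slowdowns simply multiply, which is a simplification, and the function names and factor values are only illustrative.

```python
import math


def fraction_after(factors):
    """Fraction of peak left after applying each slowdown factor in turn."""
    frac = 1.0
    for f in factors:
        frac /= f
    return frac


def layers_needed(target_fraction, factor_per_layer):
    """How many layers, each off by the same factor, it takes to fall
    to target_fraction of peak."""
    return math.log(1.0 / target_fraction) / math.log(factor_per_layer)


# Illustrative only: four layers that are each a factor of 2 off.
print(fraction_after([2, 2, 2, 2]))   # 0.0625, i.e. 6.25% of peak

# Layers of factor-2 slowdown needed to reach 0.01% of peak (about 13).
print(round(layers_needed(1e-4, 2), 1))
```

So a handful of factor-2 losses gets you to single-digit percent; getting all the way down to 0.01% takes either many layers or some pieces that are far more than a factor of 2 off.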

If you have used Sierra since the beginning, you will have seen significant performance increases over the years, because the people using it have been discovering the problems and then either getting IBM to fix them or fixing most of the software themselves.

Compared with Power 10, I'd say that Power 9 is "mainstream" (many clusters available), and of the Power 9 CPUs in existence, IBM's are the most mainstream of them all.

Take the Power 10 ISA, build your own CPU that significantly differs from IBM's, and good luck with optimizing all the software above. It can be done, and dumping it on a couple of HPC sites where scientists and staff then won't have any choice but to use it for 4-6 years is a good way to get that done.

But for a private company that just wants to deliver value, ARM is just a much better deal, because it saves them from having to do any of this.