This is actually what I do, and it works fairly well. Currently I run one MPI process per socket, but mostly just because the OpenMP code I’m calling is a library, and it doesn’t seem to scale well past one modern Xeon’s worth of cores.
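
For anyone who wants to picture the layout, here is a rough sketch of "one MPI rank per socket, OpenMP threads filling that socket". The function name `plan_layout` and the node shape (2 sockets, 16 cores each) are made up for illustration, and it assumes cores are numbered contiguously within a socket, which is not true on every machine (check with `lscpu` or hwloc). In practice the launcher does the pinning for you, e.g. something like Open MPI's `--map-by ppr:1:socket --bind-to socket`, so treat this as a picture of the result rather than how the binding is actually done.

```python
def plan_layout(num_domains, cores_per_domain):
    """Return one entry per MPI rank with the cores its OpenMP threads would use.

    A "domain" is whatever unit gets one MPI rank: a socket here, but the
    same arithmetic works for a smaller unit. Assumes (hypothetically)
    that core IDs are contiguous within each domain.
    """
    layout = []
    for rank in range(num_domains):
        first = rank * cores_per_domain
        layout.append({
            "rank": rank,
            "cores": list(range(first, first + cores_per_domain)),
            # One OpenMP thread per core in the rank's domain.
            "omp_num_threads": cores_per_domain,
        })
    return layout


# Hypothetical dual-socket node with 16 cores per socket.
for entry in plan_layout(num_domains=2, cores_per_domain=16):
    print(f"rank {entry['rank']}: cores {entry['cores'][0]}-{entry['cores'][-1]}, "
          f"OMP_NUM_THREADS={entry['omp_num_threads']}")
```

The point of splitting this way is that each OpenMP team stays inside one socket, and only MPI traffic crosses between sockets.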

I don’t know what I’d do if I had an old Zen machine; maybe map one MPI process to each chiplet.

My impression is that in the first-generation Zen machines, the cost of communicating from one chiplet to another was really quite significant, but they’ve made enough progress there that it is now something only the really hardcore folks care about.