by heisig
1073 days ago
Thanks for clarifying. I will definitely write down the specifics of setting up distributed computing once it works. However, support for distributed computing will still take some time. The current step is to iron out the remaining issues with parallelizing within a single CPU socket.
To the best of my knowledge, shared-memory approaches have mostly been abandoned in the HPC community. It seems that none of the codes that went hybrid MPI+OpenMP, for example, ever saw a substantial performance benefit over pure MPI, or at least not enough to justify the increased code complexity. If you search for "hybrid MPI/OpenMP" on Google Scholar, you'll see that most results are 10-20 years old.
Part of the reason is that, given the amount of cache on modern CPU cores, you typically want to keep at least something like 200 000 degrees of freedom per core. That's, e.g., a 36^3 grid with u, v, w, p at each point if you're doing fluid mechanics. The amount to communicate per core is then just 8% of the total data. Furthermore, you can easily do other work, such as computing auxiliary variables, while you are waiting on communication.
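A quick back-of-the-envelope check of these numbers, as a minimal Python sketch. It assumes a cubic block of n^3 grid points per core, four unknowns (u, v, w, p) per point, and a halo one grid point wide; the function names are made up for illustration. Four fields on a 36^3 grid give 186,624 unknowns, close to the 200 000 mentioned. The halo-to-interior ratio scales like surface over volume. Counting all six faces gives 6/n (about 17%), while counting each face shared by two neighboring cores only once gives 3/n (about 8%). The latter is my guess at how the 8% figure was obtained, not something stated above.

```python
# Back-of-the-envelope check of per-core data and halo size.
# Assumptions (illustrative only): cubic n^3 block per core,
# 4 unknowns (u, v, w, p) per grid point, halo one point wide.


def dofs_per_core(n: int, fields: int = 4) -> int:
    """Degrees of freedom stored on one core for an n^3 block."""
    return fields * n ** 3


def halo_fraction(n: int, faces: int) -> float:
    """Ratio of halo points (one layer on `faces` faces) to interior points.

    The number of fields cancels, so it is not a parameter here.
    """
    return faces * n ** 2 / n ** 3


n = 36
print(dofs_per_core(n))               # 186624, i.e. roughly 200 000
print(round(halo_fraction(n, 6), 3))  # all six faces: 0.167
print(round(halo_fraction(n, 3), 3))  # shared faces counted once: 0.083
```

Because the fraction goes like 1/n, halving the per-core block size doubles the relative communication volume, which is why you don't want to shrink the work per core too much.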
I will also say that it feels a bit weird to call something "peta-" and "HPC" if using more than one socket is still relatively far in the future. For the randomly wandering PhD students out there, it would be nice to tell them this up front in the README :)