by cabacon
1471 days ago
I worked at a supercomputing facility for a few years. The codes are typically decades old and have been maintained by hundreds of people over the years. By and large, the maintainers understand their performance profiles and are working to squeeze as much out of the code as they can. In addition, the performance engineers tend to be employed by the facilities, not by the computational science teams. They're the ones who do the legwork of profiling the existing code on the new platform and figuring out how to get machine-specific performance out of it.

A lot of these codes are time-marching PDE solvers that do a bunch of matrix math to advance the simulation, so the kernel accounts for the vast majority of the time spent during a job. That means it's not necessarily a huge chunk of code that needs tuning to wring better performance out of the machine.

The parallel communication is also written against an API, not an ABI: the supercomputing vendors put optimizations into the build of the MPI library for their machine, to take advantage of network-specific optimizations for various communication patterns. If you express your code with the most specific function (calling the collective all-to-all explicitly, say, rather than building your own all-to-all out of point-to-point primitives), the MPI build can substitute optimized code for those cases.

There's some misalignment because the facility's machine will be in the TOP500 for a few years, while the code lives on and on and on. If your supercomputer architecture is really out of left field (https://en.wikipedia.org/wiki/Roadrunner_(supercomputer)), it's not going to be worth it for people to try to run on it without porting support from the facility.
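To make the all-to-all point concrete, here is a rough sketch in plain Python. It is a toy model, not real MPI: the function names `alltoall_from_p2p` and `alltoall_collective` are made up for illustration, and the "network" is just a set of in-memory queues. In actual MPI the two styles would correspond roughly to a loop of `MPI_Send`/`MPI_Recv` calls versus a single `MPI_Alltoall` call. Both produce the same data layout, but only the single collective call tells the library what the overall pattern is, which is what gives a vendor build room to swap in something tuned for its network.

```python
from collections import deque


def alltoall_from_p2p(send_bufs):
    """Hand-rolled all-to-all built from modeled point-to-point messages.

    send_bufs[src][dst] is the item that rank src wants to deliver to rank dst.
    Returns (recv_bufs, message_count), where recv_bufs[dst][src] is the item
    rank dst received from rank src.
    """
    n = len(send_bufs)
    # mailboxes[dst][src] models the point-to-point channel from src to dst.
    mailboxes = [[deque() for _ in range(n)] for _ in range(n)]
    message_count = 0

    # Every rank "sends" one message to every other rank.
    for src in range(n):
        for dst in range(n):
            if src != dst:
                mailboxes[dst][src].append(send_bufs[src][dst])
                message_count += 1

    # Every rank "receives" from every other rank, in source order.
    recv_bufs = []
    for dst in range(n):
        row = []
        for src in range(n):
            if src == dst:
                row.append(send_bufs[src][dst])  # local copy, no message
            else:
                row.append(mailboxes[dst][src].popleft())
        recv_bufs.append(row)
    return recv_bufs, message_count


def alltoall_collective(send_bufs):
    """Stand-in for one collective call: the result is a transpose.

    A real MPI library is free to pick any algorithm behind this interface;
    this toy version only shows that the caller states the whole pattern once.
    """
    return [list(column) for column in zip(*send_bufs)]


# Four modeled ranks; each item is labeled with its source and destination.
send_bufs = [[f"{src}->{dst}" for dst in range(4)] for src in range(4)]
p2p_result, message_count = alltoall_from_p2p(send_bufs)
collective_result = alltoall_collective(send_bufs)
```

The hand-rolled version fixes the message schedule in application code, so the library only ever sees individual sends and receives. The collective version leaves the schedule up to whatever sits behind the call.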