by zandl 2791 days ago
So your process is limited to the resources of a node, right? And is data coordinated between jobs via a shared file system or network messaging between nodes?

Depending on the step (linear solves and tensor contractions, for example), an LQCD calculation might run on 4, 8, 16, 32, or even more nodes. It's coordinated with OpenMPI and MPI (or equivalent, see my other comment on the software stack). The results from solves are typically valuable and are stored to disk for later reuse. Storing them may prove impractical at this scale.
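To make the "store to disk for later reuse" idea concrete, here's a minimal sketch in Python. It is not how any particular LQCD code does it: the names `cache_path` and `cached_solve`, the hashing scheme, and the pickle file format are all assumptions made up for illustration, and `solve` stands in for whatever expensive step you want to avoid repeating.

```python
import hashlib
import json
import pickle
from pathlib import Path


def cache_path(cache_dir, params):
    """Map solve parameters to a file name via a stable hash (hypothetical scheme)."""
    # sort_keys makes the key independent of dict insertion order.
    key = json.dumps(params, sort_keys=True).encode("utf-8")
    return Path(cache_dir) / (hashlib.sha256(key).hexdigest() + ".pkl")


def cached_solve(cache_dir, params, solve):
    """Return (result, reused); call solve(params) only if nothing is stored yet."""
    path = cache_path(cache_dir, params)
    if path.exists():
        # A previous run already paid for this solve, so load it from disk.
        with path.open("rb") as f:
            return pickle.load(f), True
    result = solve(params)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result, False
```

The trade-off is the one mentioned above: every distinct set of parameters leaves a file behind, so disk usage grows with the number of solves you keep.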

I'm not sure how big the HMC jobs get on these new machines. It depends on the size of the lattice, which is chosen for the physics but also for algorithmic speed and for sitting in a good spot for the machine's efficiency.

Any idea how it compares to Bridges?