|
|
|
|
|
by gnufx
4184 days ago
|
|
A distressing number of runs on our HPC system simply aren't reproducible twice in a row anyway. They get repeated until, or in the hope that, they don't deadlock or segv, not that users typically believe in deadlock. They aren't debugged -- it's blamed on supposed system problems, not the code -- and it doesn't seem to worry the people publishing results from them. I doubt our users are unique. Even for decent code, docker is being over-sold for this sort of thing. Serious large-scale calculations, in particular, simply aren't hardware-independent in practice. Consider a 1024-core PSM MPI job with Haswell-specific code or requiring some GGPU, or a 128-core, 2TB SMP one; you can't run them anywhere. Even if you can package and run in docker at another site, if you don't get the "right" results, what do you do about it if you don't have source? |
|
i don't think it is an oversell, in the sense that it is still unusual to include source code and experimental setups [at least in my field]. a replicable environment with included source code is a large step forward.
sad as that might be.