Hacker News
by gnufx 4184 days ago
A distressing number of runs on our HPC system simply aren't reproducible twice in a row anyway. They get repeated until, or in the hope that, they don't deadlock or segv, not that users typically believe in deadlock. They aren't debugged -- it's blamed on supposed system problems, not the code -- and it doesn't seem to worry the people publishing results from them. I doubt our users are unique.

Even for decent code, docker is being over-sold for this sort of thing. Serious large-scale calculations, in particular, simply aren't hardware-independent in practice. Consider a 1024-core PSM MPI job with Haswell-specific code or requiring some GPGPU, or a 128-core, 2TB SMP one; you can't run them just anywhere. Even if you can package and run in docker at another site, if you don't get the "right" results, what do you do about it if you don't have source?

1 comment

source code should also be included as a matter of course...

i don't think it is an oversell, in the sense that it is still unusual to include source code and experimental setups [at least in my field]. a replicable environment with included source code is a large step forward.

sad as that might be.