by janwas 1511 days ago
One interesting anecdote is that HPC planning for exascale included significant concern about machine failures and (silent) data corruption. When running at large enough scale, even seemingly small failure rates translate into "oh, there goes another one".
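To put rough numbers on the "there goes another one" effect, here is a minimal sketch. It assumes node failures are independent and exponentially distributed, so per-node failure rates simply add up across the machine. The node MTBF (about 5 years) and node counts are hypothetical values chosen for illustration, not figures from any actual exascale plan.

```python
import math

def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails in a time window,
    assuming each node fails independently with probability p_node."""
    # 1 - (1 - p)^N, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n_nodes * math.log1p(-p_node))

def system_mtbf_hours(node_mtbf_hours: float, n_nodes: int) -> float:
    """System MTBF under independent exponential failures: the rates add,
    so the machine-level MTBF shrinks as 1 / n_nodes."""
    return node_mtbf_hours / n_nodes

if __name__ == "__main__":
    # Hypothetical: each node fails roughly once every 5 years.
    node_mtbf = 5 * 365 * 24  # hours
    # Approximate per-node probability of failing within one hour.
    p_hour = 1 / node_mtbf
    for n in (1, 1_000, 100_000):
        print(f"{n:>7} nodes: system MTBF {system_mtbf_hours(node_mtbf, n):10.2f} h, "
              f"P(>=1 failure in 1 h) = {prob_any_failure(p_hour, n):.3f}")
```

With these assumed numbers, a node that almost never fails on its own still leaves a 100,000-node machine with a failure well under every hour on average, which is why checkpointing and corruption detection become first-class design concerns at that scale.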