by tossaway9000 1696 days ago
Are there any means to detect that a cosmic ray caused such a crash without ECC memory? I recall that Dropbox had some issues early on with memory corruption because so many low-end PCs ran Dropbox.

I recall coming across a more detailed write-up a long time ago, but I still found a mention of the issue [0]:

> our clients rarely have ECC memory. We see a constant rate of memory corruption in the wild and end-to-end integrity verification always pays off.

[0] https://dropbox.tech/infrastructure/-broccoli--syncing-faste...

1 comment

To detect cosmic rays? I doubt it. Detecting them would be possible but pointless, since you couldn't reasonably act on the information. Any such solution would probably need Heisenberg compensators, since the tools used to detect the radiation would most likely interfere with the radiation you are trying to detect. It's probably better to just shield all the things, or add integrity checks everywhere.

To detect bitflipping errors? Yes. Use cryptographically secure algorithms and protocols that ensure that messages have integrity checks in transit and in memory.
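
As a rough illustration of that kind of end-to-end check, here is a minimal Python sketch. The helper names (seal, unseal, flip_bit) are made up for this example and are not taken from Dropbox or any library; it only assumes the standard hashlib and hmac modules. It stores a SHA-256 digest next to the data and re-verifies it on read, so a bit flipped after sealing gets noticed. It tells you that corruption happened, not what caused it.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(payload: bytes) -> bytes:
    """Hypothetical helper: prefix the payload with its SHA-256 digest."""
    return hashlib.sha256(payload).digest() + payload


def unseal(blob: bytes) -> bytes:
    """Hypothetical helper: recompute the digest and refuse corrupted data."""
    digest, payload = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    expected = hashlib.sha256(payload).digest()
    if not hmac.compare_digest(expected, digest):
        raise ValueError("integrity check failed: data was corrupted")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit memory error at the given bit position."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    blob = seal(b"some file chunk")
    corrupted = flip_bit(blob, 300)  # flip one bit inside the payload
    try:
        unseal(corrupted)
    except ValueError as err:
        print(err)
```

A plain hash like this only catches accidental corruption. If the data also has to survive tampering in transit, a keyed MAC (for example the hmac module with a shared secret) would be the usual swap.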

To detect crashes? Probably not - if a bit flips in memory and there is no hardware-level error correction to report it, there isn't really a way to tell what caused the error.