by jbeda 3926 days ago
When operating at scale, you will, once in a while, have corruption. Even if you use ECC RAM, once in a while you'll have a double-bit flip. And it doesn't look like Backblaze uses ECC (https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-pr...) despite good evidence that ECC is necessary (PDF: http://static.googleusercontent.com/media/research.google.co...). Even if you do have ECC, once in a while you'll have a bad NIC with hardware offload that silently corrupts the TCP stream.

This is all rare, but it does happen. This is why the GCS team wants to know if you are seeing corruption on file upload, as it might be bad hardware failing in a non-obvious way.
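For what it's worth, the usual way a client notices this kind of silent corruption is an end-to-end check: hash the bytes before sending and compare against the hash the server reports for what it actually stored. Below is a minimal sketch, assuming the service hands back a base64-encoded MD5 of the stored object. The helper names (`md5_b64`, `verify_upload`, `flip_bit`) are hypothetical, and the "server" side is simulated locally by flipping a single bit. MD5 is fine for catching accidental corruption like this, not for adversarial tampering.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_upload(local_data: bytes, server_reported_md5_b64: str) -> bool:
    """Compare a locally computed MD5 with the one the server reports.

    A mismatch means the bytes changed somewhere between the client's
    memory and the server's storage (RAM, NIC, network, or disk).
    """
    return md5_b64(local_data) == server_reported_md5_b64


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit hardware error by flipping one bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    payload = b"example file contents" * 1000
    # Hypothetical: pretend the server stored a copy that suffered one bit flip.
    stored = flip_bit(payload, 12345)
    print(verify_upload(payload, md5_b64(payload)))  # True
    print(verify_upload(payload, md5_b64(stored)))   # False
```

The point of doing it end to end is that a checksum computed by the NIC itself (as with TCP checksum offload) can't catch bytes that were already corrupted before it was computed; only a hash taken by the sender and compared after storage covers the whole path.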


I just spent 10 or so minutes looking, and it appears they do use ECC; per https://news.ycombinator.com/item?id=2786695, they see ECC corrections reported in their log files.