by amtadt 695 days ago
Self-healing is dangerous because it can corrupt good data on disk if RAM or another system component is flaky.

Repro: supposedly only the good copy is copied to RAM, RAM corrupts a bit, the CRC is recalculated using the corrupted bit, and the corrupted copy is written back to disk(s).

> crc is recalculated using corrupted bit

Why would it need to recalculate the CRC? The correct CRC (or other hash) for the data is already stored in the metadata trees; it's how it discovered that the data was corrupted in the first place. If it writes back corrupted data, it will be detected as corrupted again the next time.
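To make that concrete, here is a rough sketch of the argument in Python. It is not how any real filesystem is implemented: `read_with_repair`, `flip_bit`, and the two-replica list are made up for illustration, and CRC32 from `zlib` stands in for whatever checksum the filesystem actually uses. The only point is that the checksum in metadata is never recomputed, so a bit flipped in RAM before the write-back fails verification on the next read.

```python
import zlib
from typing import List, Optional


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated corruption)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def read_with_repair(replicas: List[bytes], stored_crc: int) -> Optional[bytes]:
    """Hypothetical read path: return the first replica matching the stored checksum."""
    for copy in replicas:
        if zlib.crc32(copy) == stored_crc:
            return copy
    return None


good = b"important block contents"
stored_crc = zlib.crc32(good)          # kept in metadata, never recomputed here
replicas = [flip_bit(good, 3), good]   # replica 0 has rotted on disk

in_ram = read_with_repair(replicas, stored_crc)  # picks the good replica

# Flaky RAM flips a bit before the "repair" is written back over replica 0.
replicas[0] = flip_bit(in_ram, 10)

# Next read: replica 0 fails verification again, replica 1 is still served.
detected_again = zlib.crc32(replicas[0]) != stored_crc
still_readable = read_with_repair(replicas, stored_crc) == good
```

In this sketch the bad write-back costs a wasted repair, but the good replica and its stored checksum are untouched.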

Because the CRC is in the on-disk data structure, not in the in-RAM data structure. It is stripped upon reading into RAM and created upon writing to disk.

That's how bcachefs is designed right now.

No, we carefully carry around existing checksums when moving data.
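The difference between the two designs being argued about can be sketched like this. This is a toy model, not bcachefs code: `Extent`, `move_recompute`, `move_carry`, and `verify` are hypothetical names, and `zlib.crc32` is an assumed stand-in for the real checksum. One path recomputes the checksum from whatever is in RAM at write time, the other carries the original checksum along with the data.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extent:
    data: bytes
    csum: int  # checksum stored alongside the data


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated flaky RAM)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def verify(ext: Extent) -> bool:
    """True if the data still matches the checksum stored with it."""
    return zlib.crc32(ext.data) == ext.csum


def move_recompute(src: Extent, ram: Callable[[bytes], bytes]) -> Extent:
    """Strip the checksum on read, recompute it on write."""
    data = ram(src.data)
    return Extent(data, zlib.crc32(data))  # blesses whatever RAM handed back


def move_carry(src: Extent, ram: Callable[[bytes], bytes]) -> Extent:
    """Carry the existing checksum along with the data."""
    data = ram(src.data)
    return Extent(data, src.csum)  # original checksum travels with the data


payload = b"extent payload"
src = Extent(payload, zlib.crc32(payload))


def flaky_ram(d: bytes) -> bytes:
    return flip_bit(d, 5)


recomputed = move_recompute(src, flaky_ram)
carried = move_carry(src, flaky_ram)

silent_corruption = verify(recomputed)  # True: bad data now has a "valid" checksum
caught = not verify(carried)            # True: mismatch shows up on the next check
```

With the recompute path the corruption becomes invisible; with the carried checksum it is still detectable after the move.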

Page cache is a different story, but doesn't apply to what we're talking about here.

That’s why you need ECC RAM.

Our RAM should all be ECC and our OSes should all be on self-healing filesystems.