by huhtenberg
1540 days ago
How did you know it was a change at rest? The only correct way to test for bitrot is to read the data back immediately after it was written and the cache flushed. If it's the same as the original, we know it made it to the disk undamaged. Then re-read it after some time. If it doesn't match, immediately re-read it, ideally using a different physical memory block, and compare again. If it still doesn't match, take the disk to another machine and re-read it there. If it still doesn't match, only then is it actual at-rest bitrot... OR it's a drive firmware bug, because corrupted data must be corrected or it must not be returned at all.
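The procedure described above can be sketched roughly as follows. This is a minimal illustration, not a real bitrot tester: the function names (`write_and_verify`, `classify_reread`) are hypothetical, and a plain `os.fsync` followed by a read may still be served from the OS page cache rather than the platter, so a serious test would also need to bypass or drop caches and repeat the final step on a second machine.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's current contents in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_and_verify(path, data):
    """Write data, flush it, and read it back immediately.

    Returns the expected digest to keep for later re-reads.
    Raises if the immediate read-back differs (a mis-write, not bitrot).
    """
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    # Caveat: this read can still come from the page cache.
    if sha256_of(path) != expected:
        raise IOError("read-back mismatch right after write: mis-write")
    return expected


def classify_reread(path, expected, rereads=2):
    """Re-read later, several times, and classify the outcome."""
    results = [sha256_of(path) == expected for _ in range(rereads)]
    if all(results):
        return "ok"
    if any(results):
        # Some reads matched: suspect memory, cabling, or controller.
        return "transient mismatch"
    # Every read differed: next step is re-reading on another machine.
    return "persistent mismatch"
```

Only a "persistent mismatch" that also reproduces on different hardware would count as at-rest corruption under the definition in the comment above.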
|
Because we had the checks for it in flight. Also, more often than not these same blocks had been checked before, and found to be fine.
> The only correct way to test for bitrot is to read the data back immediately
No, the only correct way is to read it back after some time has passed. Mis-written data is not the same as bitrot.
> must be corrected or it must not be returned
Every error-correction technique has a limit to how many simultaneous errors it can correct. Beyond that, bits can be flipped in a way that seems valid but in fact is not (detectable by cross-checking with other erasure-coded fragments of the same block on other machines). Just because you haven't seen it doesn't mean it doesn't happen. As I said, and as others have said many times, with sufficient scale and time even the most unlikely scenarios become almost inevitable. Why do you persist in telling me I didn't see what I saw with my own eyes? Are you assuming that my thirty years in storage gave me less understanding or insight regarding these issues than whatever experience (if any) you have?
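To illustrate the point about correction limits, here is a toy sketch using a Hamming(7,4) code, which can correct one flipped bit per codeword. It is only a stand-in chosen for brevity (an assumption for illustration, not the ECC any particular drive uses). With two flipped bits the decoder "corrects" a third bit, lands on a different valid codeword, and returns wrong data with no error indication. Only an independent check, such as comparing against a copy or fragment held elsewhere, reveals the problem.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Positions are 1-indexed; parity bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(codeword):
    """XOR of the 1-indexed positions of all set bits; 0 means 'looks valid'."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


def decode(codeword):
    """Apply single-error correction and return (data_bits, fixed_codeword)."""
    cw = list(codeword)
    s = syndrome(cw)
    if s:
        cw[s - 1] ^= 1  # the decoder assumes exactly one bit was flipped
    return [cw[2], cw[4], cw[5], cw[6]], cw


def flip(codeword, positions):
    """Return a copy with the given 1-indexed positions flipped."""
    cw = list(codeword)
    for pos in positions:
        cw[pos - 1] ^= 1
    return cw


original = [1, 0, 1, 1]
stored = encode(original)

# One flipped bit: within the code's limit, so the data is recovered.
one_err, _ = decode(flip(stored, [5]))

# Two flipped bits: beyond the limit. The result passes the code's own
# check (syndrome 0) but the data is wrong.
two_err, fixed = decode(flip(stored, [1, 2]))
looks_valid = syndrome(fixed) == 0

# Only a cross-check against an independent copy catches the difference.
caught_by_cross_check = two_err != original
```

Real drives use far stronger codes, so the miscorrection probability per read is tiny, but it is not zero, which is why it shows up at sufficient scale.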