by gmokki
828 days ago
I have had the same btrfs filesystem in use for 15+ years, across 6 disks of various sizes, and every hardware component has been replaced at least once during the filesystem's lifetime. The worst corruption happened when one DIMM started corrupting data. As a result, the computer kept crashing, and eventually the filesystem refused to mount because of btrfs checksum mismatches. The fix was to buy new hardware and then run btrfs filesystem repairs. The repairs failed at some point, but they at least got the filesystem running as long as I did not touch the most corrupted locations. Luckily it was RAID1, so most checksums had a correct value on another disk.
Unfortunately, in two locations the checksum tree was corrupted in both copies.
I had to open the raw disks with a hex editor and change the offending byte in each location to the correct value. Since then the filesystem has been running smoothly again for 5 years. To find the locations to modify on the disks, I built a custom kernel that printed the expected value and the absolute disk position when it detected that specific corruption. I also had to ask a friend to double-check my changes, since I did not have any backups.
So did you bite the bullet and get ECC, or are you just waiting for the next corruption caused by memory errors? :)