This is very good advice. I did the same preparation; here is the distribution of files before the degraded state:
Number of successful reads: 280
Number of IO errors: 0
Successful read files size: sum 82648303047 max 4884066696 average 295172511
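The script that produced these statistics isn't shown. Below is a minimal bash sketch that prints numbers in the same shape. It assumes GNU `find` and `stat`, and assumes a failed read (e.g. EIO on a missing device) surfaces as a non-zero `cat` exit status. The function name `read_test` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "read every file" test; the script actually used in
# this thread was not posted. Drop the caches first so reads hit the devices:
#   echo 3 | sudo tee /proc/sys/vm/drop_caches

read_test() {
  local dir=$1 f size
  local ok=0 err=0 ok_sum=0 ok_max=0 err_sum=0 err_max=0
  # -print0 / read -d '' keep file names with spaces or newlines intact
  while IFS= read -r -d '' f; do
    size=$(stat -c %s -- "$f")                # GNU stat: size in bytes
    if cat -- "$f" > /dev/null 2>&1; then     # any read error makes cat exit non-zero
      ok=$(( ok + 1 )); ok_sum=$(( ok_sum + size ))
      if (( size > ok_max )); then ok_max=$size; fi
    else
      err=$(( err + 1 )); err_sum=$(( err_sum + size ))
      if (( size > err_max )); then err_max=$size; fi
    fi
  done < <(find "$dir" -type f -print0)

  echo "Number of successful reads: $ok"
  echo "Number of IO errors: $err"
  # averages use integer division (rounded down)
  if (( ok > 0 )); then
    echo "Successful read files size: sum $ok_sum max $ok_max average $(( ok_sum / ok ))"
  fi
  if (( err > 0 )); then
    echo "IO error files size: sum $err_sum max $err_max average $(( err_sum / err ))"
  fi
}
```

Usage would be something like `read_test /mnt/loop` right after dropping the caches.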
Then I unmounted the fs, deleted disk 2, dropped the caches (`echo 3 > /proc/sys/vm/drop_caches`), and remounted the fs:
sudo umount /mnt/loop
echo 3 | sudo tee /proc/sys/vm/drop_caches
echo 3 | sudo tee /proc/sys/vm/drop_caches
echo 3 | sudo tee /proc/sys/vm/drop_caches
dmesg --human --nopager --decode --level emerg,alert,crit,err,warn,notice,info
kern :info : [Jan22 13:18] tee (215899): drop_caches: 3
kern :info : [ +3.232287] tee (215931): drop_caches: 3
kern :info : [ +0.775697] tee (215953): drop_caches: 3
rm d2.img
sudo mount "$ld1" /mnt/loop
I am surprised that mounting worked without error but I guess the device is still active via losetup. I'm assuming this would be similar to an actual disk failure though, if the device weren't there maybe btrfs will complain and ask to be mounted with the `-o degraded` flag.
There was nothing exciting in dmesg:
kern :info : [ +14.363762] BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
kern :info : [ +0.000004] BTRFS info (device loop0): using free space tree
Oohh, weird... nothing changed:
Number of successful reads: 280
Number of IO errors: 0
Successful read files size: sum 82648303047 max 4884066696 average 295172511
A scrub reports no errors either:
sudo btrfs scrub status /mnt/loop/
UUID: a57027e5-feb8-4f58-9022-f5dc0a5c67ac
Scrub started: Sun Jan 22 13:33:49 2023
Status: finished
Duration: 0:00:28
Total to scrub: 77.25GiB
Rate: 2.76GiB/s
Error summary: no errors found
Okay, it turns out the deleted file is still attached to the loop device, so I detached it:
sudo losetup -d "$ld2"
sudo umount /mnt/loop
echo 3 | sudo tee /proc/sys/vm/drop_caches
Now we get some interesting stuff in dmesg:
sudo mount -o degraded "$ld1" /mnt/loop
mount: /mnt/loop: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
kern :info : [Jan22 13:37] tee (222135): drop_caches: 3
kern :info : [ +16.362674] BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
kern :info : [ +0.000004] BTRFS info (device loop0): using free space tree
kern :err : [ +0.000419] BTRFS error (device loop0): devid 2 uuid 1b352839-f719-499f-b9a7-25ed4d06e2be is missing
kern :err : [ +0.000003] BTRFS error (device loop0): failed to read chunk tree: -2
kern :err : [ +0.000183] BTRFS error (device loop0): open_ctree failed
kern :info : [ +11.713125] BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm
kern :info : [ +0.000004] BTRFS info (device loop0): allowing degraded mounts
kern :info : [ +0.000001] BTRFS info (device loop0): using free space tree
kern :warn : [ +0.000167] BTRFS warning (device loop0): devid 2 uuid 1b352839-f719-499f-b9a7-25ed4d06e2be is missing
kern :warn : [ +0.007647] BTRFS warning (device loop0): chunk 2177892352 missing 1 devices, max tolerance is 0 for writable mount
kern :warn : [ +0.000002] BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
kern :err : [ +0.000155] BTRFS error (device loop0): open_ctree failed
But we can still mount it read-only:
sudo mount -o ro,degraded "$ld1" /mnt/loop
And the results are:
Number of successful reads: 219
Number of IO errors: 61
Successful read files size: sum 21798190683 max 2122064756 average 99535117
IO error files size: sum 60850112364 max 4884066696 average 997542825
In this test about 26% of data is still fully readable (21798190683 / (21798190683+60850112364)).
I also tried another variant of the experiment where I did all of the above, but ran these commands before removing the disk:
sudo rm /mnt/loop/file # a 500 MB file that was included in the above tests. I deleted it to give btrfs defrag some room to work
sudo btrfs fi defrag -v -r -czstd /mnt/loop/
And the results are not much better... in fact they are worse, 20% lol:
Number of successful reads: 199
Number of IO errors: 80
Successful read files size: sum 16695157031 max 2122064756 average 83895261
IO error files size: sum 65428858016 max 4884066696 average 817860725
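Both percentages follow directly from the counters. In the first run the denominator, 82648303047 bytes, equals the pre-failure total, so every file is counted exactly once. A small bash/awk sketch that recomputes both runs (the function name `readable_pct` is just illustrative):

```shell
#!/usr/bin/env bash
# Byte-level "fully readable" fraction, as computed for the first run:
#   ok_bytes / (ok_bytes + err_bytes)

readable_pct() {  # args: ok_bytes err_bytes
  awk -v ok="$1" -v err="$2" 'BEGIN { printf "%.1f%%\n", 100 * ok / (ok + err) }'
}

readable_pct 21798190683 60850112364   # first run
readable_pct 16695157031 65428858016   # second run (after rm + defrag)

# The second run's total is smaller; the difference is the deleted file.
missing=$(( (21798190683 + 60850112364) - (16695157031 + 65428858016) ))
echo "total shrank by $missing bytes ($(( missing / 1048576 )) MiB)"
```

This gives 26.4% and 20.3%. The second run's sizes add up to exactly 500 MiB less than the first, matching the deleted file, which is also why it reports 279 files (199 + 80) instead of 280.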
> I am surprised that mounting worked without error but I guess the device is still active via losetup.
Exactly. `rm` doesn't actually delete a file's contents while the file is still open; it just unlinks it from the directory tree. So your loop-attached disk image is still there, and all its contents are still available through `/dev/loopX`.
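The same unlink behaviour can be seen without any loop devices. In this bash sketch an open file descriptor (fd 3) stands in for the loop device's reference to its backing file; the temporary path and the file contents are made up for illustration.

```shell
#!/usr/bin/env bash
# rm only removes a directory entry; an open file descriptor keeps the inode
# (and its data) alive. fd 3 plays the role of the loop device here.

tmp=$(mktemp -d)
printf 'still here\n' > "$tmp/d2.img"

exec 3< "$tmp/d2.img"   # open the file, like losetup attaching it
rm "$tmp/d2.img"        # unlink the name; the data survives while fd 3 is open

ls -A "$tmp"            # prints nothing: the name is gone
cat <&3                 # prints "still here": the contents are still reachable

exec 3<&-               # close fd 3, like `losetup -d`; only now is the space freed
rmdir "$tmp"
```

This is why the first "degraded" test still passed: the image was only really gone after `losetup -d`.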
> I'm assuming this would be similar to an actual disk failure though, if the device weren't there maybe btrfs will complain and ask to be mounted with the `-o degraded` flag.
If the `/dev/loopX` device weren't there, then it would be similar to a complete disk failure, yes.
> In this test about 26% of data is still fully readable
It's true that only 26% of data is still fully readable if you count only files that are fully intact. But also note that about 78% of files (219 of 280) were still completely intact.
This is not clear from your comment, but I'm assuming that you are using 4 devices for the btrfs pool as well?
With that disk configuration and a single-disk failure, you would expect to lose about 25% of files, while the remaining 75% would stay intact (especially if the files are small enough).
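Under that naive model (each file lives entirely on one of N equally-filled devices), the expected count is simple arithmetic. A bash/awk sketch comparing it with the observed counts from both runs, assuming 4 devices as above (`expected_vs_observed` is an illustrative name):

```shell
#!/usr/bin/env bash
# Naive model: each file lives entirely on one of N equally-filled devices,
# so losing one device should leave about (N-1)/N of the files intact.
# N=4 is an assumption; the file counts come from the two runs in this thread.

expected_vs_observed() {  # args: total_files devices observed_intact_files
  awk -v n="$1" -v d="$2" -v obs="$3" 'BEGIN {
    printf "expected intact: %.0f/%d (%.0f%%)\n", n * (d - 1) / d, n, 100 * (d - 1) / d
    printf "observed intact: %d/%d (%.1f%%)\n", obs, n, 100 * obs / n
  }'
}

expected_vs_observed 280 4 219   # first run
expected_vs_observed 279 4 199   # second run (one file deleted before defrag)
```

By file count, the first run came out slightly better than the naive 75% (78.2%) and the second slightly worse (71.3%). The byte-level numbers are much lower because in both runs the files with IO errors are, on average, far larger than the readable ones (roughly 1 GB vs. 100 MB in the first run).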
In reality, though, things can be considerably better or considerably worse, depending on a few factors.
For example:
1. Fragmented free space. In that case, a significant percentage of files might be allocated across more than one disk, so you'd lose more files than expected when a single disk fails. That said, I can see that in your second experiment you defragmented the btrfs filesystem beforehand, so perhaps this is not the main issue.
2. Depending on how btrfs allocates data, if the files don't fill all of the disks, they can be heavily skewed towards a subset of the disks.
For example, imagine that each of your disks is 1 TB and your files total less than 1 TB.
In this case, all of your files could be allocated on the first disk only, so losing that disk could mean losing 100% of your data.
Or, if your files total less than 2 TB, they might all be allocated on the first 2 disks only, so losing one of those disks would cost you many more files than you'd expect if files were evenly distributed across all disks.
On the other hand, if you lost one of the other disks, you might not lose any data at all.
3. Depending on file sizes and how much free space each disk has, btrfs might be forced to (or might choose to) spread a file across more than one disk even with the 'single' profile, and even if free space is not fragmented.
4. More generally, how many files you lose depends on how btrfs allocates space across the disks for each file.
These allocation algorithms can be considerably more complex than a naive allocator, mostly for performance reasons.
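Points 2 and 3 can be illustrated with a toy model. This is explicitly not btrfs's allocator: it assumes data is written sequentially into fixed-size chunks (1024 MiB here), that each new chunk goes to a device picked by one of two made-up policies, and that a file is lost if any chunk it touches sits on the failed device. Device count, capacity and file sizes are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy allocation model (NOT btrfs's real allocator; all parameters hypothetical).
CHUNK=1024       # MiB per chunk (assumed)
DEVICES=4
CAP_CHUNKS=20    # chunks per device (equal-sized devices assumed)

# simulate POLICY FAILED_DEV SIZE_MiB...
#   spread: each new chunk goes to the device with the most free chunks
#   fill:   fill device 0 first, then device 1, ... (the skewed case in point 2)
simulate() {
  local policy=$1 failed=$2
  shift 2
  local -a free=()
  local d best cur=-1 left=0 size take lost intact=0 total=0
  for ((d = 0; d < DEVICES; d++)); do free[d]=$CAP_CHUNKS; done
  for size in "$@"; do
    lost=0
    while (( size > 0 )); do
      if (( left == 0 )); then            # current chunk is full: allocate a new one
        best=-1
        for ((d = 0; d < DEVICES; d++)); do
          (( free[d] > 0 )) || continue
          if (( best < 0 )); then
            best=$d
          elif [[ $policy == spread ]] && (( free[d] > free[best] )); then
            best=$d
          fi
        done
        if (( best < 0 )); then echo "out of space" >&2; return 1; fi
        cur=$best
        free[cur]=$(( free[cur] - 1 ))
        left=$CHUNK
      fi
      take=$(( size < left ? size : left ))
      size=$(( size - take ))
      left=$(( left - take ))
      if (( cur == failed )); then lost=1; fi   # this file touches the failed device
    done
    total=$(( total + 1 ))
    if (( lost == 0 )); then intact=$(( intact + 1 )); fi
  done
  echo "policy=$policy failed_device=$failed intact=$intact/$total"
}

sizes=(512 512 512 512 3000)   # hypothetical file sizes in MiB
for policy in spread fill; do
  for failed in 0 1 2 3; do simulate "$policy" "$failed" "${sizes[@]}"; done
done
```

With `fill`, every file lands on device 0, so losing it loses everything while losing any other device loses nothing (point 2). With `spread`, the 3000 MiB file ends up on three devices and is lost whichever of those three fails (point 3).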
Unfortunately, I know exactly nothing about how btrfs allocates data, so I can't give you more insight than this, sorry!