people understand they're different, but if bcachefs is out, then that leaves btrfs as the only modern in-tree filesystem, but apparently you can't trust it with important data either.
I've been using btrfs on my NAS for years and have not had any problems. I suspect there are a hell of a lot of people like me you will not hear about because people don't generally get as vocal when things just work.
The Venn diagram of "people who want a modern copy-on-write filesystem with snapshots to manage large quantities of data" and "people who want a massive pool of fault-tolerant storage" (e.g. building a NAS) has some pretty significant overlap.
The latter is where BTRFS is still hobbled: while the RAID-0, RAID-1, and RAID-10 modes work absolutely fine, the RAID-5 and RAID-6 modes are still broken, with an explicit warning at mkfs time (and in the manpages) that the feature is still experimental and should not be used to hold data that you care about retaining. This has bitten, and continues to bite, people, with terabytes of data lost (backups are important, people!). That then sours them on ever using any other aspect of BTRFS again.
> If you ignore explicit warnings at mkfs time and then get upset the warning was accurate, you can't really fully blame the file system for it.
Oh, no doubt. I agree.
> Just raid on a lower layer and btrfs on top.
That has its own set of problems. The conventional RAID solution on Linux (MD) also has some pretty terrifying corruption edge cases with RAID-5 and RAID-6 (as I explained in [1]) which will bite you if you're not aware of them and how to work around them.
A robust filesystem purpose-built for the task can only really be found in ZFS.
Won't silent corruption on the raid level be detected by the integrity checks in btrfs? It won't be able to automatically repair it, but it should give errors at least, right?
Yeah, that would be the "error detection at a higher level" (than MD) part. It'd still be on you to pull one drive at a time from the array until the errors go away. Then you know which drive has the corrupted block in that stripe, and you can remove the mdadm metadata from it and re-add it to the array so that the kernel forces a clean resync, reconstructing the good block from the parity. Running the "repair" action in MD would instead recompute the parity from the now-corrupted data, overwriting your good parity and leaving you with no means of recovery. MD can't know whether the data is bad or the parity is bad because it doesn't know what the data is supposed to look like; even if btrfs does have a checksum for it, that's on a higher, disconnected layer. All filesystems on top of a parity MD array suffer from this same vulnerability; some of them (e.g. FAT32) won't even be able to tell you when a file has become corrupted, which lets the corruption be persisted into backups.
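To make the "MD can't tell which block is bad" point concrete, here's a toy model in Python. It is only a sketch and not how MD or btrfs is actually implemented: the stripe is three tiny data blocks plus one XOR parity block, CRC32 stands in for whatever checksum the filesystem keeps, and the function names (xor_blocks, stripe_is_consistent, md_style_repair, find_bad_member) are made up for illustration. The last function is the software equivalent of pulling one drive at a time until the checksum errors go away.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity, RAID-5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def stripe_is_consistent(members):
    """All that the parity layer can check: do data and parity XOR to zero?"""
    return not any(xor_blocks(members))


def md_style_repair(members):
    """Toy version of a parity 'repair': trust the data, recompute the parity."""
    data = members[:-1]
    return data + [xor_blocks(data)]


def find_bad_member(members, data_checksums):
    """Treat each member in turn as missing, rebuild it from the others, and
    return (index, rebuilt_stripe) for the first rebuild whose data blocks all
    match the checksums. Returns None if no single rebuild helps."""
    for i in range(len(members)):
        others = members[:i] + members[i + 1:]
        candidate = members[:i] + [xor_blocks(others)] + members[i + 1:]
        data = candidate[:-1]  # the last member is the parity block
        if all(zlib.crc32(b) == c for b, c in zip(data, data_checksums)):
            return i, candidate
    return None


# Three data blocks, with checksums recorded by the layer above the array.
good_data = [b"AAAA", b"BBBB", b"CCCC"]
checksums = [zlib.crc32(b) for b in good_data]
stripe = good_data + [xor_blocks(good_data)]

# Silent corruption of one data block on member 1.
stripe[1] = b"BBXB"

# The parity layer sees a mismatch but has no idea which member is wrong.
print("consistent after corruption:", stripe_is_consistent(stripe))

# With checksums from the higher layer, the bad member can be located.
print("bad member found via checksums:", find_bad_member(stripe, checksums)[0])

# After a parity "repair", the stripe looks clean but the good data is gone.
repaired = md_style_repair(stripe)
print("consistent after repair:", stripe_is_consistent(repaired))
print("recoverable after repair:", find_bad_member(repaired, checksums) is not None)
```

The parity check alone only ever says "this stripe doesn't add up". The information needed to pick the right member lives in the checksums, and once the parity has been recomputed from the corrupted block there is nothing left to rebuild from.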