Hacker News
by phoronixrly 67 days ago
To the author: did you continue using btrfs after this ordeal? An FS that will not eat (all) your data upon a hard powercycle only at the cost of 14 custom C tools is a hard pass from me no matter how many distros try to push it down my throat as 'production-ready'...

Also, impressive work!

What are the alternatives to btrfs? At 12 TB data checksums are a must unless the data tolerate bit-rot. And if one wants to stick with the official kernel without out-of-tree modules, btrfs is the only choice.
I tried btrfs on three different occasions. Three times it managed to corrupt itself. I'll admit I was too enthusiastic the first time, trying it less than a year after it appeared in major distros. But the latter two are unforgivable (I had to reinstall my mom's laptop).

I've been using ZFS for my NAS-like thing since then. It's been rock solid (*).

(*): I know about the block cloning bug, and the encryption bug. Luckily I avoided those (I don't tend to enable new features like block cloning, and I didn't have an encrypted dataset at the time). Still, all in all it's been really good in comparison to btrfs.

Additional anecdata:

I've been using btrfs as the primary FS for my laptop for nearly twenty years, and for my desktop and multipurpose box for as long as they've existed (~eight and ~three years, respectively). I haven't had troubles with the laptop FS in like fifteen years, and have never had troubles with the desktop or multipurpose box.

I also used btrfs as the production FS for the volume management in our CI at $DAYJOB, as it was way faster than overlayfs. No problems there, either.

Go figure, I guess.

Could try ZFS or CephFS... even if several host roles are in VM containers (45Drives has a product setup that way.)

The btrfs solution has a mixed history, and had a lot of the same issues DRBD could get. They are great until some hardware/kernel-mod eventually goes sideways, and then the auto-heal cluster filesystems start to make a lot more sense. Note, with cluster based complete-file copy/repair object features the damage is localized to single files at worst, and folks don't have to wait 3 days to bring up the cluster on a crash.

Best of luck, =3

> if one wants to stick with the official kernel without out-of-tree modules

I wonder how a requirement like that could possibly arise. Especially with an obvious exception for zfs.

Bcachefs also fulfills the requirement of checksums (and multi device support).

Also out of tree.

Isn't bcachefs even younger and less polished than btrfs? It does show more promise as btrfs seems to have fundamental design issues... but still I wouldn't use that for my important data.
I don't disagree. Gotta have backups for important data either way, too!

Just talking about filesystems with checksumming (and multidevice). Any new filesystem to support these features is going to be newer.

I've had both btrfs and bcachefs multidevice filesystems lock up read-only on me. So no real data loss, just a pain to get the data into a new file system, particularly the time it was an 8-drive array on btrfs.

Does it not also eat data though?
I think you could use dm-integrity over the raw disks to have checksums and protect against bitrot then you can use mdraid to make a RAID1/5/6 of the virtual blockdevs presented by dm-integrity.
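
To make the layering concrete, here is a toy Python model (not dm-integrity's or md's actual on-disk format or code) of why checksummed members under a mirror can give self-healing reads: a member whose checksum does not verify returns an I/O error instead of bad data, so the mirror can fall back to the other copy and rewrite the bad one. `IntegrityDisk`, `Mirror`, the 512-byte sector, and the CRC32 tag are all illustrative assumptions.

```python
import zlib

SECTOR = 512  # bytes per toy sector; an assumption for illustration


class IntegrityDisk:
    """Toy stand-in for one checksummed device: each sector carries a CRC32 tag."""

    def __init__(self, sectors):
        self.data = [bytes(SECTOR)] * sectors
        self.tags = [zlib.crc32(bytes(SECTOR))] * sectors

    def write(self, n, payload):
        self.data[n] = payload
        self.tags[n] = zlib.crc32(payload)

    def read(self, n):
        # A tag mismatch surfaces as an I/O error rather than silently bad data.
        if zlib.crc32(self.data[n]) != self.tags[n]:
            raise IOError(f"integrity error in sector {n}")
        return self.data[n]


class Mirror:
    """Toy RAID1 over checksummed members."""

    def __init__(self, *members):
        self.members = list(members)

    def write(self, n, payload):
        for m in self.members:
            m.write(n, payload)

    def read(self, n):
        failed = []
        for m in self.members:
            try:
                good = m.read(n)
            except IOError:
                failed.append(m)
                continue
            # Heal the members that failed before this one was tried.
            for bad in failed:
                bad.write(n, good)
            return good
        raise IOError(f"sector {n} unreadable on all members")
```

Flipping a byte directly in one member's `data` list and then reading through `Mirror` returns the intact copy and rewrites the damaged sector. The model says nothing about crashes in the middle of a write, which is where the write hole concern comes in.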

I suspect this is still vulnerable to the write hole problem.

You can add LVM to get snapshots, but this is still not the end-to-end copy-on-write solution that btrfs and ZFS are supposed to provide.

lvm offers lvmraid, integrity, and snapshots as one example. It's old unsexy tech, but losing data is not to my taste lately...
lvm only supports checksums for metadata. It does not checksum the data itself. For checksums with arbitrary filesystems one can use a dm-integrity device rather than LVM. But performance suffers due to the separate journal writes by the device.
But that is just raid on top of dm-integrity. And the Red Hat docs omit an important part when suggesting the bitmap mode with dm-integrity:

man 8 integritysetup:

       --integrity-bitmap-mode, -B
           Use alternate bitmap mode (available since Linux kernel 5.2) where dm-integrity uses bitmap instead of a journal. If a bit in the bitmap is 1, then corresponding region’s data and integrity tags are not synchronized - if the machine crashes, the unsynchronized regions will be recalculated. The bitmap mode is faster than the journal mode, because we don’t have to write the data twice, but it is also less reliable, because if data corruption happens when the machine crashes, it may not be detected.
I just do not see how, without direct filesystem support, one can have both reliable checksums and performance.
> But that is just raid on top of dm-integrity

As I said -- boring tech. Just what I like when not in a mood to lose data.

Good thing all disks these days have data checksums, then!

(50TB+ on ext4 and xfs, and no, no bit rot. Yes, I've checked most of it against separate sha256sum files now and then. As long as you have ECC RAM, disks just magically corrupting your data is largely a myth.)
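
For anyone wanting to run the same kind of spot check, a minimal sketch in Python, assuming manifests in the usual `sha256sum` output format (one `<hex digest>  <path>` per line) and ignoring the escaping `sha256sum` applies to unusual filenames. `sha256_of` and `verify_manifest` are hypothetical helper names, not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest, root="."):
    """Return the names listed in the manifest that are missing or no longer match."""
    mismatches = []
    for line in Path(manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # sha256sum prefixes the name with '*' when it hashed in binary mode.
        if name.startswith("*"):
            name = name[1:]
        target = Path(root) / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_manifest("SHA256SUMS", "/some/mount")` means every listed file still hashes to its recorded digest. It cannot tell bit-rot apart from a legitimate edit made after the manifest was written.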

Less mythic on SSDs than spinning rust, in my experience.

Not particularly frequent either way, but I have absolutely had models of SSDs where it became clear after a few months of use that a significant fraction of them appeared to be corrupting their internal state and serving incorrect data back to the host, leading to errors and panics.

(_usually_ this was accompanied by read or write errors. But _usually_ is notable when you've spent some time trying to figure out if the times it didn't were a different problem or the same problem but silent.)

There was also the notorious case of certain Samsung spinning rust dropping data in its write cache if you issued SMART requests...

What devices are you talking about, what's the UBER, over what period of time?
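
For context, UBER is usually quoted as unrecoverable errors per bit read, so the expected count scales linearly with the amount read. A back-of-the-envelope sketch follows; the 1e-15 rate is an assumed datasheet-style figure, not a measurement from anyone in this thread, and errors are treated as independent. Note that UBER covers errors the drive reports as unreadable, which is not the same thing as silent corruption.

```python
import math


def expected_unrecoverable_errors(bytes_read, uber):
    """Expected number of unrecoverable bit errors when reading bytes_read bytes."""
    return bytes_read * 8 * uber


def prob_at_least_one(bytes_read, uber):
    """Poisson approximation: P(at least one error) = 1 - exp(-expected)."""
    return -math.expm1(-expected_unrecoverable_errors(bytes_read, uber))


ASSUMED_UBER = 1e-15  # assumption for illustration only
FIFTY_TB = 50e12      # the 50 TB figure mentioned above, in bytes

print(expected_unrecoverable_errors(FIFTY_TB, ASSUMED_UBER))  # about 0.4
print(prob_at_least_one(FIFTY_TB, ASSUMED_UBER))              # about 0.33
```

Under those assumptions, one full read of 50 TB would be expected to hit well under one unrecoverable error, which is why the device model, the rate, and the time period all matter for comparing anecdotes.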

RAID and logical block redundancy have scaled to petabytes for years in serious production use, before btrfs was even developed.