Hacker News
by sandreas 335 days ago
Well, first of all: I'm not trying to bash BTRFS at all, it probably is just not meant for me. However, I'm trying to find out whether it is really considered stable (as in rock solid) or whether it might have been a hardware problem on my system.

I used cryptsetup with BTRFS because I encrypt all of my stuff. One day, the system froze and after reboot the partition was unrecoverably gone (the whole story[1]). Not a real problem because I had a recent backup, but somehow I lost trust in BTRFS that day. Anyone experienced something like that?

Since then I switched to ZFS (on the same hardware) and never had problems, although it was a real pain to set up until I finished my script [2], which still is kind of a collection of dirty hacks :-)

1: https://forum.cgsecurity.org/phpBB3/viewtopic.php?t=13013

2: https://github.com/sandreas/zarch

6 comments

Yes, my story with btrfs is quite similar- used it for a couple years, suddenly threw some undocumented error and refused to mount, asked about it on the dev irc channel and was told apparently it was a known issue with no solution, have fun rebuilding from backups. No suggestion that anyone was interested in documenting this issue, let alone fixing it.

These same people are the only ones in the world suggesting btrfs is "basically" stable. I'll never touch this project again with a ten foot pole, afaic it's run by children. I'll trust adults with my data.

Ok, thank you. At least I'm not alone with this. However, I'm not too much into it and would not go as far as to say it's not a recommendable project, but boy was I mad it just died without any way to recover ANYTHING :-)
probably depends on where the issue is located, but is this not normally the case with encrypted drives?
No. Encrypted drives should be recoverable as long as you have the valid decryption values.

I think it had nothing to do with the encryption layer... the FS layer was the problem.

I ran it on opensuse and it would 100% lock out a core on some sort of crontab tree structure rebalancing (?) .. I mean.. hello? online algorithm? Dynamic rebalancing? scheduled FS restructuring, really? ReiserFS dancing trees from 20 years ago? If that's how they think, "meh, the user just has to deal", no wonder this is how they handle bugs.
I've used it as my desktop's main filesystem for many years and not had any problems. I have regular snapshots with snapper. I run the latest kernel, so ZFS is not an option.

That said, I avoid it like the plague on servers. To get acceptable performance (or avoid fragmentation) with VMs or databases you need to disable COW, which disables many of its features, so it's better just to roll with XFS (and get pseudo-snapshots anyway).

In the unlikely case you're running SQLite, it's possible to get okay performance on btrfs too:

https://wiki.tnonline.net/w/Blog/SQLite_Performance_on_Btrfs

I worked on a linux distro some years ago that had to pull btrfs, long after people had started saying that it's truly solid, because customers had so many issues. It's probably improved since but it's hard to know. I'm surprised fedora workstation defaults to it now. I'm hoping bcachefs finds its way in the next few years as being the rock solid fs it aims to be.
I hadn't heard of bcachefs, but I looked it up and apparently Linus just removed it from the kernel source tree last month for non-technical reasons.

https://en.wikipedia.org/wiki/Bcachefs#History

He hasn't, yet, but it's anyone's guess what he's going to do.

Regardless, development isn't going to stop, we may just have to switch to shipping as a DKMS module. And considering the issues we've had with getting bugfixes out that might have been the better way all along.

Yeah what really made me wonder is that I thought I had incomplete and wrong manpages in the recovery sections... examples did not work as described, but I can't remember what it was, I was too mad and ditched it completely :-)
My btrfs filesystem has been slowly eating my data for a while; large files will find their first 128k replaced with all nulls. Rewriting it will sometimes fix it temporarily, but it'll revert back to all nulls after some time. That said, this might be my fault for using raid6 for data and trying to replace a failing disk a while ago.
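(Aside for anyone who wants to check their own files for the symptom described above: a minimal sketch, assuming the damage really does show up as an all-zero first 128 KiB. The function names and the `min_size` threshold are made up for illustration, and this is not a btrfs tool; it only reads the files.)

```python
from pathlib import Path

CHUNK = 128 * 1024  # 128 KiB, the size mentioned in the comment above


def head_is_all_nulls(path, length=CHUNK):
    """Return True if the first `length` bytes of `path` are all zero bytes.

    Files shorter than `length` are judged on the bytes they have;
    empty files return False because there is nothing to inspect.
    """
    with open(path, "rb") as fh:
        head = fh.read(length)
    return len(head) > 0 and head.count(0) == len(head)


def find_nulled_files(root, min_size=CHUNK):
    """Yield regular files under `root` of at least `min_size` bytes whose head is all nulls."""
    for p in sorted(Path(root).rglob("*")):
        if p.is_symlink() or not p.is_file():
            continue
        if p.stat().st_size >= min_size and head_is_all_nulls(p):
            yield p
```

Something like `list(find_nulled_files("/data"))` would list the suspects (the path is just an example). Files that legitimately start with zeros, such as some disk images, will show up too, so treat the result as a list of things to compare against a backup, not as proof of corruption.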
raid 5/6 is completely broken and there's not much interest in fixing it — nobody who's willing to pay for its development (which includes Facebook, SUSE, Oracle, and WD) uses raid 5/6; you shouldn't have been running it in the first place. I understand it's basically blaming the victim, but doing at least some research on the filesystem before starting to use it is a good idea in any case.

https://btrfs.readthedocs.io/en/latest/Status.html

edit: just checked, it says the same thing in man pages — not for production use, testing/development only.

iirc, btrfs has fixed the issues with raid 5/6 but it requires a breaking change to the on disk format which means you have to create an entirely new partition and copy the data over (you cannot update an existing partition to it). This new on disk format also needs its own testing.

Your point about raid 5/6 not being tested heavily by actual users is spot on: those enterprise heavy users are only running RAID 10 like configurations.

If you want RAID 5/6, just use ZFS as they have solved all of these issues. I don't know if it's due to sheer luck or maybe the fact that Sun at the time was actually running RAID 5/6 in production (hard drives were not as cheap back then as they are now)?

Have you used 4K sectors with cryptsetup? Many distributions still default to 512 bytes if the SSD reports 512 bytes as its logical sector size, and with 512-byte sectors there is a heavier load on the system.

I was reluctant to use BTRFS on my Linux laptop but for the last 3 years I have been using it with 4K cryptsetup with no issues.
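(For reference, a rough sketch of how one might check what the drive itself reports on Linux, assuming the usual sysfs attributes `logical_block_size` and `physical_block_size` under `/sys/block/<device>/queue/`. The function names and the device name `nvme0n1` are just examples, and this does not inspect the LUKS container itself.)

```python
from pathlib import Path


def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes for a device such as 'nvme0n1'.

    `sysfs_root` is a parameter only so the function can be pointed at a
    different directory, for example in a test.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def reports_512_logical(device, sysfs_root="/sys/block"):
    """True if the device advertises 512-byte logical sectors."""
    logical, _physical = read_block_sizes(device, sysfs_root)
    return logical == 512
```

If `reports_512_logical("nvme0n1")` comes back true, that is the situation described above where an installer may fall back to 512-byte sectors for the encrypted volume.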

I used the default archinstall... did not check the sector size, but good to hear it works for you. Maybe I'll check again with my next setup.
FWIW, basically all of arch linux' infrastructure has been running on top of btrfs for several years, and last time I asked them, they didn't have any more problems with it than with any other filesystem.

https://gitlab.archlinux.org/archlinux/infrastructure

> One day, the system froze and after reboot the partition was unrecoverably gone (the whole story[1]).

it looks like you didn't use raid, so any FS could fail in case of disk corruption.

Thank you for your opinion. Well... it did not just fail. Cryptsetup mounted everything fine, but the BTRFS tools did not find a valid filesystem on it.

While it could have been a bit flip that destroyed the whole encryption layer, BTRFS debugging revealed that there were some traces of BTRFS headers after mounting cryptsetup and some of the data on the decrypted partition was there...

This probably means the encryption layer was fine. The BTRFS part just could not be repaired or restored. The only explanation I have for this is that something resulted in a dirty write, which destroyed the whole partition table, the backup partition table and, since I used subvolumes and could not restore anything, most of the data.

Well, maybe it was my fault but since I'm using the exact same system with the same hardware right now (same NVMe SSD), I really doubt that.

> Well, maybe it was my fault but since I'm using the exact same system with the same hardware right now (same NVMe SSD), I really doubt that.

anecdotes could be exchanged in both directions: I have run heavy data processing with max possible throughput on top of btrfs raid for 10 years already, and never had any data loss. I am absolutely certain that if you expect data integrity while relying on a single disk, it is your fault.

Reliability is about the variety of workloads, not the amount of data or throughput. It's easy to write a filesystem which works well in the ideal case; it's the bad or unusual traffic patterns which cause problems. For all I know, that complete btrfs failure was caused by a kernel crash due to bad USB hardware. Or a cosmic ray hit a memory chip.

But you know whose fault it is? It's btrfs's. Other filesystems don't lose entire volumes that easily.

Over time, I've abused ext4 (and ext3) in all sorts of ways: overwrite random sectors, mount twice (via loop so the kernel's double-mount detector did not work), use bad SATA hardware which introduced bit errors.. There was some data loss, and sometimes I had to manually sort through tens of thousands of files in "lost+found", but I did not lose the entire filesystem.

The only time "entire partition loss" happened to me was when we tried btrfs. It was part of a ceph cluster so no actual data was lost.. but as you may guess we did not use btrfs ever again.

> but as you may guess we did not use btrfs ever again.

there are scenarios where btrfs currently can't be replaced: high performance + data compression.

Sure, I can believe this. It does not change the fact that some people encounter complete data loss with it.

Sadly, there are people (and distributions) which recommend btrfs as a general-purpose root filesystem, even for the cases where reliability matters much more than performance. I think that part is a mistake.

OpenZFS does a better job here, at least if you can deal with an out of tree filesystem.
What the hell are you talking about? Any filesystem on any OS I've seen the last 3 decades had some kind of recovery path after any crash. Some of them lose more data, some of them less. But being unable to mount, is a bug that makes a filesystem untrustworthy and useless.

And how would RAID help in that situation?

> But being unable to mount, is a bug that makes a filesystem untrustworthy and useless.

we are in disagreement on this. If a partition table entry is corrupted, you can't mount without some low level surgery.

> And how would RAID help in that situation?

depending on the raid level, your data will be duplicated on another disk and will survive corruption of one or a few disks.

The partition table gets mostly written only once in the lifetime of a filesystem/disk. So it almost never corrupts during an os crash.

There are a lot of RAIDs and configurations. Some of them may do what you describe, but most don't.

Ok, I know this, not sure what your point is.