by thoroughburro 618 days ago
> The largest failure was with btrfs — after a reboot, a 50 TB filesystem (in mirror, for backups) simply stopped working. No more mounting possible. Data was lost, but I had further backups. The client was informed and understood the situation. Within a few days, the server was rebuilt from scratch on FreeBSD with ZFS — since then, I haven’t lost a single bit.

As someone who admins a lot of btrfs, it seems very unlikely that this was unrecoverable. btrfs gets itself into scary situations, but also gets itself out again with a little effort.

In this instance “I solve problems” meant “I blow away the problem and start fresh”. Always easier! Glad the client was so understanding.

5 comments

> As someone who admins a lot of btrfs, it seems very unlikely that this was unrecoverable.

As someone who used it all day every day in my day job for 4 years, I find it 100% believable.

I am not saying you're wrong: I'm saying experiences differ widely, and your patterns of use may not be universal.

It's the single most unreliable untrustworthy filesystem I've used in the 21st century.

The first time I tried it out, about 4 years ago, I bricked it within a few days!! It was on a new (to me) Linux distro, or maybe an existing one, but I heard it was cool and the snapshots sounded neat.

I stayed away for a while but have it again on a Garuda install. I never completely give up on a technology, I hope they get it together.

As someone who has used it in my day job since 2014, I find it around 5% believable. I've had nasty performance issues on old kernels, but never a single instance of unrecoverable data loss, and I've run it in plenty of pathological cases. Experiences differ.
It is the default root filesystem on SLE and openSUSE, as well as Garuda, SpiralLinux, GeckoLinux, siduction, and others.

(I name these because all use snapper to provide transactional packaging and installation rollback. This is less relevant to other distros which use Btrfs but do not offer transactional packaging, e.g. Fedora or Oracle Linux.)

The snapper tool makes a pre-install snapshot before packaging operations. It can't get a reliable estimate of available space before doing this, because `df` does not work. It returns an estimate which is not reliable.

Result, snapper or packaging operations can fill the root filesystem.
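To make the free-space point concrete, here is a minimal Python sketch of a pre-flight check that reads the conservative "min" figure from `btrfs filesystem usage -b` rather than trusting `df`. The line format matched below, the `SAMPLE` text, and the helper names (`parse_usage`, `safe_to_proceed`, `read_usage`) are all assumptions for illustration: output may differ between btrfs-progs versions, and this is not how snapper itself works.

```python
import re
import subprocess

# Assumed line formats from `btrfs filesystem usage -b`; they may vary by version.
_FREE_RE = re.compile(r"^\s*Free \(estimated\):\s+(\d+)\s+\(min:\s+(\d+)\)", re.MULTILINE)
_UNALLOC_RE = re.compile(r"^\s*Device unallocated:\s+(\d+)", re.MULTILINE)


def parse_usage(text):
    """Extract byte counts from the 'Overall' section of the usage report."""
    free = _FREE_RE.search(text)
    unalloc = _UNALLOC_RE.search(text)
    if free is None or unalloc is None:
        raise ValueError("unrecognised `btrfs filesystem usage` output")
    return {
        "free_estimated": int(free.group(1)),
        "free_min": int(free.group(2)),
        "unallocated": int(unalloc.group(1)),
    }


def safe_to_proceed(text, needed_bytes, headroom=2.0):
    """Be pessimistic: compare the 'min' estimate against needed space times a margin."""
    return parse_usage(text)["free_min"] >= needed_bytes * headroom


def read_usage(mountpoint="/"):
    """Run the real tool (usually needs root); not called at import time."""
    result = subprocess.run(
        ["btrfs", "filesystem", "usage", "-b", mountpoint],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Illustrative, made-up numbers; not taken from a real system.
SAMPLE = """Overall:
    Device size:                 107374182400
    Device allocated:             96636764160
    Device unallocated:           10737418240
    Free (estimated):             16106127360      (min: 10737418240)
"""
```

Usage would be something like `safe_to_proceed(read_usage("/"), needed_bytes=...)` before a large package transaction. The margin is a guess, and even the "min" figure is still an estimate, so this only lowers the odds of filling the volume.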

Attempted writes to a full Btrfs volume will corrupt it in my fairly extensive direct personal experience.

And the repair tool (`btrfs check --repair`) does not work and can't fix a corrupted volume. Both the Btrfs docs and SUSE docs tell you not to run it: this is not my opinion, it's objectively verifiable info.

This caused total OS self-destruction 2-3 times per year, on 2 different desktops and 1 laptop, for 4 years.

The official guidance is: have a really big root partition and do not keep `/home` separate.

However, given that it's only having a separate /home partition (formatted XFS or ext4) that allowed me to reinstall and keep working, I refuse to do that.

When a FS repeatedly collapses and self-destructs on me, no, I will not hand it even more of my data to destroy. That would be irrational.

I know why. I know what steps could hypothetically avoid this, but for other reasons those are not desirable to me.

But this is not acceptable FS behaviour for me. The demands of Btrfs advocates of what I should do are not reasonable to me.

I am happy to accept that other configs would not exhibit this, but the thing is this:

1. A core USP of SUSE distros is transactional packaging

2. Their transactional packaging needs snapper

3. Snapper needs Btrfs

4. Because of design flaws in Btrfs, snapper can corrupt the OS partition

5. Over some 15 years these problems in Btrfs have not been fixed

That makes me think they can't or won't fix it.

That is an unacceptable price to me. My choices are to risk a self-destructing distro, or to risk all my data on a fragile FS, or to forego the distro's USP.

None of these are acceptable prices to me.

Others' mileage varies. That's fine. It's a free market. Go for it. Enjoy.

OpenSUSE is a good distro with some great tech, but the company needs to study rival distros more, learn its own weaknesses, and fix them.

(This is true of most distro vendors.)

> It's the single most unreliable untrustworthy filesystem I've used in the 21st century.

I think the “experiences differ widely” point makes sense with this comment too. Synology uses btrfs on the NAS systems it sells (there’s probably some option to choose another filesystem, but this is the default, AFAIK). If it were to be “the most unreliable untrustworthy filesystem” for many others too, Synology would’ve (or should’ve) chosen something else.

Synology only uses btrfs in single-disk mode and implements RAID-1 functionality using its own patched version of mdadm to side-step the gotchas of native btrfs raid1.
What 'gotchas' exactly?
None for raid 1. They do it for raid 5/6 if you're a crazy person and want to run parity raid in 2024.
FWIW, this wasn't always the case. I recall that BTRFS reliability was much different, say, 10–15 years ago. The post touched those ancient times as well, so that isn't that much of a stretch.

Around that time, SLES made btrfs their default filesystem. It caused so many problems for users that they reversed that decision almost immediately.

I was pleased with my home lab btrfs, had a 12TB raid1, and the PSU rail connected to the backplane sometimes would go down under load. Many scary errors but never lost anything. Took me 2 months to debug and replace the PSU
If btrfs knows the data is intact, shouldn't btrfs recover automatically?
Why do people use btrfs and similar filesystems for production use? They are by no means dumpster fires. But the internet is littered with stories of "X happened, then I realized Y & that I wasn't getting my data back"
Btrfs has some nice features - e.g. compression and snapshots, which i didn't know i'd even like before using them. Not only have they saved me a few times from bad updates ("saved" in the sense that i was able to pretty much instantly revert; it saved time, and i wouldn't have lost anything even without btrfs), but they also help with things like "i'm going to run this script to process 29837894293 files - and the script might have some bugs in it, so i want to be sure i won't lose anything" (i.e. make snapshot, run script, check results, compare snapshot with current state to ensure nothing is lost, delete snapshot). Snapshots are also useful for diffing FS state, e.g. before and after installing some program.
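The snapshot, run, check, delete loop described above could be wrapped roughly like this in Python. It is only a sketch: the `guarded_by_snapshot` name and the injectable `run` parameter are made up for illustration, the paths are placeholders, and the `btrfs subvolume` commands normally need root.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def guarded_by_snapshot(subvol, snap_path, run=subprocess.run):
    """Take a read-only snapshot, run the body, delete the snapshot on success.

    If the body raises, the code after `yield` never runs, so the snapshot
    is deliberately left in place for inspection or rollback.
    """
    # `-r` makes the snapshot read-only, so the risky script can't touch it.
    run(["btrfs", "subvolume", "snapshot", "-r", subvol, snap_path], check=True)
    yield snap_path
    # Only reached when the body finished without an exception.
    run(["btrfs", "subvolume", "delete", snap_path], check=True)
```

Inside the `with guarded_by_snapshot("/data", "/data/.pre-script"):` block you would run the script and then compare the snapshot with the live tree (for example with `diff -rq`), raising an exception if anything is missing so the snapshot survives.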

As for the stories, AFAICT often the reason is that the user didn't know they could get their data back - or they are stories from many years ago when btrfs was buggy, but AFAIK those issues have been long solved (i think some specific case with some RAID setup still has issues but this is hearsay and AFAICT from the same hearsay, that setup isn't really necessary with btrfs in the first place).

Using btrfs is more complicated than using ext or something similar, especially since most tools that deal with files/filesystems are made only with ext-like features in mind - to the point where sometimes i wonder what the point is and i'm considering switching to ext3 or ext4, but then i remember snapshots and i'm like, nah :-P.

Nearly all of the stories of unrecoverable data loss I've read involve someone discovering they have a problem, then trying the traditional ext/xfs recovery techniques before reading the docs, and thereby destroying their fs.
Facebook (well, Meta, I guess) is famously a big user and developer of btrfs. It seems to work just fine for them.
> Facebook (well, Meta, I guess) is famously a big user and developer of btrfs. It seems to work just fine for them

I really, really, really wish people would STOP with the whole "it works for $SiliconValleyCorp so it must work for me" or "$SiliconValleyCorp does it, so I must".

It only leads to disappointment in the case of the former and wholly unnecessary over-engineering in the case of the latter.

    (a) You do not know *how* or *where* Facebook use BTRFS
    (b) Even in the unlikely event they use it "everywhere", they have far more redundancy on every layer than you will ever have.  So they don't care if a random BTRFS instance borks itself.
    (c) Facebook probably employ the guy who invented BTRFS and an army of kernel developers on top of that .... how much in-house support do you have for BTRFS ?
As far as I am concerned, the fact that they STILL have not fixed RAID5 in BTRFS says everything you need to know.
> You do not know how or where Facebook use BTRFS

(S)he does, their employees explained it many times. They're very public about it.

> So they don't care if a random BTRFS instance borks itself.

They do, according to Chris Mason (IIRC) they investigate every instance of btrfs corruption, regardless of how unimportant the machine and data were. They're not any more frequent than with any other filesystem.

> Facebook probably employ the guy who invented BTRFS and an army of kernel developers on top of that

Not an "army" (only a few developers), but you're correct here.

> they STILL have not fixed RAID5 in BTRFS

Why would they? It's a niche technology that's only interesting to a few home users. I am a home user and have no use for it (or any of the alternatives like raidz).

They haven't fixed raid 5 because no one serious uses parity raid in this decade. Try recruiting an open source dev to write something that useless...
Look, I simply highlighted a major user of btrfs. Sorry if you have some complex emotions about them. But for some reason, I doubt you'd say the same thing when someone mentions Netflix using FreeBSD.

>(a) You do not know how or where Facebook use BTRFS

Their engineering team has posted a few of their use cases.

>(c) Facebook probably employ the guy who invented BTRFS and an army of kernel developers on top of that .... how much in-house support do you have for BTRFS ?

Uh, about as much as any other file system? Those changes and improvements are upstreamed to the kernel anyway. It's not like Facebook has some sort of special version of btrfs they are using.

>As far as I am concerned, the fact that they STILL have not fixed RAID5 in BTRFS says everything you need to know.

As far as I know, the issue with RAID5 in btrfs is highly complex and it would take quite a bit of dedicated effort to make it work. I suppose it's an architectural shortcoming of btrfs. But then again, it's RAID5, a/k/a something only shoestring hobbyists really care about. Hence why no one is bothering to make it work in btrfs.

At the end of the day, btrfs is perfectly fine for home users and workstations. ZFS beats it out on servers, that's fine. Traditional filesystems are not the end-all be-all of storage anymore. No one has made a better ZFS because the industry has moved on to things like Ceph, vSAN, AzureHCI, etc.

> But for some reason, I doubt you'd say the same thing when someone mentions Netflix using FreeBSD.

Actually, I would. Not because I'm a BSD hater, because we actually use a lot of BSD at $work.

But instead because I reckon I could safely win a bet with you that Netflix do not use the vanilla version of FreeBSD.

Most people I know would agree with me that the secret sauce will forever stay secret.

Sure, without a doubt Netflix contribute stuff back to FreeBSD. But I betcha it's not ALL the stuff. :)

The same goes for other famous FreeBSD users, e.g. Juniper Networks.

I'm happy to recommend FreeBSD to people, but if they're looking for Netflix or Juniper level network performance, they'll need to know they'll have to do the donkey work themselves because there's not a cat's chance in hell they'll magically get it "out of the box".

Bet accepted.

Netflix runs -CURRENT and they have been very vocal about their approach to both developing and running a vanilla FreeBSD tree, e.g. https://freebsdfoundation.org/netflix-case-study/

In my personal experience, if you reach out to them, they will likely help you where it relates to their work and expertise, including collaborating on works in progress that are not ready for main. You do have to configure and use appropriate software to get the same numbers as they do, e.g. sendfile and some sysctls, but any local changes they have would not be material to posting similar performance numbers.

How do I collect my winnings? :)

The internet is littered with imbeciles who omit their own blunders.