| HN Mirror


by 2OEH8eoCRo0 1578 days ago

> The logical successor was supposed to be btrfs, but that project IMHO may never be ready for production use.

https://btrfs.wiki.kernel.org/index.php/Production_Users

Facebook deployed it on millions of servers. Is that production enough? Synology NAS devices also use it.


ComputerGuru 1578 days ago

Facebook doesn’t use its servers the same way we use our computers. They image machines in and out of existence. They don’t have file systems going through power loss on a weekly basis. They don’t upgrade the kernel on existing installations. They don’t expand their storage after the fact. If their machines fail, they don’t care - they’re completely fungible.

5e92cb50239222b 1578 days ago

> If their machines fail, they don’t care

This myth is being perpetuated despite btrfs devs (who work at Facebook) stating the exact opposite many times over.

Every FS corruption and weird behavior is put aside and investigated. They very much do care.

https://lwn.net/ml/fedora-devel/03fbbb9a-7e74-fc49-c663-3272...

Please read the whole thread before repeating this nonsense, or at least every email sent there by Josef Bacik.

See also:

https://lwn.net/Articles/824855/

https://lwn.net/Articles/824620/

ComputerGuru 1578 days ago

> Every FS corruption and weird behavior is put aside and investigated. They very much do care.

Just because you and I are using different meanings of the word "care" doesn't mean the point isn't valid. They "care" in that they would like to know what went wrong and study it further. They don't "care" in the sense that they suffered no real harm and no stakes were riding on any one particular server that failed. It's not just a matter of having a backup/redundancy, it's about having automated systems (or even just standard procedures that are being executed on a daily basis at that scale one way or the other) that take care of these failures. So even in production, "regular" btrfs users might have backups so "no lasting damage" would be incurred, but that's hardly the same as openly volunteering themselves for risk.

That's all beside the main point: Facebook is deploying "known good" configurations. They're using a very select subset of features. They're not trusting that changed btrfs features/implementations are correct or, as was my experience, worrying about less-used/tested codepaths leading to data loss.

spookthesunset 1578 days ago

As a tl;dr:

“Also keep in mind we pay really close attention to burn rates for our drives, because obviously at our scale it translates to millions of dollars. Btrfs has improved our burn rates with the compression, as the write amplification goes drastically down, thus extending the life of the drives.”

As with anything it comes down to money. Yes a machine going down doesn’t impact the cluster but it does impact their wallet. Every failure of a disk costs money and on the scale of the big boys that can add up to big money.

So while “the system” doesn’t care about drive failures, the accountants and CFOs absolutely care.

ComputerGuru 1578 days ago

Just pointing out that "caring about physical drive failure" and "caring about disk corruption or data loss" are completely independent, and the latter does not directly equate to big money (as there are already systems and SOPs in place to deal with failed servers). Btrfs isn't notorious for actually frying disks, just the data on them.

cestith 1578 days ago

Do they care about the FS just silently eating data? I ask because btrfs has been known to do that. Sure, you're not replacing the drive, but you're probably wiping the VM's disk image and creating a new one.

2OEH8eoCRo0 1578 days ago

OP said production use. Can you define production use then?

lanstin 1578 days ago

The thing of recreating VMs a lot instead of upgrading or keeping them a long time is production use. The whole point of VMs, aside from not taking 3 months to order and provision, is that you can put the "long-term maintenance of a disk and OS" cost to zero and just recreate from SoR (hopefully git) whenever something needs to change. If you are editing state on persistent VMs, you are missing some really nice features of VM based deployment. It's like containers but more well understood and possibly more cost efficient (depending on the code).

api 1578 days ago

Lots of people seem to turn up their nose at btrfs. Is there a reason for that? Was it perhaps launched before it was really ready and people still remember early versions?

Syonyk 1578 days ago

> Is there a reason for that?

I can give you mine. I was working with a Raspberry Pi 3, and using a USB SSD. It's a USB2 link, so a bit choked, and I figured, hey, filesystem compression can help here, btrfs supports it, great! And it helped - you could get "real world" disk reads a good bit faster than the USB2 bus speed.
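The "faster than the bus" effect above is plain arithmetic: if the link carries compressed bytes, the logical read rate is roughly the link rate times the compression ratio. A minimal sketch, assuming a practical USB2 throughput of 35 MB/s and a 1.5x compression ratio (both illustrative numbers, not from the comment), and ignoring CPU decompression cost, which on a Pi could well be the real limit:

```python
def effective_read_rate(bus_mb_per_s: float, compression_ratio: float) -> float:
    """Logical MB/s delivered when the link carries compressed data.

    compression_ratio is uncompressed size / compressed size, so 1.0 means
    the data does not shrink at all. Decompression cost is ignored.
    """
    if bus_mb_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("rate and ratio must be positive")
    return bus_mb_per_s * compression_ratio


# Both constants are assumptions chosen for illustration only.
ASSUMED_USB2_MB_PER_S = 35.0
ASSUMED_COMPRESSION_RATIO = 1.5

rate = effective_read_rate(ASSUMED_USB2_MB_PER_S, ASSUMED_COMPRESSION_RATIO)
print(f"logical read rate: {rate} MB/s")
```

Incompressible data (ratio 1.0) gets no benefit, so the gain depends entirely on what is stored.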

Until one day, I rebooted, and it didn't come back up. Analysis on another system was that the btrfs filesystem was just... toast. I've no idea what happened, I found some stuff that said "Oh, uh... don't use btrfs over USB, it kinda breaks in some cases...", the recovery tools couldn't even decide that the filesystem was a btrfs filesystem, and, nope.

I put data on the filesystem, I expect it to come back. btrfs broke that guarantee with a Pi full of data (nothing too important, they're just scratch systems and light desktops), so... I now stick to the boring things like ext4 that have been exceedingly well proven. Is it the best filesystem out there in terms of features? Certainly not. Am I pretty darn sure that I'm not going to trip some edge case and totally scramble the filesystem? Yes, and that's what I care about.

adastra22 1578 days ago

Lots of us got burnt with data loss and aren’t willing to give it a chance again. Maybe it’s better now? I don’t have a reason to give it a second chance when there are plenty of stable alternatives that have saved my ass in the past instead of telling me I’m SOL.

ak217 1578 days ago

That's exactly it. I've used btrfs in production since Ubuntu 10.04, at scale since 12.04, and had nothing but great experiences with it - especially with the seed volume functionality, which allowed me to build the foundation for a major container-as-a-service platform before Docker was a thing. btrfs never lost our data, but I've also seen way too many btrfs kernel panics that were clearly related to insufficiently mature filesystem code, and I can understand people who did lose data, got burned and never want to trust btrfs again.

j16sdiz 1578 days ago

In its earlier days, the ENOSPC bug corrupted the filesystem.

If you run a heavy random-write workload, it fills up the disk pretty quickly and requires a rebalance _before_ it runs out of space.

Of course you can set nocow on those files, but then you lose all the checksumming/snapshotting features.
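To make the space trade-off concrete, here is a toy model of generic copy-on-write, not of btrfs's actual allocator: it does not model chunk allocation, balance, or checksums, and `blocks_in_use` is a made-up helper. It only shows why random overwrites consume extra blocks when a snapshot still references the old ones, and why in-place overwrites do not.

```python
import random


def blocks_in_use(file_blocks: int, overwrites: int, copy_on_write: bool,
                  seed: int = 0) -> int:
    """Count blocks still referenced after random overwrites of one file.

    A snapshot is taken before the writes. With copy-on-write, each overwrite
    lands in a fresh block and the snapshot keeps the old block alive. With
    in-place writes the block is reused, so no space is added, but the
    snapshot no longer holds the old contents.
    """
    rng = random.Random(seed)
    live = list(range(file_blocks))   # block ids backing the live file
    snapshot = list(live)             # block ids referenced by the snapshot
    next_id = file_blocks             # next unused block id

    for _ in range(overwrites):
        index = rng.randrange(file_blocks)
        if copy_on_write:
            live[index] = next_id     # new data goes to a newly allocated block
            next_id += 1
        # in-place: the same block id is overwritten, nothing is allocated

    return len(set(live) | set(snapshot))


print("copy-on-write:", blocks_in_use(100, 1000, copy_on_write=True))
print("in-place:     ", blocks_in_use(100, 1000, copy_on_write=False))
```

In this model the copy-on-write case can use up to twice the file's size, which is the general shape of the problem described above, while the in-place case stays flat at the cost of the snapshot's integrity.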

patrakov 1578 days ago

For me, it was https://bugzilla.kernel.org/show_bug.cgi?id=85581. Yes this endless-write loop is long fixed, but, given that something with 99% similar symptoms has surfaced in kernel 5.16 (or was this original bug not fixed properly?), I would say no.

TheCondor 1578 days ago

It is a complex beast. It needs some maintenance and performance will degrade without it.

I've never lost data to it, I've never tried the soft RAID modes it has though, but I've experienced it making a system almost unusably slow. SUSE out of the box with it automates a lot of it and it's pretty remarkable. Transactional mode if you want it seems like a game changer for some servers and the snapper stuff has saved my bacon a couple times. It's getting there but like I said, it needs some maintenance and just formatting a partition with it is likely the wrong way to experience it.

__david__ 1578 days ago

For me, when I tried btrfs (which was about 10 years ago now) I discovered it was extremely slow. And not like 50% slower—when I switched to ext4 or xfs on the same disk with the same data I was getting a 10x or so speedup.

MrDOS 1578 days ago

AFAIK it's not so bad in single-device use-cases. I think most of the more recent failures I've heard about have all had to do with Btrfs RAID. The prevailing wisdom still seems to be that if you want to use RAID, use an md soft-RAID device or LVM under your single-device Btrfs filesystem.

cestith 1578 days ago

RAID, especially 5 or 6, was my main concern, yes. If I'm using hardware RAID or a soft RAID under the FS, much of the promised benefit of btrfs is gone anyway. I can add to storage pools with ZFS or expand an LVM set, too, but what does using btrfs on top of anything buy me that ZFS, bcachefs, or something like f2fs does not?

MrDOS 1577 days ago

> but what does using btrfs on top of anything buy me that ZFS, bcachefs, or something like f2fs does not?

Well, inclusion in mainline kernels is the big one over ZFS and bcachefs, I guess.

I haven't seen F2FS before, so I'm commenting on the basis of 30 seconds of Googling, here, but it doesn't look like it supports either copy-on-write or snapshots, which are the big selling points I've heard for continuing to use Btrfs on top of a device manager.

pnutjam 1578 days ago

Yes, the only real problems are edge cases. I use it all the time.

duskwuff 1578 days ago

All problems are edge cases, to some degree or another. The only real question is how far out those edges are, and whether users are likely to bump into them.

craftkiller 1578 days ago

Edge cases like Raid5/6 which had the write hole issue approximately a decade after btrfs was released. At some point you say "This filesystem has lost so much of my data that I will never return to it."
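For anyone unfamiliar with the write hole, here is a minimal sketch of the general parity-RAID problem, using XOR parity over a two-data-block stripe. It illustrates the concept only and is not btrfs code; the block contents and the `xor_blocks` helper are invented for the example.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# A stripe with two data blocks and one parity block.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Consistent stripe: if the disk holding d1 dies, d0 and parity rebuild it.
assert xor_blocks(d0, parity) == d1

# Write hole: d0 is updated on disk, then power is lost before the matching
# parity update is written, so parity still describes the old d0.
d0_new = b"CCCC"
stale_parity = parity

# If the disk holding d1 now fails, the rebuild silently yields wrong data,
# even though d1 itself was never written to.
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
print("rebuilt d1 matches original:", rebuilt_d1 == d1)
```

The damage lands on a block that was not being modified, which is why this failure mode is so unpleasant in practice.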

pnutjam 1578 days ago

That's pretty old news. It's been problem free for a long time and it's very well documented where you might have issues.

bastardoperator 1577 days ago

Burn me once, shame on you, burn me twice, shame on me. If you purchased a new Ford and that car fell apart a week later, would you ever buy a Ford again? Some will, most won't.

tinus_hn 1577 days ago

A better analogy would be if the car got you in an accident. I don’t care if something breaks quickly as long as that means it can be returned or replaced.

gh02t 1578 days ago

I agree, I love BTRFS and have used it for ages, including some small scale production systems. But I know it still has some edge cases as you mention, which made me wonder: what is the impediment to having those cases fixed? BTRFS has been around long enough and even has some decent commercial support from a few vendors, so it seems like we can't just discount it as "it's open source and nobody is motivated to fix those long tail problems." Is there some kind of design issue that makes them hard?

tinco 1578 days ago

edit: sorry, cheap shot at Facebook. I have no idea why BTRFS edge cases are not being fixed.

What I do know is that ZFS recently released a feature specifically for the hobbyist/frugal community. The feature allows you to grow an existing RAID array, something a financially sound business would never do. So no customer of anyone supporting ZFS would ever use this, and it took significant effort from ZFS developers to implement it. Not to mention that introducing a feature potentially introduces weird behaviour in ZFS that might endanger its (reputation for) stability.

I'm super happy with it, (as my company was not in fact financially sound when we invested in our on-premise storage hardware), but if I was CEO of ZFS I'm not sure I'd sign off on it.

alophawen 1578 days ago

Sarcastic comment adding nothing to the discussion. How rare.

Brian_K_White 1578 days ago

I could already grow mdraid and reiserfs forever ago.

tinco 1578 days ago

Yes.. but you couldn't grow ZFS. I don't understand what your point is.

cestith 1578 days ago

What are the equivalent edge cases in XFS, ZFS, ext4, Reiser4, Reiser5, bcachefs, or f2fs that make btrfs worth considering on a level playing field?

pnutjam 1578 days ago

The wiki has everything you need.

https://btrfs.wiki.kernel.org/index.php/Status

https://btrfs.wiki.kernel.org/index.php/Gotchas

cestith 1577 days ago

That is very informative about the edge cases for btrfs. My question was: what are the edge cases in the other filesystems that put them on a level playing field with btrfs, considering it still has so many?

kzrdude 1578 days ago

Is ENOSPC still included as one of those edge cases?

user-the-name 1577 days ago

I really do not want to use a file system that has problems at edge cases. A file system needs to be incredibly stable.

nwmcsween 1578 days ago

BTRFS was marked stable in 2012 yet it still has abysmal performance compared to zfs, ext4, xfs, etc.

cestith 1578 days ago

Are they using the RAID 5 or RAID 6 code in it? Because that was declared unfit for use well after we were all advised their filesystem was ready for prime time. Then it corrupted and lost data in situations that other file systems did not.

I've heard RAID 1 and RAID 10 modes are safer, but after the FS corrupted my data I haven't really had a lot of trust in it or the people who say again that it's ready for serious use.