Hacker News
by InTheArena 2341 days ago
I went on a quest a few years ago, thinking it would be good for the industry to standardize on a single next-generation filesystem for UNIX. I started with ZFS on Linux since that seemed to have the most vocal advocates. That lasted about half a year, until a bug in the code resulted in a completely corrupt disk, and I had to restore 4TB of data over a month from offsite backups. That, plus the licensing confusion around ZFS, has made it impossible for ZFS to be the de facto choice.

I went down the BTRFS path, despite its dodgy reputation, when Netgear announced their little embedded NASes, and switched my server over to it. The experience was solid enough that I bought a high-end Synology and have had zero problems with it.

6 comments

Btrfs is the only FS I used that resulted in complete FS corruption losing nearly all data on disk, not once, but 3 times.

After that, none of the features like compression, snapshots, COW or checksums meant anything to me. I'm much happier with ext4 and xfs on lvm.

It seems a lot of people have these stories, and then people like me and OP who have had btrfs survive the most fucked up situations (I've had a btrfs nas built on "random drives I've had lying around" and abused it for 5 years and had 0 bugs at all).

I'm not sure what causes it, but there seems to be an effect where btrfs loves you or hates you and few people with mixed experiences regarding data loss. One possible cause is that distro choice tends to be per person, and distros differ in how up to date they keep their kernel. But I'm not sure.

> It seems a lot of people have these stories, and then people like me and OP who have had btrfs survive the most fucked up situations (I've had a btrfs nas built on "random drives I've had lying around" and abused it for 5 years and had 0 bugs at all).

Why wouldn't you expect it to survive that? Is there a particular reason to believe those drives are broken? I.e., are they older consumer drives known to lie about cache flushes? Do they have bad sectors? How have you abused it? What kind of load? Did you fill the filesystem (which another commenter mentioned seems to be a common element of most sad btrfs stories)? Did your system frequently lose power while under write load?

Lacking more details, I'd just say one user experiencing 0 bugs in 5 years should be completely unremarkable. I expect filesystems to be very reliable, so a lot of people having stories of corruption means stay away from btrfs. Having some people with stories of no corruption doesn't really move the needle. Together, these stories still mean stay away from btrfs!

That's hyperbole; it can't be taken seriously. OpenSUSE uses Btrfs by default. If there were more problems than what's expected of md+LVM+ext4 (or XFS), whose feature set Btrfs covers and then some, they wouldn't have made the ongoing investments they have. Facebook has been using it in production with thousands of installations for years.

You want details from people experiencing zero problems, but you don't ask for details from people who are? That's a weird way to go about conducting the necessary autopsies, to discover and fix bugs.

Anyway, I monitor the upstream filesystems lists, and they all have bugs. They're all fixing bugs. They're all adding new features. And that introduces bugs that need fixing. It's not that remarkable, until of course someone suggests only one file system is to be avoided, while providing no details and relying on conjecture.

I asked RX14 why they called out their lack of problems as remarkable ("survive the most fucked up situations"). It sounds strange, as I mentioned.

I don't need to ask people who've had problems because I've had them myself, in unremarkable circumstances, a while back. I'm sure I could find reports on the mailing list as well, in which others have already asked for details.

In my experience, btrfs is very fragile in power loss or kernel crash/panic scenarios. It very consistently causes soft lockups on file reads/writes after power loss until you run a `btrfs check --repair` on it. My experience is mostly on Arch, so it's not a case where it's out of date and missing patches.
Sounds like hardware problems in the storage stack. Btrfs developers contributed the dm-log-writes target to the kernel, expressly for conducting power loss tests on file systems. All the file systems benefit from this work. https://www.kernel.org/doc/Documentation/device-mapper/log-w... And Btrfs is doing the right thing these days.

I recently conducted system resource starvation tests where a compile process spun off enough threads to soak the system to the point it becomes unresponsive. I did over 100 forced power off tests while the Btrfs file system was being written to as part of that compile. Zero complaints: not on mount, not on scrubs, not with btrfs check, and not any while in normal operation following those power offs.

If you want to complain about Btrfs, complain about the man page warning to not use --repair without advice from a developer. You did know about that warning, right?

100% was not a hardware problem. Works fine on other filesystems.
That's an inadequate answer because it rests on other file systems assuming the hardware is working reliably. Btrfs and ZFS don't make such assumptions, that's why everything is checksummed. They are canaries for hardware, firmware, and software problems in the storage stack that other filesystems ignore.
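To make the "canary" point concrete, here is a minimal sketch of per-block checksumming. It is a toy model, not how Btrfs or ZFS are implemented: the `ChecksummedStore` class is hypothetical, and CRC32 from `zlib` simply stands in for whatever checksum a real filesystem uses. The idea is only that a checksum recorded at write time lets a read notice data that changed underneath the filesystem, where a filesystem without data checksums would hand the bad bytes back silently.

```python
import zlib


class ChecksummedStore:
    """Toy block store that records a checksum for every block (illustrative only)."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes as "stored on disk"
        self.checksums = {}  # block number -> checksum recorded at write time

    def write(self, blockno, data):
        self.blocks[blockno] = bytes(data)
        self.checksums[blockno] = zlib.crc32(data)

    def read(self, blockno):
        data = self.blocks[blockno]
        # Verify on every read; a mismatch means the stored bytes changed
        # after the write, which the filesystem itself never did.
        if zlib.crc32(data) != self.checksums[blockno]:
            raise IOError(f"checksum mismatch on block {blockno}")
        return data


store = ChecksummedStore()
store.write(0, b"important data")

# Simulate a drive, cable, or firmware fault flipping one bit behind the
# filesystem's back.
corrupted = bytearray(store.blocks[0])
corrupted[0] ^= 0x01
store.blocks[0] = bytes(corrupted)

try:
    store.read(0)
    detected = False
except IOError:
    detected = True
```

In this sketch the read fails loudly instead of returning damaged data, which is the behaviour the comment above describes as other filesystems "ignoring" the problem.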
This was my experience. We had a brief power outage at work and my btrfs (root) partition was toast. Spent a whole day rebuilding my system afterwards. Will definitely not go that route again.

The only difference is that none of the repair tools were able to recover the filesystem, but I was able to dump the files themselves to a new disk to recover them. Really not sure why, it was very strange.

I ran btrfs on a laptop. 2 things.

Once I ended up with a bunch of zero length files (presumably metadata was written before content?).

I also ended up, multiple times, with errors related to full drives despite my drive not being full. Deleting snapshots seemed to help.

Then I went to a zfs fs on root and never had another problem.

For a year now I have literally turned off my machine daily by pulling the plug (home automation turns off all plugs at midnight to make me go to bed ;).

My quite large 1 TB multivolume, multisnapshot BTRFS fs never had any problems.

And it's a quite aggressive config (big fs commit).

P.S. I do have backups though.

Ugh. You are testing your home the Netflix way [1] :-)

Why not put poweroff in a cron task a bit before midnight so you don't uselessly risk hosing your file system? You can always restore your backup but it takes time!

[1] https://arstechnica.com/information-technology/2012/07/netfl...

I think the probable cause is that it's not common bugs that cause the corruption but uncommon ones. Most of the time, they work fine. But you really want a stronger guarantee than that out of your filesystem.
Historically, the biggest bugs in btrfs were when you came close to filling up the filesystem. For the longest time, you'd get -ENOSPC (no space left) even when you had many GB of space left, due to really bad metadata and block-level space usage.
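One commonly described way to get ENOSPC with space left is a two-level allocator: raw disk space is first carved into chunks typed as data or metadata, and freed space stays inside its chunk type. The sketch below is a toy model under that assumption; the `ToyChunkAllocator` class, chunk sizes, and numbers are all hypothetical and it is not btrfs's actual allocator.

```python
import errno


class ToyChunkAllocator:
    """Toy two-level allocator: raw space is carved into typed chunks (hypothetical)."""

    def __init__(self, raw_chunks, chunk_size):
        self.unallocated = raw_chunks  # chunks not yet assigned to a type
        self.chunk_size = chunk_size
        self.capacity = {"data": 0, "metadata": 0}
        self.used = {"data": 0, "metadata": 0}

    def alloc(self, kind, amount):
        # Grow this kind's pool one chunk at a time until the request fits.
        while self.used[kind] + amount > self.capacity[kind]:
            if self.unallocated == 0:
                raise OSError(errno.ENOSPC, "No space left on device")
            self.unallocated -= 1
            self.capacity[kind] += self.chunk_size
        self.used[kind] += amount

    def free(self, kind, amount):
        # Freed space stays inside its chunk type; chunks are never returned.
        self.used[kind] -= amount

    def free_bytes(self):
        total = sum(self.capacity.values()) + self.unallocated * self.chunk_size
        return total - sum(self.used.values())


fs = ToyChunkAllocator(raw_chunks=4, chunk_size=100)
fs.alloc("metadata", 100)  # one chunk, now completely full of metadata
fs.alloc("data", 300)      # the remaining three chunks all become data chunks
fs.free("data", 250)       # delete most of the data

try:
    fs.alloc("metadata", 1)  # needs a new metadata chunk, but none is unallocated
    got_enospc = False
except OSError as e:
    got_enospc = e.errno == errno.ENOSPC
```

In this model more than half the device is free, yet the smallest metadata write fails, because all the free space is locked inside data chunks.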
I'm a huge Mac fanboy, but APFS really kicks me in the teeth sometimes. Aside from things like snapshots, clones, etc. not being accessible to users (well, not really), or being able to create subvolumes at specific mount points which forget those mount points next reboot, it had an extremely strange behavior (possibly relating to snapshots/CoW?) where once it was full, it stayed full forever until you rebooted.

Basically, any time a runaway process filled my disk, I just had to hard-reboot and hope I didn't have any unsaved work or state that I needed to preserve.

Really makes me hope that Apple is going to further extend APFS to not just be baby's first CoW volume-management filesystem.

> it had an extremely strange behavior (possibly relating to snapshots/CoW?) where once it was full, it stayed full forever until you rebooted.

Do you have Time Machine enabled? I think it uses snapshots, which explains why the filesystem stays full. I've hit this myself and was initially surprised to see rm not improving matters (possibly even making it worse) but it makes sense with snapshots. The working on reboot was a surprise. I'd put off fixing the machine for at least a week, and when I went to actually fix it, it was quite anticlimactic to just reboot and have it work. Maybe it checks for this condition on reboot and dumps Time Machine snapshots if so.
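The reason rm does not help when snapshots exist can be shown with a toy model. This is only a sketch of the general copy-on-write idea, assuming a block stays allocated while anything references it; the `ToySnapshotFS` class is hypothetical and says nothing about how APFS or Time Machine are actually implemented.

```python
class ToySnapshotFS:
    """Toy copy-on-write filesystem where snapshots pin blocks (illustrative only)."""

    def __init__(self):
        self.live = {}        # filename -> block id in the live tree
        self.snapshots = []   # each snapshot is a frozen filename -> block id map
        self.block_size = {}  # block id -> size in bytes
        self._next_id = 0

    def write(self, name, size):
        # Every write gets a fresh block; old blocks are never modified in place.
        self.block_size[self._next_id] = size
        self.live[name] = self._next_id
        self._next_id += 1

    def snapshot(self):
        self.snapshots.append(dict(self.live))

    def rm(self, name):
        del self.live[name]

    def drop_snapshots(self):
        self.snapshots.clear()

    def used(self):
        # A block is in use if the live tree or any snapshot still references it.
        referenced = set(self.live.values())
        for snap in self.snapshots:
            referenced.update(snap.values())
        return sum(self.block_size[b] for b in referenced)


fs = ToySnapshotFS()
fs.write("big.iso", 1000)
fs.snapshot()
fs.rm("big.iso")
used_after_rm = fs.used()    # the snapshot still pins the block
fs.drop_snapshots()
used_after_drop = fs.used()  # only now is the space released
```

Deleting the file removes it from the live tree only; the space comes back when the last snapshot referencing it is dropped, which would be consistent with the guess above that something discards snapshots at reboot.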

That was the less scary part of my macOS filesystem integrity worries. My full disk started when it was staging a full Time Machine backup after I got a dialog saying:

> Time Machine completed a verification of your backups on "my.nas.address". To improve reliability, Time Machine must create a new backup for you.

...for the Nth time. I don't know for certain if the problem is with Apple's software or with my NAS's (Synology) but these backups are clearly not as reliable as one would hope...

Let's not forget about various performance issues which were exacerbated by "low free space" conditions (i.e. after you filled the volume beyond 80 % these started to pop up). A file system that will sometimes go down well into the fractional IOPS range is not very useful.

Some of these are fixed by now, though.

"the biggest bugs in btrfs were when you came close to filling up the filesystem" :)

I used to read every email on btrfs-devel for a year or so.

This is my experience too. Works great with lots of free space; as soon as space gets tight, performance deteriorates really fast. Nevertheless, for me it has been worth it.
There's a good Bryan Cantrill talk about that.[1] The gist is that eventually, when you throw enough resources at a problem, all that's left are the really uncommon problems and bugs, and this is specifically what you get in the data path (including drive firmware) where things get harder and harder to figure out as the code gets more hidden and obscure.

As with all his talks, you can expect it to be quite entertaining as well as informative and historical (if from his POV).

1: https://www.youtube.com/watch?v=fE2KDzZaxvE

Personally I think that in the case of a CoW filesystem, bugs which cause corruption should be very uncommon because of the very nature of the CoW mechanism, especially if coupled together with data checksums as publicized in the case of BTRFS.

If things still get trashed then I tend to think that the very foundation of the FS is bad.

But maybe I'm just naive :)

> I'm not sure what causes it, but there seems to be an effect where btrfs loves you or hates you and few people with mixed experiences regarding data loss.

I tried, I really tried to like btrfs.

On the servers/workstations I’ve had few serious issues, but a few “gotchas” you need to know to keep things running smoothly.

On every laptop I’ve had, I’ve had btrfs fail on me. Repeatedly.

So I gave up on it. ZFS for me these days.

> btrfs loves you or hates you

This is how superstitious traditions start, and ritualistic sacrifice in particular, I'd think.

>and ritualistic sacrifice in particular, I'd think.

Does data loss count as a sacrifice in this instance?

If it does, I think the ZFS "rebuild the pool from scratch" should as well, since that seems far more ritualistic.
>but there seems to be an effect where btrfs loves you or hates you

Surely it depends on the btrfs implementation. e.g. Arch Linux getting daily kernel updates vs an enterprise distro

Just as unstable on Arch as of a month or two ago.
Same and same. Never saw any problems with btrfs. Really like the memory consumption of btrfs!
> an effect where btrfs loves you or hates you

Same thing happens with operating systems.

One anecdote of a filesystem working fine and one anecdote of it becoming a disaster don't cancel each other out.

I wouldn't buy a $5 USB thumb drive if half the people said it lost their data and half said it worked fine.

I'd buy it, but only for short-term use to sneakernet shit I already had backed up reliably somewhere else.

Of course, where we run into problems is that btrfs is meant to be the reliable backup. Oops.

You realize that there are $5 thumb drives that work, just like there are filesystems that actually work, right? There isn't any benefit to using something broken; these problems have been solved.
Sorry, but this is anecdata.

Down there, about 2/3 of the way through this Hacker News discussion (if you are patient enough to get there), you can see questions about production deployment of btrfs, with some VERY interesting answers about BIG deployments of btrfs. Read: success confirmed with data. My takeaway from reading the whole discussion:

* a lot of people (individuals) praise btrfs

* a lot of people (ind.) tell about problems

* quite nice btrfs features/usage patterns are mentioned, not matched even by zfs

* still, for VM/DB you should consider a different approach (thin LVM + xfs or ext4) and a slave machine WITH btrfs and snapshots on it

* quite a few problems/deficiencies of ZFS are mentioned (apart from the typical license/kernel inclusion)

* a lot of new features on the way in recent kernels for btrfs

* btrfs is not dead

P.S. Worth noting that kernel 5.6 just received another huge batch of new features for btrfs (async discard!)

ZFS is the only FS I used that resulted in complete FS corruption, losing nearly all data on disk (only once though).
Legitimately curious what the ZFS bug was. I’ve not heard of a TFDL bug in zfs for a Loooong time.

The reason Synology btrfs is mostly solid is because they refused to ever use the btrfs raid layer. But the second you move to btrfs on LVM you lose a large portion of the supposed benefits.

Having used both, never lost data on zfs and I’ve been using it since it was released and have had it save me from silent data corruption. BTRFS hasn’t ever lost me an entire file system, but I’ve definitely lost files.

I really don't understand the insane hype around ZFS. You can't read any thread that touches on filesystems without the ZFS zealots coming out.
ZFS is mature/stable, its feature set is basically unmatched (data checksums, compression, atomic snapshots, RAID(0,1,10,5,6), send/receive) by any other option on Linux, and what competition it does have is unstable in some configurations (BTRFS), essentially dead in the water (reiserfs), in early development (bcachefs), or far more complex to manage (gluster, ceph, LVM+XFS). Other than the licensing issue, ZFS is basically a silver bullet.
I agree, and I'd like to add to the feature list the adaptive cache (which takes into account not only the last time a block was used but also how frequently it was used) and the SSD cache ("ARC" and "L2ARC" respectively in ZFS jargon).
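As a rough illustration of why tracking frequency as well as recency matters, here is a simplified two-list cache. To be clear, this is not ZFS's ARC: it has a fixed policy and none of ARC's self-tuning, and the `TwoListCache` class is hypothetical. It only shows the basic idea that a block touched more than once can be protected from a burst of one-off reads.

```python
from collections import OrderedDict


class TwoListCache:
    """Simplified recency + frequency cache (NOT real ARC; illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # blocks seen exactly once
        self.frequent = OrderedDict()  # blocks seen at least twice

    def get(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)
            return self.frequent[key]
        if key in self.recent:
            value = self.recent.pop(key)
            self.frequent[key] = value  # promote on the second access
            return value
        return None  # cache miss

    def put(self, key, value):
        if key in self.frequent:
            self.frequent[key] = value
            self.frequent.move_to_end(key)
            return
        if key in self.recent:
            del self.recent[key]
            self.frequent[key] = value
            return
        if len(self.recent) + len(self.frequent) >= self.capacity:
            self._evict()
        self.recent[key] = value

    def _evict(self):
        # Prefer evicting the oldest block that was only ever touched once.
        victim = self.recent if self.recent else self.frequent
        victim.popitem(last=False)


cache = TwoListCache(capacity=3)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # second access: "a" is now considered frequently used
cache.put("c", 3)
cache.put("d", 4)   # cache is full, evicts "b" (seen once)
cache.put("e", 5)   # evicts "c" (seen once)
```

With a plain LRU of the same size, the scan over "c", "d" and "e" would have pushed "a" out; here "a" survives because it was used twice.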
Also don't underrate good documentation and easy to use tooling.
This was the killer feature for me.

I had been wanting to try ZFS on my home NAS for a while (for snapshotting/redundancy/data integrity) and finally got enough disks that it made sense. I wasn't looking forward to learning what I presumed to be a very complicated system though. About 15 minutes into my research for setting up and maintaining a ZFS filesystem and I just went: wait, that's it? So incredibly simple and well documented, it has been a joy to use. It is very rare that complicated operations on complicated systems use such simple and easy to understand commands. It just does what I expect!

ZFS is incredibly easy to learn to use, whereas btrfs is quite complicated to learn/use, and even more so if you've used ZFS since a lot of things are either just different enough to be weird, or so different that it makes no sense.

Examples: ZFS snapshots can be recursive (-r) or not, whereas on btrfs they cannot be recursive; in discussions I've seen, this is mentioned as "a feature", since you can create a subvolume for data that you don't want to be part of the snapshot, but it also prevents you from dividing up a logical hierarchy into multiple behaviours (compression vs. not, block size, etc.).

> but it also prevents you from dividing up a logical hierarchy into multiple behaviours (compression vs. not, block size, etc.).

Bind mounts can get around most of the limitations here, at the cost of polluting one directory with the canonical locations of all your special-purpose subvolumes. I think it's still awkward to simultaneously snapshot every subvolume that is mounted under a particular tree for incremental backup purposes.

The hype is quite easy to understand. Snapshots and checksums are two complete game-changers. ZFS has them both. And there are no real alternatives in many cases.

I've personally waited for BTRFS longer than a decade but my use-cases are yet to be considered stable (not something you really mess with in regard to filesystems).

Honestly, as sure as I have been of the success of BTRFS, I now consider BTRFS dead on arrival, if it will ever even arrive. The pace of development is slower than the universe around it. That might be too harsh, but really: no RAID6 yet? A decade ago the impression I got was "soon". And now 2-drive parity is becoming obsolete.

ZFS has tons of warts for home-use, I agree. So, for a home-user with high demands I don't see anything exciting in the future.

There were a bunch of btrfs raid56 patches last year. I think the known bugs have been addressed and it's just that the wiki page hasn't been updated.

Re obsolete, are you referring to RAID1C3?

I'm thinking of this:

https://www.zdnet.com/article/why-raid-5-stops-working-in-20...

I'd much prefer something like raidz3 compared to the author's setup.

RAID1C3 is nice but very expensive for use in bulk storage at home.
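The usual argument behind "parity RAID is running out of road" is arithmetic on unrecoverable read errors (UREs) during a rebuild. The sketch below assumes the commonly quoted consumer-drive spec of one URE per 10^14 bits read and treats errors as independent; both are assumptions, real drives may behave better or worse, and the drive counts are made up for illustration.

```python
import math


def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes independent errors at a fixed per-bit rate (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12

# Rebuilding a degraded single-parity array means reading every surviving
# drive in full, e.g. six surviving 2 TB drives (hypothetical numbers).
p_rebuild = p_read_error(6 * 2 * TB)
```

Under these assumptions a 12 TB rebuild has a better than even chance of hitting a read error (roughly 60%), and with single parity there is nothing left to correct it with. A second or third parity drive exists to cover exactly that case, which is why larger arrays push towards something like raidz3.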

What warts do you speak of?
No defragmentation, and as far as I'm aware all copy-on-write filesystems suffer greatly from fragmentation once utilization goes too high. ZFS will never recover unless you restart from scratch.

No way to rebalance a pool. Also, increasing a pool always results in less reliability (in terms of drive losses that result in the whole pool going down).

No proper recovery tools if something goes wrong.

Then the lack of flexibility talked about in the article. This means the up-front cost and total cost is vastly more than a more typical setup where you can buy drives spread out over many years and take advantage of falling prices, less power consumption and noise (in part because you typically start such an array with higher density drives, since the low cost and longevity allows you to).

Probably forgot some other reasons.

That said I still use zfs (freenas) at home. But because of the above it is quite hard to blindly recommend it.
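The point above about growing a pool reducing reliability can be put in numbers. This sketch assumes the pool is lost if any single vdev is lost and that vdevs fail independently; the per-vdev loss probability is a made-up figure for illustration.

```python
def pool_loss_probability(vdev_loss_probs):
    """Chance of losing the pool, assuming it dies if any one vdev dies
    and that vdevs fail independently (both simplifying assumptions)."""
    survive = 1.0
    for p in vdev_loss_probs:
        survive *= 1.0 - p
    return 1.0 - survive


p_vdev = 0.01  # hypothetical chance of losing one vdev over some period

one_vdev = pool_loss_probability([p_vdev])
two_vdevs = pool_loss_probability([p_vdev, p_vdev])
```

Adding a second identical vdev nearly doubles the chance of losing the whole pool in this model, since every added vdev is one more thing whose failure takes everything with it.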

lvm and hence ext4 etc have had snapshots for ages.
As does NTFS. But they are not really comparable to "real" filesystem snapshotting, at least not in my opinion.
I don't think I am a zealot, nor a heavy user, but I use it on 1 machine at home (an NFS server running FreeBSD, which I have clients for elsewhere in my house). I came to this idea when I saw some data loss on some magnetic disks in my house, and repairing or even assessing the level of damage was difficult.

My experience is that it's pretty good. The tooling does what it says without a lot of drama. I can scrub while the system is in use and don't notice it mostly. I have seen some small corruptions that it was able to flag for me with specific filenames and fix. Snapshotting and send/receive is also very handy.

I heard some people say they don't like to use it under heavy load. That seems reasonable to me. You're paying costs to get the integrity piece. So it's not for every use or every user. It is very good at what it does, however.

Same with me. I just figured out at some point, 10 years ago, that it is nice to have snapshots on the root disk. And figured out FreeBSD supports ZFS. Tried it, loved it, used it. ZFS on Linux was destabilized in the latest versions (`ls /.zfs/snapshots`) and they blew it considerably by adding it to systemd (I need to reboot Fedora multiple times before it boots ever since), but at least I know that my data are not lost (unlike btrfs, where I had two major crashes in two years). Quite frankly I'd rather wait for Reiser to get out of jail than use btrfs again. Anyway, I bet on Hammer2.
ZFS is like really good snow tires in the winter. You can tell people with other tires how great it is to have really good tires, but they don't believe you until they experience the benefits for themselves. Or put another way, no one "needs" ZFS until they really need it, then they won't live without it ever again.

I switched to it after the 7200.11 firmware mess, where the drives reported successful writes but didn't write anything. ZFS would have caught that; my Adaptec card certainly couldn't have and didn't.

ZFS to the rescue again a while later when those (now firmware updated) 7200.11 drives started dropping after 15k hours of service. ZFS saved my data when two drives started failing in my RAID5 set at the same time.

All the weird minor problems that would cause random issues or performance issues for other file systems like flaky SATA cables, intermittent HBA/backplane ports, etc. ZFS catches them all and informs you.

Having been hit by bit rot, corrupted files, corrupted file systems, etc etc before switching, ZFS is fantastic. And there is something great about watching it scrub at >1GB/sec, verifying every single bit of your data.

Eh, I'm waiting for them to rewrite it in Rust.
Poe's law
This is such a meta-comment that I actually LOLd!
I had to check the dictionary for the meaning of "hype"

a situation in which something is advertised and discussed in newspapers, on television, etc. a lot in order to attract everyone's interest:

Maybe it's just me, but modern-day usage of "hype" seems to imply a negative meaning, especially in tech, similar to false advertising. And no one was actively promoting ZFS; they were only very "responsive".

And then zealots: I had to reread 226 comments and ran to the Cambridge dictionary

a person who has very strong opinions about something, and tries to make other people have them too

I don't see anyone with strong opinions forcing others to have the same. If anything, a lot of people are showing up not because they love ZFS, but because they have been burnt by btrfs.

ZFS is the worst filesystem/volume manager, except all others.
Have you tried it? I went from having never used ZFS to loving it (and I guess being one of those zealots) very quickly after setting it up. So simple yet so powerful!
>You can't read any thread that touches on filesystems without the ZFS zealots coming out.

Agreed 100%. That's particularly annoying to us desktop users. It took me years to figure out that no, FreeBSD aside, it doesn't bring anything to the table outside of enterprise storage use cases. At least it doesn't bring anything that's worth the hassles (I don't have to export ntfs filesystems before using them on another computer; same for ext4, and then there's performance).

I have a Synology NAS on btrfs. One of the best computer purchases I've ever made.
I’ll second this, it’s fantastic. The time it takes to expand when adding a second 16TB is deeply average (8 days) but that’s about it for downsides. It’s the best computer I’ve owned.
Hard to standardize on something that can't be maintained in the same place all your other filesystems are in (in the kernel) for licensing reasons.
Only the boot file system drivers need to be in the kernel. As long as there is a stable ABI, it's fine for everything else to be someplace else.
> As long as there is a stable ABI, it's fine for everything else to be someplace else.

Mainline Linux has a policy against in-kernel ABI stability guarantees. User-space is given ABI stability guarantees; in-kernel code intentionally is not. That includes filesystems.

Do you have a link to the bug issue? ZFS purportedly never had any corruption issues on release versions, so that makes it a really interesting case.