by pojntfx 2341 days ago
Love using Btrfs; there is no better filesystem than it nowadays that its reliability issues have been fixed.

I have to beg to differ here: I had a different experience, which I literally just posted about on Reddit yesterday.

https://www.reddit.com/r/zfs/comments/eu1qsj/a_tale_of_two_f...

tl;dr: Unbeknownst to me, I had a bad drive cable for an external NVMe enclosure that was causing intermittent I/O errors (only under high drive utilization). BTRFS never detected them, and they slowly corrupted my drive, eventually leaving the system unbootable and unrepairable. (To be fair, I should have run a scrub instead of attempting btrfsck --repair from another booted drive, but I don't care what you say: a --repair function should NOT potentially cause FURTHER corruption if it is available in the tooling at all! Like, just fucking rip it out if it can make things worse, or recode the damn thing to act defensively... jeez.)

Wiped the drive and started over with Ubuntu 19.10 and its new integrated ZFS-on-root support... ZFS detected the I/O issue pretty much instantly and prevented further errors by freezing I/O. I swapped the cable out during troubleshooting and the issue went away. Also, the drive is plenty fast: a read test hit 800 MB/s.
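For anyone wondering how ZFS could notice errors that the drive never reported: checksumming filesystems record a checksum for every block at write time and verify it on every read. Below is a toy Python model of that detection step. When it finds a mismatch, it suspends all further I/O, loosely mimicking ZFS's `failmode=wait` pool property. The class and exception names are made up for illustration. SHA-256 stands in for whatever checksum the real filesystem uses. This is a sketch of the idea, not how ZFS is implemented.

```python
import hashlib
from dataclasses import dataclass, field


class PoolSuspendedError(Exception):
    """Raised when the toy store refuses I/O (hypothetical name)."""


@dataclass
class ChecksummedStore:
    """Toy checksumming block store: each block's checksum is recorded at
    write time and verified on every read. After the first mismatch, all
    further I/O is refused, loosely like ZFS's failmode=wait."""

    blocks: dict = field(default_factory=dict)     # block id -> bytes "on disk"
    checksums: dict = field(default_factory=dict)  # block id -> digest at write time
    suspended: bool = False

    def _ensure_active(self) -> None:
        if self.suspended:
            raise PoolSuspendedError("I/O suspended after a checksum error")

    def write(self, block_id: int, data: bytes) -> None:
        self._ensure_active()
        self.blocks[block_id] = data
        # SHA-256 is an assumption for simplicity, not ZFS's default checksum.
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id: int) -> bytes:
        self._ensure_active()
        data = self.blocks[block_id]
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            # The device handed back bad bytes without reporting an error.
            # Stop all I/O rather than return or build on corrupt data.
            self.suspended = True
            raise PoolSuspendedError(f"checksum mismatch on block {block_id}")
        return data


if __name__ == "__main__":
    store = ChecksummedStore()
    store.write(0, b"hello")
    store.blocks[0] = b"hellp"  # simulate a flaky cable corrupting the block
    try:
        store.read(0)
    except PoolSuspendedError as err:
        print("detected:", err)
```

A filesystem without this verification on the read path would simply return `b"hellp"` and carry on.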

I'll throw in my own anecdote. ZFS on root caused me a significant headache when the Proxmox node I was using it on randomly decided it wasn't going to boot anymore. The ZFS pools were fine and no data was lost, but no amount of messing with it fixed the ZFS-on-root boot, and good search results for the problem were hard to find.

And of course it happened on a weekend when my parents, siblings and in-laws were visiting, so I had the joy of going around changing DNS settings on every device that only paid attention to the first two DNS servers in the DHCP settings.

(I've since changed my DNS setup: now I only have a primary self-hosted one that's on an RPi in my networking cabinet, and the second entry is Google. I figure if only two servers are really going to be respected, I'm making sure one of them is Google.)

> I only have a primary self-hosted one that's on an RPi in my networking cabinet, and the second entry is Google.

I was under the impression that there was no such thing as primary and secondary for DNS, just ‘here is one’ and ‘here is another’, with someone having picked the unfortunate names ‘primary’ and ‘secondary’. I’m no expert; my knowledge comes from messing about with Pi-hole and reading its documentation.

The first nameserver listed in resolv.conf is kind of a primary, as it will always be consulted first unless you add "options rotate". The next nameserver only comes into play if the first doesn't respond (after a default timeout of 5 seconds, tunable with "options timeout:n"). They're not named primary/secondary in the file, but they could be considered that way.
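Here is a minimal sketch of that ordering, assuming glibc-style semantics. Servers are tried in file order. "options rotate" shifts the starting server round-robin from one query to the next. "options timeout:n" and "options attempts:n" override the defaults of 5 seconds and 2 passes. It also applies glibc's limit of three `nameserver` lines. The function names and the addresses in the usage are hypothetical. The code models the behaviour and performs no real lookups.

```python
from dataclasses import dataclass, field

MAXNS = 3  # glibc uses at most three nameserver lines


@dataclass
class ResolverConfig:
    nameservers: list = field(default_factory=list)
    timeout: int = 5   # seconds per try when "options timeout:n" is absent
    attempts: int = 2  # passes over the list when "options attempts:n" is absent
    rotate: bool = False


def parse_resolv_conf(text: str) -> ResolverConfig:
    """Parse just the resolv.conf directives that affect server order."""
    cfg = ResolverConfig()
    for line in text.splitlines():
        if line.startswith(("#", ";")):
            continue  # comment lines start with '#' or ';' in the first column
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            if len(cfg.nameservers) < MAXNS:
                cfg.nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt == "rotate":
                    cfg.rotate = True
                elif opt.startswith("timeout:"):
                    cfg.timeout = int(opt.split(":", 1)[1])
                elif opt.startswith("attempts:"):
                    cfg.attempts = int(opt.split(":", 1)[1])
    return cfg


def try_order(cfg: ResolverConfig, query_index: int = 0) -> list:
    """Servers in the order they are tried for the query_index-th lookup.
    Without rotate the first server always goes first; the next one is
    only tried after cfg.timeout seconds without a reply."""
    servers = cfg.nameservers
    if not servers:
        return []
    start = query_index % len(servers) if cfg.rotate else 0
    return servers[start:] + servers[:start]


if __name__ == "__main__":
    conf = "nameserver 192.168.1.2\nnameserver 8.8.8.8\noptions timeout:2\n"
    cfg = parse_resolv_conf(conf)
    print(try_order(cfg, 0), "timeout:", cfg.timeout)
```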
Don't rely on this behaviour: many DNS libraries will query all of the configured servers, or n of them, in parallel to save on latency.
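To see why file order is no guarantee, here is a hedged asyncio sketch of the parallel strategy. The same query goes to every server at once, and whichever answer arrives first wins. `fake_query` is a hypothetical stand-in that just sleeps for a chosen latency. No real DNS traffic is sent, and real libraries differ in how many servers they query and when.

```python
import asyncio


async def fake_query(server: str, latency: float, answer: str) -> tuple:
    """Hypothetical stand-in for a DNS query: sleep, then 'answer'."""
    await asyncio.sleep(latency)
    return server, answer


async def race(queries: list) -> tuple:
    """Start every query at once, return the first result to arrive,
    and cancel the rest."""
    tasks = [asyncio.ensure_future(q) for q in queries]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish
    return next(iter(done)).result()


if __name__ == "__main__":
    winner = asyncio.run(race([
        fake_query("192.168.1.2", 0.05, "203.0.113.7"),  # local Pi, listed first
        fake_query("8.8.8.8", 0.01, "203.0.113.7"),      # faster upstream
    ]))
    print(winner)  # prints ('8.8.8.8', '203.0.113.7')
```

In this model the "secondary" answers every query it is faster on, even though the Pi is listed first.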

I suspect that both Btrfs and ZFS are currently good enough in most configurations that most users don't have a problem with whichever they choose. Only a tiny fraction has a really good or really bad experience and becomes a rabid advocate based on their anecdotes.

This is an obvious truism. Of course they appear to work correctly under ideal conditions.

The real question is how they behave under less-than-ideal conditions. That is where Btrfs has performed poorly and ZFS has performed very well. I lost several Btrfs filesystems to poorly tested, broken error handling that trashed them beyond recovery.

The selling point of both of these filesystems is their robustness, fault-tolerance and ability to self-heal. Only one of them actually delivers.
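Self-healing, as the term is usually used for these filesystems, relies on redundancy plus checksums. When a read from one mirror fails verification, the filesystem can return a copy that does verify and rewrite the bad one. Below is a toy Python version of that read path. It illustrates the general technique with made-up names and is not code from either filesystem.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 chosen for simplicity; ZFS and Btrfs default to other
    # checksums (fletcher4 and crc32c respectively).
    return hashlib.sha256(data).digest()


def self_healing_read(copies: list, expected: bytes) -> bytes:
    """Read one logical block stored on several mirrors.

    copies: list of the raw bytes each mirror holds (repaired in place)
    expected: checksum recorded when the block was written

    Returns data from the first copy that verifies and overwrites every
    copy that does not. Raises OSError if no copy verifies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise OSError("no copy passed checksum verification")
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good  # rewrite the bad mirror from the good one
    return good


if __name__ == "__main__":
    block = b"family photos"
    mirrors = [b"family ph0tos", block]  # mirror 0 silently corrupted
    print(self_healing_read(mirrors, checksum(block)), mirrors[0] == block)
```

Without the stored checksum, the filesystem would have no way to tell which of two disagreeing mirrors holds the good copy.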

That's really a packaging issue, not a ZFS issue, but I feel your pain.

The best suggestion I can offer is to use a distribution that treats it like a first-class citizen, such as... well, the Ubuntu support is still beta level, so only NixOS for now.

> when the proxmox node I was using it on just randomly decided it wasn't going to boot anymore

Could this possibly be Proxmox's fault more than ZFS's? You even said the pools were fine.

That's why filesystem integration into the kernel would have been so important for the whole software ecosystem.

I tend to agree with you here: reliability has been a non-issue for me, though I've never used `btrfs` in a RAID configuration.

Performance becomes an issue in certain cases, but in every one that I've encountered, adjusting configuration has resolved the problems to my satisfaction.

Would my Windows 10 VM run better on a different filesystem than on `btrfs` with various tweaks applied? Relatively recent articles on the subject suggest it would. However, I'd rather work with a single filesystem type and understand its strengths and weaknesses than manage two different filesystems, as long as I can get performance to a usable state.

We have run btrfs in a RAID configuration, but it has usability issues, even for plain RAID-1.

We've switched back to MD (mdadm) for the RAID-1 setup and run btrfs on top of it for snapshots, send/receive, block-level CRCs and such.
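For anyone wanting to try the same layering, here is a dry-run sketch that only builds and prints the commands. It runs `mdadm --create` with `--level=1` for the mirror, then `mkfs.btrfs` on the resulting MD device. The device names (`/dev/md0`, `/dev/sdb`, `/dev/sdc`) and the label are hypothetical placeholders, so check them against your own system before running anything. One trade-off worth knowing, as far as I understand it: btrfs sees a single device here. Its checksums can detect a bad data block, but it cannot fetch the other mirror's copy to repair it, because MD decides which copy is read.

```python
import shlex


def md_raid1_btrfs_plan(md_dev: str, members: list, label: str) -> list:
    """Build (but do not run) the commands for an MD RAID-1 mirror with
    btrfs on top. Device names passed in are the caller's placeholders."""
    if len(members) < 2:
        raise ValueError("RAID-1 needs at least two member devices")
    return [
        # MD does the mirroring and handles failed-drive replacement.
        ["mdadm", "--create", md_dev, "--level=1",
         f"--raid-devices={len(members)}", *members],
        # btrfs sees one device and provides snapshots, send/receive, checksums.
        ["mkfs.btrfs", "-L", label, md_dev],
    ]


if __name__ == "__main__":
    # Hypothetical devices: double-check before running anything destructive.
    for argv in md_raid1_btrfs_plan("/dev/md0", ["/dev/sdb", "/dev/sdc"], "data"):
        print(shlex.join(argv))
```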

Dealing with failed drives isn't as easy with btrfs as it is with Linux MD.

Does that include performance reliability?

Not long ago, BTRFS drives on two separate systems developed crippling performance issues for me. Random delays grew to several seconds, and the filesystem went unresponsive for even longer whenever I deleted snapshots. I suspect performance degraded a little with every hourly snapshot, even though the system only kept a couple dozen of them at a time.
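The snapshot rotation described above (hourly snapshots, with a couple dozen kept) comes down to a retention rule like the one in the sketch below. It selects everything except the newest 24 for deletion. The snapshot names, the `/.snapshots` path and `keep=24` are assumptions. The code only illustrates the rotation and is not a fix for the performance problem.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots: list, keep: int = 24) -> list:
    """snapshots: (name, created_at) pairs. Returns the names of all but
    the `keep` newest, oldest first."""
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    expired = sorted(newest_first[keep:], key=lambda s: s[1])
    return [name for name, _ in expired]


if __name__ == "__main__":
    base = datetime(2020, 1, 1)
    hourly = [(f"@hourly-{i:02d}", base + timedelta(hours=i)) for i in range(30)]
    for name in snapshots_to_delete(hourly):
        # Hypothetical snapshot location; prints instead of deleting.
        print("btrfs subvolume delete", f"/.snapshots/{name}")
```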

> nowadays that it's reliability issues have been fixed

Is this also true for RAID5/6?

This issue has its own wiki page on the BTRFS wiki:

https://btrfs.wiki.kernel.org/index.php/RAID56

So, no, that particular issue hasn't been fixed.
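The wiki page describes the issue in its own terms. As a generic illustration of why unclean shutdowns matter for parity RAID, here is plain RAID5 parity math in Python (not btrfs code). Parity is the XOR of the data chunks in a stripe. If a crash lands between rewriting a data chunk and rewriting the parity, the parity goes stale. Any later rebuild from that parity then produces wrong data. This is commonly called the RAID5 "write hole".

```python
from functools import reduce


def xor_bytes(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def parity(data_chunks: list) -> bytes:
    """RAID5-style parity: XOR of all data chunks in a stripe."""
    return xor_bytes(*data_chunks)


def rebuild(surviving: list, parity_chunk: bytes) -> bytes:
    """Recover one missing data chunk from the survivors plus parity."""
    return xor_bytes(*surviving, parity_chunk)


# A three-disk stripe: two data chunks and one parity chunk.
d0, d1 = b"AAAA", b"BBBB"
p = parity([d0, d1])
assert rebuild([d1], p) == d0  # consistent stripe: a lost d0 is recoverable

# Write hole: d0 is rewritten, then power fails before p is rewritten.
d0 = b"CCCC"
# p is now stale. If the disk holding d1 dies before a scrub recomputes p,
# rebuilding d1 from d0 and the stale parity yields the wrong bytes.
assert rebuild([d0], p) != b"BBBB"
```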

> *For data, it should be safe as long as a scrub is run immediately after any unclean shutdown.*

That’s unfortunate. Does the scrub run automatically in those situations? Consumer hardware will be the most prone to intermittent power failure.
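As far as I know, btrfs itself does not start a scrub after a crash, so it would be up to the distribution or the admin. Below is a hypothetical boot-time check. It assumes a shutdown hook (not shown) that leaves a marker file on every clean shutdown. If the marker is missing at boot, the script builds `btrfs scrub start -B <mountpoint>`, where `-B` keeps the scrub in the foreground until it finishes. The marker path and function names are invented for illustration, and the check defaults to a dry run.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical marker that a shutdown hook would create on clean shutdowns.
CLEAN_MARKER = Path("/var/lib/clean-shutdown")


def scrub_command(mountpoint: str) -> List[str]:
    # -B: do not background; wait for the scrub to finish.
    return ["btrfs", "scrub", "start", "-B", mountpoint]


def check_after_boot(mountpoint: str = "/", marker: Path = CLEAN_MARKER,
                     dry_run: bool = True) -> Optional[List[str]]:
    """If the clean-shutdown marker is missing, return the scrub command
    (and run it when dry_run is False). Otherwise consume the marker so
    the next unclean shutdown is noticed, and return None."""
    if marker.exists():
        marker.unlink()
        return None
    cmd = scrub_command(mountpoint)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `check_after_boot("/")` after a power cut would return the scrub command, and passing `dry_run=False` would actually run it.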

There are some big caveats there: https://btrfs.wiki.kernel.org/index.php/RAID56

Is it considered better than ZFS?