by pmarreck 2341 days ago
I have to beg to differ here, as I had a different experience that I posted about on Reddit just yesterday:

https://www.reddit.com/r/zfs/comments/eu1qsj/a_tale_of_two_f...

tl;dr Unbeknownst to me, I had a bad drive cable for an external NVMe enclosure that was causing intermittent I/O errors (only during high drive utilization). These went undetected by Btrfs and slowly corrupted my drive, eventually leading to an unbootable and unrepairable system. To be fair, I should have scrubbed instead of attempting btrfsck --repair from another booted drive, but I don't care what you say, a --repair function should NOT potentially cause FURTHER corruption if it is at all available in the tooling! Like, just fucking rip it out if it can potentially make things worse, or recode the damn thing to just act defensively... jeez.

Wiped the drive and started over with Ubuntu 19.10 and its new integrated ZFS on Root support... ZFS detected the I/O issue pretty much instantly and prevented further errors by freezing I/O. Swapped the cable out during my troubleshooting and the issue went away. Also, the drive is plenty fast: a read test came in at 800MB/s.


I'll throw in my own anecdote. ZFS on root caused me a significant amount of headache when the proxmox node I was using it on just randomly decided it wasn't going to boot anymore. The ZFS pools were fine and no data was lost, but no amount of messing with it fixed the zfsonroot, and it was quite difficult to find quality search results for the problem.

And of course it was a weekend where my parents and siblings and in-laws were visiting, so I had the joy of going around messing with DNS settings wherever someone had a device that only paid attention to the first two DNS servers in the DHCP settings.

(I've since changed my DNS setup: now I only have a primary self-hosted one that's on an RPi in my networking cabinet, and the second entry is Google. I figure if I only get two servers that are respected for real, I'm making sure one of them is Google.)

> I only have a primary self-hosted one that's on an RPi in my networking cabinet, and the second entry is Google.

I was under the impression that there was no such thing as primary and secondary for DNS, just ‘here is one’ and ‘here is another’, with someone going for a terrible naming system of ‘primary’ and ‘secondary’. I’m no expert, and my knowledge comes from messing about with Pihole and reading their documentation.

The first nameserver listed in resolv.conf is kind of a primary, as it will always be consulted first unless you add "options rotate". The next nameserver only comes into play if the first doesn't respond (default timeout 5 seconds, also tunable with options). They're not named primary/secondary in the file but could be considered that way.
Don't rely on this behaviour. Many DNS libraries will query all of the servers, or some number n of them, to save on latency.
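
To make the ordering described above concrete, here is a rough Python sketch of it. It is a simplified model of what was said about resolv.conf, not glibc's actual resolver code. The function names and the 192.168.1.2 address (standing in for the self-hosted RPi server) are made up for illustration, and, as noted, other DNS libraries may ignore this ordering entirely.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf-style text into (nameservers, options).

    Simplified: only 'nameserver' and 'options' lines are handled.
    """
    nameservers, options = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for token in fields[1:]:
                if ":" in token:
                    # Valued option such as "timeout:2".
                    key, value = token.split(":", 1)
                    options[key] = int(value)
                else:
                    # Flag option such as "rotate".
                    options[token] = True
    return nameservers, options


def query_order(nameservers, rotate=False, query_number=0):
    """Return the order in which servers would be tried for one lookup.

    Without rotate, the first listed server is always tried first.
    With rotate, the starting server advances with each lookup
    (an assumed round-robin model of "options rotate").
    """
    if not nameservers:
        return []
    start = query_number % len(nameservers) if rotate else 0
    return nameservers[start:] + nameservers[:start]


if __name__ == "__main__":
    sample = "nameserver 192.168.1.2\nnameserver 8.8.8.8\n"
    servers, opts = parse_resolv_conf(sample)
    # The timeout falls back to the 5 second default mentioned above.
    print(query_order(servers), "timeout:", opts.get("timeout", 5))
```

Without "options rotate" the self-hosted server is always first and Google is only tried after a timeout. With rotate, successive lookups start from different servers, so neither one is really a "primary".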
I suspect that both Btrfs and ZFS are currently good enough under most configurations that most users don't have a problem with whichever they choose, and it's only a tiny fraction that has a really good or bad experience and becomes a rabid advocate based on their anecdotes.
This is an obvious truism. Of course they appear to work correctly under ideal conditions.

The real question is how they behave under less than ideal conditions. It is these conditions where Btrfs has performed poorly, and where ZFS has performed very well. I lost several Btrfs filesystems due to its poorly-tested and broken error handling trashing the filesystem beyond recovery.

The selling point of both of these filesystems is their robustness, fault-tolerance and ability to self-heal. Only one of them actually delivers.

That's really a packaging issue, not a ZFS issue, but I feel your pain.

The best suggestion I can offer is to use a distribution that treats it like a first-class citizen, such as... well, the Ubuntu support is still beta level, so only NixOS for now.

> when the proxmox node I was using it on just randomly decided it wasn't going to boot anymore

Could this possibly be proxmox's fault more than ZFS's fault? You even said the pools were fine.

That's why FS integration into the kernel would have been so important for the whole software ecosystem.