by yjftsjthsd-h 1888 days ago
(N=1) I have a single-disk laptop running opensuse (tumbleweed) on btrfs. It's the only machine I've ever owned to corrupt its root filesystem beyond repair, and it's done so twice IIRC (definitely 2, maybe 3), within the last few years. It's not just the initial issues.

(N=1) I've had ZFS on an opensolaris system and it got corrupted, and since the ZFS engineers think they are gods who don't make mistakes there was no fsck that would even attempt to repair it. It was a perfectly repairable corruption which I fixed myself with a bit of googling and dd to copy some bytes from one location on the disk to another (ZFS apparently keeps multiple copies of some of the important data structures that describe the pool, one at the beginning of the block device and one towards the end, kind of as a backup I guess). For btrfs you at least have a decent working fsck, if shit hits the fan. ZFS is like, fuck you we won't even try.
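For anyone wondering where those copies live: as far as I understand the ZFS on-disk format, each device carries four 256 KiB labels, two at the front and two at the end. Here is a rough Python sketch of the offsets, assuming that layout. The names `LABEL_SIZE`, `LABEL_COUNT` and `label_offsets` are made up for illustration, and the code only does arithmetic; it does not read or write any disk.

```python
# Assumption: four 256 KiB labels per device, two at the start and two at
# the end, with the device size rounded down to a multiple of the label size.
LABEL_SIZE = 256 * 1024
LABEL_COUNT = 4


def label_offsets(device_size: int) -> list:
    """Return the byte offsets of the four labels for a device of this size."""
    usable = device_size - (device_size % LABEL_SIZE)  # round down to label size
    half = LABEL_COUNT // 2
    front = [i * LABEL_SIZE for i in range(half)]
    back = [usable - (LABEL_COUNT - i) * LABEL_SIZE for i in range(half, LABEL_COUNT)]
    return front + back


if __name__ == "__main__":
    for index, offset in enumerate(label_offsets(10 * 2**30)):
        print(f"label {index}: byte offset {offset}")
```

So a repair like the one described amounts to copying a known-good label over a damaged one, which is exactly the kind of thing you do not want to get wrong by hand.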
I want to like ZFS, and am using it, but however good it is at not losing data while in operation, the UI feels like it’s designed to make you wreck your data. I guess I just need more practice, but not being able to just rip a drive out and mount it on another machine in a pinch makes me damn nervous. Something about how it’s managed makes the whole file system feel ephemeral, just one bad-but-not-obviously-so command away from being destroyed, and I’m nowhere near being comfortable with that yet (and don’t really see a path to getting to comfort)
> but not being able to just rip a drive out and mount it on another machine in a pinch makes me damn nervous

Why can't you? Granted, you need enough disks to actually have all the data, so e.g. if you did RAID0 then yes, you need all disks, but if you did a mirror you can totally just yank a disk out, attach it to another machine, and `zpool import` it.

Can you? I was under the impression that without an “export” beforehand, you can’t.
See the "split" command with OpenZFS 0.8.0+:

* https://utcc.utoronto.ca/~cks/space/blog/linux/ZFSSplitPoolE...

Only with mirrored drives.

Any RAID-Z level would need a full export/import as data is striped, but hot-swap drives can be pulled once things are unmounted.

I recently did just this. I had to use the -f flag but it imported just fine on a different computer.

I agree that it can be a bit daunting to operate. There are a few footguns around that might not lead to data loss but can lead to unfortunate situations.

Just the other day someone on the mailing list had managed to add a single drive as a new top-level vdev to a petabyte pool, rather than adding it as a new spare drive, simply by omitting the word "spare" from the "zpool add" command...

That said, I've been using ZFS at home here with 6+ disks for almost a decade now, and I've never lost data despite lots of various incidents, including lots of power losses and various hardware failures (like disks, mobo and PSU). So overall I'm very happy with it.

I guess it just seems like there’s a lot more state than with file systems I’m used to dealing with. In fact, since journaling became normal, most just have two states (from the user’s perspective) whether powered on or off, mounted or unmounted, whatever—broken, or OK.

ZFS has... a lot more. It’s just very different and the way these states fit together, and worrying about how to operate on them safely, makes me more nervous, in many ways, than less-safe file systems do. I’m sure that will pass, but it’s still not fun.

Oh no, that'd be a terrible design :) AFAIK the biggest difficulty is that you might have to use `zpool import -f` to force it to ignore the pool not having been cleanly exported.

EDIT: It'd look like this: https://serverfault.com/questions/964075/how-can-i-recover-m...

I've had btrfsck segfault on me a couple of times
scrub didn't help? I thought scrub was like fsck for ZFS.
> For btrfs you at least have a decent working fsck

When I say "corrupted beyond repair", I mean "the btrfs tools were not actually helpful".

Another (n=1) anecdote: similarly for me, ~4 years ago my raid-1 workstation OS drive which was, at the time, using btrfs nuked itself without warning or repair. Trying to recover any data was likewise an exercise in rapidly learning about FS internals.

I use zfs on everything now. I am sure at some point it will die horribly, but for now I haven't had a single problem in ~60 managed drives across 3 machines.

The other way to interpret this would be "it was the only filesystem to detect corruption on my malfunctioning hardware", because that's what has usually turned out to be the case in the last few years.

Or have you been using ZFS on the same hardware?

I haven't used ZFS on the same hardware, but during the time when BTRFS ate itself multiple times my home partition on the same drive also on BTRFS was perfectly fine, so it'd have to be an awfully specific hardware failure. Also, it would have had to hit the metadata both times since we're talking "pool wouldn't import" not "it gave me data checksum errors". Which again, is possible, but on an SSD with wear-leveling seems a tad unlikely.
He said "corrupt its root filesystem beyond repair", not "detect checksum errors"

Btw raid5/6 is still broken on btrfs which makes it a hard sell for any system with more than 2 disks. cf. raidz on ZFS

So? Do you think filesystem metadata is stored in a magical pixie cloud, or on the same unreliable physical hardware where it can easily get corrupted, especially after a crash or an unexpected power loss?

I posted this link here already:

https://www.usenix.org/conference/atc19/presentation/jaffer

f2fs (at least in its state a couple of years ago) is/was a prime example of how a filesystem can get into a barely working state with massive amounts of data and metadata corruption, and not even notice it.

God I love this site. In case of a minor disagreement with someone don't even bother to think, just press "downvote".

Corrupt data should be corrected by checksummed btrfs, shouldn't it?
Only if you have more than one copy of the data
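To make that concrete, here is a toy Python model of the difference between detecting and repairing. It is not how btrfs is implemented; SHA-256 is picked only for convenience, and `checksum` and `read_block` are hypothetical names. A checksum tells you a block is bad, but you need a second copy that still matches to get the data back.

```python
import hashlib
from typing import Sequence


def checksum(data: bytes) -> bytes:
    """Checksum stored alongside the block (SHA-256 here, purely for illustration)."""
    return hashlib.sha256(data).digest()


def read_block(copies: Sequence[bytes], expected: bytes) -> bytes:
    """Return the first copy whose checksum matches; raise if none does."""
    for copy in copies:
        if checksum(copy) == expected:
            return copy
    raise IOError("checksum mismatch on every copy: detected, but nothing to repair from")


good = b"important data"
expected = checksum(good)
bad = b"important dat\x00"  # simulated on-disk corruption

# Single copy: the corruption is detected, but the data cannot be recovered.
try:
    read_block([bad], expected)
    single_copy_result = "ok"
except IOError:
    single_copy_result = "unrecoverable"

# Two copies (e.g. a mirror): the intact copy is found and returned.
mirrored_result = read_block([bad, good], expected)
```

With one copy you get an error instead of silently wrong data, which is still better than nothing, but it is not a repair.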
RAID 5 has been generally advised against for years, due to performance issues and the effect of unrecoverable errors during rebuilds.

Btrfs RAID1 works perfectly, and RAID1c3/RAID1c4 provides additional redundancy. In place of RAID5, use RAID10 instead.

raidz2 is only advised against if your arrays are so small that you don't care about the price of storage.

If you want more IOPS, add more raidz2 (raid6) stripes to the pool. In practice, spinning rust is the new tape. Trying to do random access under 1MB is just silly.

I don't stress over rebuilds. 2 more disks failing during a rebuild is incredibly unlikely compared to everything else that might force me to restore a backup (software bugs, data center flooding, etc).
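For a rough sense of scale, here is a back-of-the-envelope Python sketch. Every number in it is an assumption chosen for illustration (2% annual failure rate, a 2-day rebuild, 9 surviving disks), and it treats failures as independent, which ignores correlated failures from a bad batch or shared PSU and ignores unrecoverable read errors during the rebuild.

```python
# All constants below are illustrative assumptions, not measured values.
ASSUMED_AFR = 0.02      # assumed annual failure rate per disk
REBUILD_DAYS = 2.0      # assumed length of the rebuild window
REMAINING_DISKS = 9     # assumed disks still in the raidz2 vdev after one failure


def p_fail_in_window(afr: float, days: float) -> float:
    """Probability that one disk fails within `days`, given a constant annual rate."""
    return 1.0 - (1.0 - afr) ** (days / 365.0)


def p_at_least_two(n: int, p: float) -> float:
    """Probability that at least 2 of n independent disks fail, each with probability p."""
    none = (1.0 - p) ** n
    exactly_one = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - none - exactly_one


p_disk = p_fail_in_window(ASSUMED_AFR, REBUILD_DAYS)
p_array_loss = p_at_least_two(REMAINING_DISKS, p_disk)

if __name__ == "__main__":
    print(f"per-disk failure probability during rebuild: {p_disk:.2e}")
    print(f"probability of 2+ further failures during rebuild: {p_array_loss:.2e}")
```

Plug in your own numbers; the result moves a lot if failures are correlated, so treat this as a lower bound on the risk rather than an estimate.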

I've been running openSUSE Tumbleweed for the past couple of years on my laptop & desktop, with btrfs as the root FS. No issues. I also run the btrfsmaintenance script every month, maybe that helps.