by ailideex 2341 days ago
What about the reliability? Are many people losing data with Btrfs?

Caveat: I don't use RAID[0].

In about 4 years of running it on a couple of servers and countless virtuals/desktops, I've never had a reliability issue that was directly related to btrfs. I do not have my servers plugged in to UPSes, so I have the occasional "shutdown due to power loss". The only time I've lost data was due to a cable disconnection in my hardware RAID array, and even then I was able to recover a substantial amount of its `btrfs` stored files.

[0] Well, not filesystem-provided RAID; I have LSI controllers that provide the array to the OS as a single disk.

As best I can tell, reports of data loss on btrfs are all from the early 20-teens; after about 2014 or so I can't find anyone who claims to have lost data due to a btrfs bug on an up-to-date system.
RAID5 on btrfs has a write hole last time I checked. Bug has been around forever, and was around in 2014 for sure.

Phoronix has some thorough performance comparisons between Ext4fs, Btrfs, XFS, and ZFS.

The write-hole problem is a rare case, wherever it happens. https://lwn.net/Articles/665299/

On Btrfs, in case of bad parity being used to reconstruct a stripe, the resulting bad reconstruction is still subject to data checksumming, and will EIO. Corrupt data won't be sent to user space.
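
To make that concrete, here is a minimal Python sketch of the scenario, not Btrfs code. It assumes a toy 3-disk stripe with XOR parity and uses `zlib.crc32` as a stand-in for the filesystem's data checksum; the block contents and variable names are made up for illustration.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Toy stripe: two data blocks plus one XOR parity block.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks([d0, d1])

# Checksum of d1, kept separately in (hypothetical) filesystem metadata.
checksum_d1 = zlib.crc32(d1)

# Power loss mid-update: d0 reaches the disk, the new parity does not.
d0 = b"CCCC"

# Later the disk holding d1 dies, so d1 is rebuilt from d0 and stale parity.
rebuilt_d1 = xor_blocks([d0, parity])

corrupt = rebuilt_d1 != b"BBBB"                    # reconstruction is wrong
detected = zlib.crc32(rebuilt_d1) != checksum_d1   # checksum mismatch

if detected:
    print("checksum mismatch: return EIO instead of bad data")
```

The reconstruction is wrong because the parity is stale, but the checksum comparison catches it, which is the difference between an I/O error and silently corrupt data.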

That one is here to stay; it is a property of software-based RAID. If it bothers you, use a UPS.
ZFS-based RAID-5 (called raidz1) doesn't have the write hole.

https://blogs.oracle.com/ahl/what-is-raid-z

Because ZFS raidz1 is not raid5, it's even labelled differently. Yes, it is a parity-based raid, but has slightly different semantics.
I think in Linux, if you're using mdadm there is the ability to specify a write journal; all data (i.e. blocks+parity) gets written to the journal first, and then gets cleaned up after everything gets completed successfully, and the journal is replayed after a power failure.

Mind you, for that to work well you'd want a victim SSD with a write speed at least that of the array...
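
A rough Python sketch of the journal idea described above: log data and parity first, apply them to the array, then clear the log, and replay the log after a crash. This is a toy in-memory model under those assumptions, not mdadm's actual journal format; the class and method names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class JournaledStripe:
    """Toy parity stripe with a write-ahead journal (illustration only)."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)
        self.journal = None  # pending (data, parity) entry, if any

    def write(self, index, block, crash_after_data=False):
        new_data = list(self.data)
        new_data[index] = block
        # 1. Log the new data and parity to the journal device first.
        self.journal = (new_data, xor_blocks(new_data))
        # 2. Apply to the array; a crash here leaves the parity stale.
        self.data[index] = block
        if crash_after_data:
            return  # simulated power loss
        self.parity = self.journal[1]
        # 3. Everything landed, so the journal entry can be dropped.
        self.journal = None

    def replay(self):
        """After a power failure, re-apply any pending journal entry."""
        if self.journal is not None:
            self.data, self.parity = list(self.journal[0]), self.journal[1]
            self.journal = None

    def consistent(self):
        """True if data and parity XOR to all zeroes."""
        return xor_blocks(self.data + [self.parity]) == bytes(len(self.parity))


stripe = JournaledStripe([b"AAAA", b"BBBB"])
stripe.write(0, b"CCCC", crash_after_data=True)
broken_before_replay = not stripe.consistent()
stripe.replay()
fixed_after_replay = stripe.consistent()
```

Every write hits the journal before the array, which is why the journal device needs to keep up with the array's write speed.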

Hardware RAID can indeed also suffer from this, but does ZFS suffer from it as well? With exactly the same impact? AFAIK the filesystem stays consistent on ZFS.
raidz1 is not raid5.

From https://pthree.org/2012/12/05/zfs-administration-part-ii-rai... :

> Rather than the stripe width be statically set at creation, the stripe width is dynamic. Every block transactionally flushed to disk is its own stripe width. Every RAIDZ write is a full stripe write. Further, the parity bit is flushed with the stripe simultaneously, completely eliminating the RAID-5 write hole. So, in the event of a power failure, you either have the latest flush of data, or you don't. But, your disks will not be inconsistent.

> There's a catch however. With standardized parity-based RAID, the logic is as simple as "every disk XORs to zero". With dynamic variable stripe width, such as RAIDZ, this doesn't work. Instead, we must pull up the ZFS metadata to determine RAIDZ geometry on every read. If you're paying attention, you'll notice the impossibility of such if the filesystem and the RAID are separate products; your RAID card knows nothing of your filesystem, and vice-versa. This is what makes ZFS win.
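
The quoted idea can be sketched in a few lines of Python. This is a simplified illustration, not how ZFS lays out data: it assumes 4-byte sectors and single XOR parity, ignores the limit imposed by the number of disks, and the function name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def write_block_as_stripe(payload, sector=4):
    """Split one block into sectors and append its own parity sector.

    Each block becomes a complete stripe of its own, so data and parity
    are always written together (a full-stripe write).
    """
    sectors = [
        payload[i:i + sector].ljust(sector, b"\0")
        for i in range(0, len(payload), sector)
    ]
    return sectors + [xor_blocks(sectors)]


small = write_block_as_stripe(b"abcd")          # 1 data sector + parity
large = write_block_as_stripe(b"abcdefghijkl")  # 3 data sectors + parity

zero = bytes(4)
small_ok = xor_blocks(small) == zero
large_ok = xor_blocks(large) == zero
widths = (len(small), len(large))
```

Each stripe still XORs to zero on its own, but the two stripes have different widths, so a reader has to know where each stripe starts and ends. That geometry lives in the filesystem metadata, which is why a separate RAID layer cannot do this.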

Raidz1 isn't raid5, but if it mostly solves the same problem for users, without running into the write hole issue, isn't that suggesting we use raidz1 on zfs instead of raid5 on btrfs if we're concerned about unclean shutdowns?

What would we be missing in terms of capabilities by having raidz1 instead of raid5? (Just from the redundancy and performance point of view; let's assume everything else on btrfs and zfs is equal)

That is just nonsense. Btrfs had well known and well published issues for years. And since when are the data of 20 somethings not important?
"Early 20-teens" means the years 2010-2014 or so in this case. Nothing to do with people or their age.
It's the default for new Synology devices, and has been for a while. I suspect others are using it in a similar situation for home-grade NAS and up into the prosumer end of the market.

I feel like Btrfs is probably going to be well tested here, but I wonder how many of these users are diagnosing Btrfs problems when they occur? It's going to be more evident to some people, and you have to assume that some of the vendors are competent, but this is against a backdrop of people throwing this kit away or starting from scratch versus performing a root cause analysis.

I've personally been running this since it was stable on my DS1515+. I haven't had filesystem issues yet, but I make sure my important stuff is backed up elsewhere. A local backup like this is convenient for faster recovery in a lot of situations though which is why I keep it. I've SSH'd to the device and played around a little, but I fear I'd hit something proprietary, if the worst recovery situation occurred and I had to get everything from the DS1515+. If it was just an Ubuntu box I wouldn't have those fears, but the Syno NAS package is compelling.

My understanding is most bugs are ironed out of btrfs itself, but tooling is still weak. For example, if you have a disk drive go bad on you and you manage to recover ~ half of the sectors with a disk imaging tool, you won't be able to extract files from the image without extreme effort.
Why hasn't this caught up? Is it the case that data recovery companies are hoarding this after investing in their own tools, or something fundamental to the community?
Reliability does not only mean data loss. It may not be losing data but crashing every few hours, or locking up the system, or requiring constant monitoring and maintenance etc.