| HN Mirror


by 5e92cb50239222b 1578 days ago

> If their machines fail, they don’t care

This myth is being perpetuated despite btrfs devs (who work at Facebook) stating the exact opposite many times over.

Every FS corruption and weird behavior is put aside and investigated. They very much do care.

https://lwn.net/ml/fedora-devel/03fbbb9a-7e74-fc49-c663-3272...

Please read the whole thread before repeating this nonsense, or at least every email sent there by Josef Bacik.

See also:

https://lwn.net/Articles/824855/

https://lwn.net/Articles/824620/

2 comments

ComputerGuru 1578 days ago

> Every FS corruption and weird behavior is put aside and investigated. They very much do care.

Just because you and I are using different meanings of the word "care" doesn't mean the point isn't valid. They "care" in that they would like to know what went wrong and study it further. They don't "care" in the sense that they suffered no real harm and no stakes were riding on any one particular server that failed. It's not just a matter of having a backup/redundancy, it's about having automated systems (or even just standard procedures that are being executed on a daily basis at that scale one way or the other) that take care of these failures. So even in production, "regular" btrfs users might have backups so "no lasting damage" would be incurred, but that's hardly the same as openly volunteering themselves for risk.

That's all beside the main point: Facebook is deploying "known good" configurations. They're using a very select subset of features. They're not relying on changed btrfs features/implementations being correct or, as was my experience, having to worry about less-used/tested codepaths leading to data loss.


spookthesunset 1578 days ago

As a tl;dr:

“Also keep in mind we pay really close attention to burn rates for our drives, because obviously at our scale it translates to millions of dollars. Btrfs has improved our burn rates with the compression, as the write amplification goes drastically down, thus extending the life of the drives.”

As with anything, it comes down to money. Yes, a machine going down doesn’t impact the cluster, but it does impact their wallet. Every disk failure costs money, and at the scale of the big boys that can add up to big money.

So while “the system” doesn’t care about drive failures, the accountants and CFOs absolutely care.


ComputerGuru 1578 days ago

Just pointing out that "caring about physical drive failure" and "caring about disk corruption or data loss" are completely independent, and the latter does not directly equate to big money (as there are already systems and SOPs in place to handle failed servers). Btrfs isn't notorious for actually frying disks, just the data on them.


cestith 1578 days ago

Do they care about the FS just silently eating data? I ask because btrfs has been known to do that. Sure, you're not replacing the drive, but you're probably wiping the VM's disk image and creating a new one.
