Hacker News
by deschutes 1135 days ago
You can't eliminate the risk of data loss, only control for it. fsync is one such control. Empirically, having separate power failure domains strongly controls for the power loss risk.
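A back-of-the-envelope way to see why separate power failure domains help: assume (optimistically) that each node loses power independently with probability `p` in a given incident, and that unflushed data is only at risk when a majority of replicas lose power at once. Under that simplified model the risk shrinks quickly as replicas are added. The function name and the value of `p` are illustrative assumptions, not measurements.

```python
from math import comb


def p_majority_lost(n: int, p: float) -> float:
    """Probability that a strict majority of n nodes lose power in one incident.

    Simplified model (an assumption): each node fails independently with
    probability p. Shared power domains or common-mode bugs break this.
    """
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node chance of losing power in an incident
    for n in (1, 3, 5):
        print(f"{n} node(s): {p_majority_lost(n, p):.2e}")
```

With `p = 0.01`, one node gives 0.01, three give about 3e-4 and five about 1e-5; the single-node row is the single-primary case. Correlated failures, like the kernel bug described below, violate the independence assumption, which is exactly where the tail risk lives.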

In the tail, all kinds of things can lose your data. I've actually seen systems using the fsync-every-message strategy lose data on a simultaneous power loss. There was latent filesystem corruption caused by a kernel bug, and after power cycling, a majority of nodes had unrecoverable filesystems.
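For concreteness, "fsync every message" roughly means the writer does not acknowledge a message until `fsync` on the log file has returned. This is a minimal POSIX-oriented sketch; `append_durably` and `_fsync_parent_dir` are hypothetical names. It can only be as durable as the kernel, filesystem and device beneath it, which is why a kernel bug like the one above still loses data.

```python
import os
import tempfile


def _fsync_parent_dir(path: str) -> None:
    # POSIX-only (assumption: Linux-like OS): fsyncing the directory makes a
    # newly created file's directory entry durable. Fails on Windows.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def append_durably(path: str, message: bytes) -> None:
    """Hypothetical helper: append one message, fsync, and only then return (ack)."""
    created = not os.path.exists(path)  # racy, but good enough for a sketch
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(message)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(fd, view):]
        os.fsync(fd)  # ask the kernel to push data and metadata to the device
    finally:
        os.close(fd)
    if created:
        _fsync_parent_dir(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "log")
        append_durably(log_path, b"hello\n")
        append_durably(log_path, b"world\n")
        with open(log_path, "rb") as f:
            print(f.read())
```

A real log would keep the file descriptor open rather than reopening it for every message; reopening here just keeps the example short.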

In my experience, even on modern flash the cost of fsync is non-trivial. It pessimizes I/O. You can try to account for this with group commit / batching, but generally the batch window needs to be large relative to network RTT to be effective.
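Group commit amortizes that cost: concurrent writers park their records in a shared buffer, a single flusher writes the whole batch and issues one fsync, and then every writer in that batch is acknowledged together. This is a generic illustration, not any particular system's implementation; `GroupCommitLog` and its `window` parameter are hypothetical. `window` is the batch-window knob discussed above: a longer window lets more records share one fsync, at the price of extra latency on every acknowledgment.

```python
import os
import tempfile
import threading
import time


class GroupCommitLog:
    """Hypothetical group-commit log: concurrent appenders share one fsync per batch."""

    def __init__(self, path: str, window: float = 0.002) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        self._window = window    # seconds to wait for more records before flushing
        self._cond = threading.Condition()
        self._pending: list[bytes] = []
        self._next_batch = 1     # batch id that newly appended records join
        self._durable_batch = 0  # highest batch id known to be fsynced
        self._closed = False
        self.fsync_count = 0
        self._flusher = threading.Thread(target=self._run, daemon=True)
        self._flusher.start()

    def append(self, record: bytes) -> None:
        """Block until `record` has been written and fsynced as part of some batch."""
        with self._cond:
            if self._closed:
                raise ValueError("log is closed")
            self._pending.append(record)
            my_batch = self._next_batch
            self._cond.notify_all()  # wake the flusher if it is idle
            while self._durable_batch < my_batch:
                self._cond.wait()

    def _run(self) -> None:
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:  # closed and fully drained
                    return
            time.sleep(self._window)  # the batch window: let more appenders pile on
            with self._cond:
                batch, self._pending = self._pending, []
                batch_id = self._next_batch
                self._next_batch += 1
            view = memoryview(b"".join(batch))
            while view:  # os.write may be partial
                view = view[os.write(self._fd, view):]
            os.fsync(self._fd)  # one fsync covers every record in the batch
            with self._cond:
                self.fsync_count += 1
                self._durable_batch = batch_id
                self._cond.notify_all()  # release every appender in this batch

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._flusher.join()  # the flusher drains pending records before exiting
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = GroupCommitLog(os.path.join(d, "wal"), window=0.01)
        threads = [threading.Thread(target=log.append, args=(f"rec {i}\n".encode(),))
                   for i in range(50)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        log.close()
        print(f"50 records, {log.fsync_count} fsyncs")
```

Error handling is deliberately omitted: if `os.write` or `os.fsync` raised inside the flusher, waiting appenders would block forever, so a production version would need to propagate the failure to them.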

fsync is much more necessary on single primary systems.

I only remember losing one etcd cluster, and it was due to something along these lines. The data center at the customer site lost power, and we were called when they couldn't recover our software. All the etcd volumes were corrupted, and even after the customer's IT department recovered the volumes, we found all our etcd files corrupted.

My best guess is that their volume systems simply lied about fsync, which I've heard of happening a few times with different vendors.
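There is no software-only way to prove a device honours fsync; the conclusive test is cutting power mid-write and checking that every acknowledged write survived. A rough heuristic is to time fsync: on a plain spinning disk an honest fsync typically costs milliseconds, so results well under a millisecond there suggest something is acknowledging from a volatile cache (on flash, or behind a battery-backed controller, fast results can be legitimate). PostgreSQL's `pg_test_fsync` tool does a more thorough version of this measurement. The `fsync_latencies` helper below is a hypothetical sketch.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory: str, samples: int = 100, size: int = 4096) -> list[float]:
    """Hypothetical probe: time `samples` write+fsync rounds, in seconds each."""
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)  # scratch file on the volume under test
    try:
        latencies = []
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # only the fsync itself is timed
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    lat = fsync_latencies(".")
    print(f"median fsync: {statistics.median(lat) * 1e6:.0f} us")
```

A suspiciously fast result is a hint worth chasing with the vendor, not proof; a slow one does not prove the device is honest either.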