by zielmicha 2765 days ago
- fsync on btrfs is extremely slow. The situation has improved in the last 3 years, but it is still much slower than ZFS. I just ran a simple test and btrfs was 12x slower than ZFS (on my consumer SSD, for small writes).

- RAID5/6 mode is only experimental in btrfs, while it is really stable in ZFS.

- I don't have concrete data for that, but in my experience, BTRFS has high latency (>1 second) even for small file operations when under load.

- While that should not be a problem for production systems, I have some crappy hardware where BTRFS oopses or corrupts data, while other filesystems (Ext4, ZFS) work fine.
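
A rough way to reproduce the kind of small-write fsync comparison mentioned in the first point is sketched below. This is not the test the poster ran; the function name `time_fsync_writes`, the write size and the iteration count are all assumptions chosen for illustration. Point it at a directory on each filesystem and compare the averages.

```python
import os
import sys
import tempfile
import time


def time_fsync_writes(directory, count=200, size=4096):
    """Return the average seconds per write+fsync of `size` bytes in `directory`."""
    payload = b"x" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # block until this write has reached stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)  # leave no test file behind
    return elapsed / count


if __name__ == "__main__":
    # Usage (paths are examples): python bench.py /mnt/btrfs /mnt/zfs
    for directory in sys.argv[1:]:
        avg_ms = time_fsync_writes(directory) * 1000
        print(f"{directory}: {avg_ms:.3f} ms per fsync'd write")
```

The numbers depend heavily on the drive, mount options and current load, so treat them as a relative comparison on one machine only.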


Yes, the 12x slower seems about right. I remember Debian installations taking >1h on btrfs while ext4 took <10min. (Installations and package operations are particularly bad since dpkg does many fsync()s when unpacking.) I think that was on a spinning drive, though. Anyway, at my old job (where we had mainly spinning drives) we used apt and dpkg only with eatmydata. eatmydata is a command which uses LD_PRELOAD hackery to remove fsync() system calls.
I believe the correct solution to this problem would be for installers to snapshot the system, install all packages without any fsync()ing at all, then finally issue one sync() and remove the snapshot. Optionally, snapshots could be kept if the user wants to roll back from a broken upgrade (for whatever reason). Again, as others have written here, btrfs is great if your software plays along with it, otherwise it might not be so great.
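
The snapshot, install, sync, cleanup sequence proposed above could look roughly like this. It is only a sketch of the idea, not something dpkg or any installer does: `install_with_single_sync`, `take_snapshot` and `drop_snapshot` are hypothetical names, and the two snapshot callables stand in for whatever snapshot tooling the filesystem provides. `os.sync()` is only available on Unix.

```python
import os


def install_with_single_sync(take_snapshot, install_steps, drop_snapshot,
                             keep_snapshot=False):
    """Run all install steps without per-file fsync, then flush once.

    take_snapshot/drop_snapshot are hypothetical callables wrapping the
    filesystem's snapshot tooling; install_steps are callables that write
    files but never call fsync() themselves.
    """
    snapshot = take_snapshot()   # rollback point if the machine dies mid-install
    for step in install_steps:
        step()                   # unpack/configure with no fsync at all
    os.sync()                    # one system-wide flush at the very end
    if keep_snapshot:
        return snapshot          # caller may roll back to this later
    drop_snapshot(snapshot)
    return None
```

Recovery after a crash (rolling back to the snapshot on the next boot) is deliberately left out; that is the part that would need cooperation from the package manager.
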
While this is a good idea in theory (and is possible on e.g. FreeBSD with ZFS with boot environments), dpkg can't do this easily. The main problem is that it's possible and supported for the managed files to be placed upon multiple filesystems. Separate /usr, separate /var, separate /usr/share, whatever combination you choose. This means that dpkg needs to force file synchronisation across all mounted filesystems and it can only do this robustly by issuing fsyncs.

When there's only a single filesystem, and that filesystem is btrfs (or ZFS), it should however be possible to optimise this away and delegate everything to the filesystem. But even here, maintainer scripts may issue their own fsyncs as they update their own databases, kernel images or whatever.

> dpkg needs to force file synchronisation across all mounted filesystems and it can only do this robustly by issuing fsyncs

Not if file-change notifications were supported robustly by dpkg and the kernel (to a lesser extent). Getting to that would, however, require massively restricting the compatible-kernel-versions set of dpkg, and would also probably require undoing some of the more . . . misguided pieces of history with regard to file-change notification systems in Linux.

I don't think this is correct. File-change notifications wouldn't provide any information which isn't already known. dpkg, after all, is entirely responsible for unpacking the .deb files and doing the file modifications. It's fully aware of what was written, in what order, and when.

The problem is that the system state needs checkpointing for every package state change. It must allow for recovery on failure, termination, abortion or power loss, amongst other scenarios. And the package database must remain in sync with the filesystem state.
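
To make the checkpointing cost concrete, here is the generic POSIX pattern for updating a file so that it survives power loss: write a temporary file, fsync it, rename it into place, then fsync the containing directory. This is a sketch of the general technique, not dpkg's actual code; `durable_replace` and the `.tmp` suffix are assumptions for illustration, and fsync on a directory descriptor is assumed to work as it does on Linux.

```python
import os


def durable_replace(path, data):
    """Atomically replace `path` with `data` and make the change durable."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp-file naming
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)                      # file contents are on disk
    finally:
        os.close(fd)
    os.replace(tmp_path, path)            # atomic within a single filesystem
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)                  # the rename itself is on disk
    finally:
        os.close(dir_fd)
```

Every such update costs two fsyncs, which is why a slow fsync hurts so much when the same dance is repeated for many files per package.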

When every managed file is on one snapshot-able filesystem, this could be rolled back atomically, and the fsyncs skipped. But as soon as you have a non-snapshot-able filesystem or multiple filesystems in use, the fsyncs can't be skipped.
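
A first approximation of the "is everything on one filesystem?" check could compare device IDs, as sketched below. `on_single_filesystem` is a hypothetical helper, not anything dpkg provides. It is also only a necessary condition: matching device IDs say nothing about whether that filesystem supports snapshots, and as far as I know btrfs subvolumes report distinct device IDs, so the check errs on the side of keeping the fsyncs.

```python
import os


def on_single_filesystem(paths):
    """Return True if every path in `paths` resides on the same device.

    An empty list returns False, so callers fall back to fsyncing.
    """
    devices = {os.stat(p).st_dev for p in paths}
    return len(devices) == 1


if __name__ == "__main__":
    # Example directories a package manager might write to.
    managed = ["/usr", "/var", "/etc"]
    if on_single_filesystem(managed):
        print("single filesystem: a snapshot-based path might be possible")
    else:
        print("multiple filesystems: fsyncs cannot be skipped")
```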

This was really painful. Back when I was a Debian developer tracking unstable, I would update every evening and it would sometimes take multiple hours to complete. The fsync performance is a truly awful design flaw, and while eatmydata does make it faster, it's a terrible and dangerous hack! (Which everyone used!)

I even had to add eatmydata support to the schroot tool as a command-prefix option to allow every command to be run via eatmydata when using btrfs snapshots. (I've since dropped btrfs support entirely; it was too unreliable with large snapshot turnover rapidly unbalancing the filesystem. Unusable in production.)

fsync on ZFS on Linux isn't as fast as it should be either. On my Optane 900p I get almost 10x better fsync performance on FreeBSD compared to Linux.