"Btrfs has played a role in increasing efficiency and resource utilization in Facebook’s data centers in a number of different applications. Recently, Btrfs helped eliminate priority inversions caused by the journaling behavior of the previous filesystem, when used for I/O control with cgroup2 (described below). Btrfs is the only filesystem implementation that currently works with resource isolation, and it’s now deployed on millions of servers, driving significant efficiency gains."
Yeah, there is a remarkable set of container runtime tasks (package downloads, rootfs creation and management, etc.) that are way easier with btrfs. It wasn’t always smooth sailing, but luckily Chris, Josef, Omar and others are awesome, and for a while now we have been asking for features rather than fixes.
At a previous job, I deployed btrfs to production in a system that continuously spins up and shuts down thousands of VMs. A key feature that I was able to leverage to make this easy is seed devices. This btrfs feature works similarly to overlay filesystems.
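For anyone who hasn't used seed devices, here is a rough Python sketch of the documented seed/sprout sequence: mark an unmounted filesystem as a seed with `btrfstune -S 1`, mount it (it comes up read-only), add a writable device, then remount read-write. It only builds the commands and prints a dry run. The device names and mount point are hypothetical, and how the deployment above actually wired this up isn't stated.

```python
import shlex
from typing import List


def seed_sprout_commands(seed_dev: str, sprout_dev: str, mountpoint: str) -> List[List[str]]:
    """Return the btrfs seed-device ("sprout") workflow as argv lists.

    Nothing is executed here; run each list with subprocess.run(argv, check=True)
    as root if you really want to apply it.
    """
    return [
        # Flag the (unmounted) base filesystem as a read-only seed.
        ["btrfstune", "-S", "1", seed_dev],
        # A seed filesystem can only be mounted read-only at first.
        ["mount", seed_dev, mountpoint],
        # Add a writable device; new writes land here, the seed is never modified.
        ["btrfs", "device", "add", sprout_dev, mountpoint],
        # Remount read-write now that a writable device is part of the filesystem.
        ["mount", "-o", "remount,rw", mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical device names; print a dry run instead of executing anything.
    for argv in seed_sprout_commands("/dev/vdb", "/dev/vdc", "/mnt/vm"):
        print(shlex.join(argv))
```

After the remount, writes go to the added device while the seed stays untouched, which is why it behaves much like an overlay.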
If I were doing that today, I would do a bake-off of OverlayFS vs. btrfs for this use case. Btrfs has many other compelling features that may make it worth using, although it's always been slower than ext4/XFS, so I'd also need to check how it does on modern ultra-high-performance NVMe drives.
Btrfs never lost our data, although there was a kernel panic in the journal writing code in the Linux 3.2/Ubuntu 12.04 timeframe. The panic would not cause data loss but it did wedge VMs. Since that was fixed, it's had a 100% reliable run in that system, to my knowledge.
I've heard that people get a stable btrfs when certain features are turned off, so it may be helpful to say which features you have turned on or off when describing how stable it has been for you.
(Also a happy Syno user here; I've been using it on several NASes.)
My rough understanding is that Synology made some pretty heavy modifications to btrfs in their implementation, though... (a quick Google search turns up nothing to back this up, but I remember reading about it somewhere...)
Not modifications per se, but it doesn't quite do the "normal" setup. Encryption is a mess (you can't export encrypted volumes via NFS), and the caching layer on top of it seems prone to corruption on the SSD (I've had my NVMe mirror cache drop twice over the last year and a half).
I'd like to see them move to full-disk encryption rather than their current approach.
They do encryption/compression at the subvolume level; each share you create is a separate subvolume.
For RAID5, they run it on top of LVM, but with some modifications: the Synology implementation hooks LVM and btrfs together, so it gets ZFS-like properties.
So they have fixed the last big hurdle to btrfs adoption in the small (single-node) NAS space and are just sitting on it (violating the GPL). I urge any Synology user to write to them asking for the Linux kernel source and then upload it somewhere... though their last Linux kernel drop seems to have been in 2017, so there's not much hope there...
We (the build2 project) use it in our CI infrastructure for VM storage. For every build we make a snapshot of a VM, boot it, build, drop the snapshot, repeat. So we are talking about making/dropping snapshots every couple of minutes 24x7 for months without a reboot. We haven't had a single issue.
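A minimal Python sketch of the snapshot-per-build cycle described above, assuming the VM image lives in a btrfs subvolume. The `build` callback and the paths are hypothetical stand-ins (build2's actual CI tooling isn't shown in this thread); only the `btrfs subvolume snapshot` and `btrfs subvolume delete` commands are standard. The command runner is injectable so the sequence can be checked without root.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run_checked(argv: List[str]) -> None:
    # Default runner: execute the command and raise if it fails.
    subprocess.run(argv, check=True)


def build_in_snapshot(base: str, scratch: str, build: Callable[[str], None],
                      run: Runner = run_checked) -> None:
    """One CI cycle: snapshot the VM image, build inside it, drop the snapshot."""
    # Writable copy-on-write snapshot of the pristine image subvolume.
    run(["btrfs", "subvolume", "snapshot", base, scratch])
    try:
        # Hypothetical hook: boot the VM from `scratch` and run the build.
        build(scratch)
    finally:
        # Always drop the snapshot so the next cycle starts from the clean base.
        run(["btrfs", "subvolume", "delete", scratch])
```

The `finally` ensures that a failed build still drops its snapshot, which matters when the loop runs unattended for months.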
When I was doing whole rebuilds of Debian, using e.g. 8 parallel builds of >18000 packages, it was creating and destroying a snapshot once every few seconds to minutes, with at most 8 snapshots in existence at once. It got unbalanced and went read-only every 36 hours. This was on a clean, brand-new filesystem that never had more than 10% space utilisation and was typically around 1%.
At home: I'm running a RAID1 btrfs on my 12-disk cold storage (rackmount, SAS backplane, JBOD SAS controller). It has two new 4TB 24/7-rated SATA disks I got for that NAS; the rest are mostly salvaged from work (old drives, 500GB to 1TB). The selling point of btrfs for me was exactly the same as for the author: I see a "huge" 7.8TB RAID1, and once it fills up I just swap an old disk (or two) for another 24/7 disk with decent TB/$.
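Mixed-size disks work well with btrfs RAID1 because it only requires each chunk's two copies to sit on two different devices. Here is a rough Python sketch of the usual capacity rule of thumb. The disk sizes are hypothetical (the exact mix above isn't given), and it ignores metadata overhead, chunk rounding and TB vs. TiB.

```python
from typing import Sequence


def raid1_usable(sizes_tb: Sequence[float]) -> float:
    """Approximate usable capacity of a btrfs RAID1 pool of mixed-size disks.

    Every chunk is stored twice, on two *different* devices, so:
    - normally you get half of the raw total;
    - if one disk is larger than all the others combined, each chunk on it
      still needs a partner elsewhere, capping usable space at total - largest.
    """
    if len(sizes_tb) < 2:
        return 0.0  # RAID1 needs at least two devices
    total = sum(sizes_tb)
    return float(min(total / 2, total - max(sizes_tb)))


if __name__ == "__main__":
    # Hypothetical pool: two 4TB disks plus some salvaged small ones.
    print(raid1_usable([4, 4, 1, 1, 0.5, 0.5]))  # 5.5
    # Same pool after swapping one 0.5TB disk for another 4TB disk.
    print(raid1_usable([4, 4, 4, 1, 1, 0.5]))  # 7.25
```

In that hypothetical pool, swapping one 0.5TB disk for a 4TB one raises usable space from 5.5TB to 7.25TB, which is the incremental-upgrade property described above.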
At work: I was told our openSUSE machines had some failures/data loss, so we're not using the default btrfs on them. I don't know which version that was, though (we migrated to openSUSE about 3 years ago).
https://engineering.fb.com/open-source/linux/