Hacker News
by densone 1609 days ago
First off FreeBSD FTW. I use it everywhere over Linux now for the first time in 25 years and couldn’t be happier. My only wish is that BSD had a better non-CoW file system. Databases and Blockchains are already CoW so it does irk me slightly to use zfs for them. That being said, I’ve never had a problem because of it.
4 comments

That's one of the fields FreeBSD is bad at: it's not really possible to get info on the current "normal" file system, UFS2.

This latest version has something called "journaled soft updates" and it's a metadata-journaled system, i.e. the actual data is passed through, and it's non-CoW.

If your complaint about UFS is the lack of journalling, you might be interested in https://docs.freebsd.org/en/articles/gjournal-desktop/
Do not use gjournal though, use the more recent SUJ. (I believe it’s enabled by default these days.)
My issue is performance. But I’ve only read about UFS performance. So it might be fine?
I don't think there's much (anything?) in UFS that would lead to poor performance other than the usual suspects:

If your disk is slow or dying, you might blame UFS, but it's not really UFS.

I've had some vague issues with the I/O scheduler, which isn't really UFS, but at the same time UFS may be the only real client of the I/O scheduler; I think ZFS does its own thing, and anyway the systems were UFS only. This is super vague, and I don't have more details, but I just want to put it out there. For one class of machines that had a lot of disks (about 12 SSDs) and did a pretty even mix of reads and writes, evenly spread across the disks, upgrading from FreeBSD X to X + 1 wasn't possible because there was a large performance regression. I think this was 10 -> 11, but it's possible it was 11 -> 12. Because this came up while my work was in the middle of migrating to our acquirer's datacenter, which included switching to their in-house Linux distro, it made sense to just leave those hosts on the older OS and not spend the time debugging this. We didn't have a way to test this without production load, but that had user impact, and it would take a while to show up. It's quite possible this was just a simple tuning error, or possibly a bug that has been fixed for some time; the symptoms were obvious though: processes waiting on I/O, but the disks had a lot of idle time.

If you have a lot of files in a given directory, that's kind of slow, and IIRC, if the directory ever had a lot of files, the extra space won't get reclaimed until the directory is deleted, even if most of the files are unlinked. (This isn't uncommon for filesystems; some filesystems handle it better than others, and there are application-level strategies to deal with it, such as hashing files into deeper directory trees.)
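For what it's worth, the application-level hashing strategy can be sketched roughly like this. This is only an illustration, not anything UFS-specific: the function names `sharded_path` and `store`, the choice of SHA-256, and the two-level, two-hex-character layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, name: str, levels: int = 2, width: int = 2) -> Path:
    """Map a file name to root/<aa>/<bb>/name, where aa and bb come from a hash.

    The layout (2 levels of 2 hex characters) is an assumption for this sketch;
    it spreads files over 256 * 256 leaf directories.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, name)


def store(root: Path, name: str, data: bytes) -> Path:
    """Write data under the sharded path, creating parent directories as needed."""
    path = sharded_path(root, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Because the subdirectories are derived from the file name alone, a reader can recompute the same path later without keeping an index, and no single directory has to hold every file.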

If the filesystem ever gets too full, the strategy to search for free space changes to one that's less fast; it won't change back, so don't fill your disk too much. (This ends up being a good idea for SSD disk health too, and again isn't super unusual in filesystems, but some filesystems probably do better). tunefs(8) says:

> The file system's ability to avoid fragmentation will be reduced when the total free space, including the reserve, drops below 15%. As free space approaches zero, throughput can degrade by up to a factor of three over the performance obtained at a 10% threshold.
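A small monitoring sketch for that 15% figure, in case it's useful. It assumes a Unix-like system where Python's `os.statvfs` is available, and it uses `f_bfree` (free blocks including the reserve) because the tunefs(8) text talks about total free space including the reserve. The names `free_fraction` and `below_threshold` are made up for the example.

```python
import os

# Threshold taken from the tunefs(8) text quoted above.
FRAGMENTATION_THRESHOLD = 0.15


def free_fraction(path: str) -> float:
    """Return free blocks (including the reserve) as a fraction of all blocks."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        raise ValueError(f"{path!r} reports zero total blocks")
    return st.f_bfree / st.f_blocks


def below_threshold(fraction_free: float,
                    threshold: float = FRAGMENTATION_THRESHOLD) -> bool:
    """True when free space has dropped under the given threshold."""
    return fraction_free < threshold


if __name__ == "__main__":
    frac = free_fraction("/")
    state = "below" if below_threshold(frac) else "above"
    print(f"/ has {frac:.1%} free, {state} the {FRAGMENTATION_THRESHOLD:.0%} mark")
```

Since the allocation strategy reportedly doesn't switch back once the filesystem has been too full, the point of a check like this is to alert well before the threshold is crossed, not after.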

UFS has snapshots, which is great, but every once in a while you end up with a snapshot you forgot about, and it can really eat disk space without you noticing. Not really a performance issue, but it can lead to overfilling your drive.

Of course, there's the obvious point that UFS has no support for checksumming, but that's not performance. Soft updates do allow for some amount of consistency in metadata, and background fsck is nice (but could tank performance, I suppose).

ZFS performance on raids of NVMe is quite bad. If you need performance, use xfs over mdadm.
How, out of curiosity?

I haven't made much use of them but the mirrors or raidzs seemed to perform more or less in line with expectations (consumer hardware may not have the PCIe lanes really available to run multiple fast NVMe devices well).

> How, out of curiosity?

Compare with XFS over mdadm, say in RAID10 with 3 legs and the far layout with 3 copies (f3), then cry.

> consumer hardware may not have the PCIe lanes really available to run multiple fast NVMe devices well

Trust me, I have all the lanes I need, even if I would always wish I had more :)

this highly depends on how ZFS is configured.

for instance, what is your ARC configuration in this case? It can have a massive impact on performance.

getting ZFS to perform well takes a bit of work, but in my opinion performance is on par with most filesystems. (and it has a ton of additional features).

No, it doesn't, there's a hard cap. I spent a long time trying to replicate the performance I was accustomed to in XFS.

L2ARC can improve cached reads, but it's not magical, especially not for random reads... or writes. (And yes, I know about SLOG, but doing async is faster than improving sync.)

And don't get me started on how ZFS doesn't use mirrors to improve read speed (unlike what mdadm can do, cf. the difference between the o3, n3 and f3 layouts) or how it can't take advantage of mixed arrays (e.g. a fast NVMe plus a regular SSD or HDD to add redundancy: all the reads should go to the NVMe! The writes should go async to the slow media!)

If you don't have a RAID of fast NVMe that are each given all the lanes they need, you may not see a difference.

But if you are running bare metal close to 100% of what your hardware allows, and you have free choice of everything you want to buy and deploy, you'll see these limits very soon.

In the end, I still choose ZFS most of the time, but there are some use cases where I think XFS over mdadm is still the best choice.

> databases

direct i/o?