For (low-)bandwidth metrics, yes; for any kind of latency-sensitive workload, not really.
The extra decompression on top of the data fetch latency can be quite noticeable. Sometimes that is offset if the compression ratio improves the cache hit rate and thereby lowers the average latency. The problem, of course, is that even with 10M IOPS storage devices, what frequently limits performance is latency: a workload that can't keep 100k requests outstanding is bound by its own IO turnaround time.
Put another way, compressed RAM and disk are really popular in systems that are RAM constrained or bandwidth limited, because fetching 1x the data and decompressing it beats fetching 2x the data (think phones with eMMC). The problem is that this doesn't really make sense on high-end NVMe (or, for that matter, desktops/servers with a lot of RAM), where the cost of fetching 8k vs 4k is very nearly identical because the entire cost is front-loaded on the initial few bytes, and after that the transfer overhead is minimal. It's even hard to justify on reasonable HD/RAID systems for bandwidth tests, since any IO that requires a seek by all the disks will then tend to flood the interface. In other words, it takes tens of ms for the first byte, but then the rest of it comes in at a few GB/sec, and decompressing any faster than that takes more than a single core.
edit: And to add another dimension, if the workload is already CPU bound, then the additional CPU overhead of compress/decompress in the IO path will likely cause a noticeable hit too. I guess what a lot of people don't realize is that a lot of modern storage systems are already compressed at the "hardware" layer by FTLs, etc.
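To put rough numbers on the "front-loaded cost" argument, here is a back-of-the-envelope sketch. It assumes a simple model (read time = first-byte latency + size / bandwidth, plus a decompression term), and every figure in it (80 µs and 3000 MB/s for an NVMe-like device, 200 µs and 100 MB/s for an eMMC-like one, 2000 MB/s decompression, 2x ratio) is an illustrative assumption, not a measurement.

```python
def fetch_time_us(size_bytes, first_byte_latency_us, bandwidth_mb_s):
    """Time to read size_bytes: fixed first-byte latency plus transfer time.

    1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) gives microseconds.
    """
    return first_byte_latency_us + size_bytes / bandwidth_mb_s


def compressed_read_us(size_bytes, ratio, first_byte_latency_us,
                       bandwidth_mb_s, decompress_mb_s):
    """Time to read the same logical data stored compressed.

    Only size_bytes / ratio is fetched, but the full size_bytes has to be
    produced by the decompressor afterwards.
    """
    fetch = fetch_time_us(size_bytes / ratio, first_byte_latency_us, bandwidth_mb_s)
    return fetch + size_bytes / decompress_mb_s


# Hypothetical devices; all numbers are assumptions for illustration only.
devices = {
    "nvme-like": {"first_byte_latency_us": 80, "bandwidth_mb_s": 3000},
    "emmc-like": {"first_byte_latency_us": 200, "bandwidth_mb_s": 100},
}

for name, dev in devices.items():
    raw = fetch_time_us(8192, **dev)
    packed = compressed_read_us(8192, ratio=2, decompress_mb_s=2000, **dev)
    print(f"{name}: raw 8k = {raw:.1f} us, compressed 8k = {packed:.1f} us")
```

With these made-up inputs the compressed read comes out slightly slower on the NVMe-like device and faster on the eMMC-like one, which is the shape of the argument above; real devices and real compressors will move the crossover point.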
I'm curious. I use btrfs daily. Although I have been interested in using zfs, I haven't yet gotten the time. In your experience, is zfs faster than btrfs?
Yes. Much faster. Especially for HDDs. But at a cost of a lot of RAM.
Also, lz4 compression can speed up your HDDs by up to 10x (!) for reads and 3x for writes. [1, see "To compress, or not to compress, that is the question" section.] But it's going to have considerably higher CPU usage as well.
I am not sure about cheap ARM devices, but I am using an old Haswell i5-4670 and it is more than enough. So it won't be an issue later.
Also, when you are talking about consumer NAS, the real problem is that any low-end system can saturate a gigabit network (100MB/s) very easily, so investing extra resources in ZFS doesn't make a difference. At least a 10GbE network (which is beyond the average consumer) is required to actually make it useful.
I repurposed a micro Dell Optiplex 3060 with 8GB RAM and two external HDDs totalling 9TB of space. The CPU is an i3. The whole thing takes less space than a book.
I have lz4 enabled and the gigabit link is almost completely saturated when transferring: 119 MB/s out of the total theoretical 125.
No ZIL, no L2ARC devices are attached. That thing is _flying_ as a home NAS.
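For anyone wondering where the 125 comes from: gigabit Ethernet is 1000 Mbit/s, and dividing by 8 bits per byte gives 125 MB/s before protocol overhead. A tiny sketch of that arithmetic (the function name is just something made up for this example):

```python
def link_utilization(observed_mb_s, link_mbit_s):
    """Fraction of the raw link rate actually achieved.

    Converts the link speed from Mbit/s to MB/s (8 bits per byte) and
    ignores Ethernet/IP/TCP framing overhead.
    """
    theoretical_mb_s = link_mbit_s / 8
    return observed_mb_s / theoretical_mb_s


print(f"{link_utilization(119, 1000):.1%}")  # 119 MB/s on gigabit -> 95.2%
```

The same 119 MB/s on a 10GbE link (1250 MB/s raw) would be under 10% utilization, which is the point made earlier about needing 10GbE before the disks become the bottleneck.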
I've used both Btrfs and ZFS as Linux root filesystems, and at the time I tested (about 4-5 years ago) Btrfs had much worse performance. I've heard that Btrfs has greatly improved performance on recent kernels, though.
What bothers me about ZFS is that it uses a different caching mechanism (ARC) than the Linux page cache. With ARC you actually see the memory used in tools like htop and gnome system monitor (it is not cool seeing half your memory being used when no programs are running). ARC is supposed to release memory when needed (never tested though), so it might not be an issue.
After about a year of playing with both filesystems on my Linux laptop, I decided the checksumming is not worth the performance loss and switched back to ext4, which is significantly faster than both. I still use ZFS on backup drives for checksumming data at rest and easy incremental replication with `zfs send`.
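If you want to see how much of that "used" memory is actually ARC, OpenZFS on Linux exposes counters in `/proc/spl/kstat/zfs/arcstats` as `name type data` rows. A minimal sketch of reading it, assuming that layout and the `size` (current ARC bytes) and `c_max` (ARC upper target) fields; the helper names here are made up, and field availability may vary between versions:

```python
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"  # present on Linux with OpenZFS loaded


def parse_arcstats(text):
    """Parse arcstats text into {name: int}, skipping the header rows."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like "size 4 4294967296"; header rows do not match.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def read_arc_usage(path=ARCSTATS_PATH):
    """Return (current ARC size, ARC max target) in GiB."""
    with open(path) as f:
        stats = parse_arcstats(f.read())
    gib = 1024 ** 3
    return stats["size"] / gib, stats["c_max"] / gib
```

Calling `read_arc_usage()` on a machine with ZFS loaded gives the current ARC size next to its ceiling, which makes it easier to tell cache from memory that programs are really holding.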
My main problem with ZFS is the very limited number of ways you can change your setup. No removing drives, no shrinking, etc. Probably fine for (bare-metal) production systems, but not so friendly with desktops/laptops, where I would still love to have snapshots and send-recv support.