Hacker News new | ask | show | jobs
by greypowerOz 1652 days ago
>"Also, consider that if you can only read at 100 MB/s off a mechanical drive but your CPU can decompress data at ~500 MB/s then the mechanical drive is able to provide 5x the throughput you'd otherwise expect thanks to compression."

I'd not really thought of that aspect before... My old brain is hard-coded to save cpu cycles ... Time to change my ways :)

4 comments

This is an older chart of ZSTD But it is an inspiring visualisation - compression algorithm "times" transmission bandwidth => transmission speed.

http://fastcompression.blogspot.com/2015/01/zstd-stronger-co...

Taken from the fastcompression blog - where one could follow ZSTD's author since before ZSTD was even conceived.

"Conveniently" enough the author of the blog has written both ZSTD and LZ4, which top the chart for their respective link speed domains. (2015 data - things have improved in both ZSTD and others since then.)

This was already a common technique back in the 90's for loading asset data in games and even more important than today with fast SSDs. Instead of many small files, the asset data is loaded from a single big compressed archive file, this works around the terrible small-file-performance (mainly in Windows file systems), and it speeds up loading because loading the compressed data and decompressing is much faster than loading the decompressed data directly from the slow disc.
A variation on this also still is extremely common in the case where the ‘file system’ is a web server and the program accessing it a web browser (https://css-tricks.com/css-sprites/)

In that case, there typically isn’t additional explicit compression (1). The main gain is in decreasing the number of http requests.

(1) the image itself may have inherent compression, and that may be improved by combining images with similar content, and the web server may be configured to use compression, but the first typically isn’t a big win, and the second is independent from this strategy.

Everything old is new again, so maybe you're just in time. Consider we have NVMe drives which can read and write 4-5 GB/s[1], that whole equation changes again...

[1]: https://www.anandtech.com/bench/SSD21/3017

zstd at compression level 1 can do ~2GB/s per core, and as time goes on and processors get more and more cores, compressing data by default is a valid proposition.

In fact, if you install Fedora 35 on btrfs, zstd:1 is enabled by default, using fs-level heuristics to decide when and when not to compress, reducing write amplification on SSD drives and gaining some space for free with negligible performance impact, which is nice.

My 8GB ~/src directory on encrypted btrfs on NVMe uses 6GB on disk and I can easily saturate the link while reading from it. Computers are plenty fast.

> zstd at compression level 1 can do ~2GB/s per core

So unless you can multithread that workload, it's already behind by a factor of 2.

> Computers are plenty fast.

My point was you can no longer assume the disk is significantly slower, at least for streaming workloads. You can often still win by spending CPU cycles doing clever stuff, but it's not several orders of magnitude difference like it used to be.

Most zstd implementations are multithreaded for single compression and decompression tasks. My home server has 24 cores and is using 16x PCIe 3.0 for storage (16 GB/s). Through benchmarking I found that I am storage bound from zstd-3+ and compute bound from zstd-2 and zstd-1. I run zstd-3.
Dumb question, but shouldn't it be the opposite? The lower the compression level, the faster the CPU can process the data, making storage the bottleneck. Higher compression level -> CPU becomes the bottleneck.
I believe in the TrueNAS interface I use these are negative compression levels. I could be wrong. It was just some empirical observations. I may flesh out my matrix more to paint a better picture.
There exist network connections at 80Kb/s. Still data repositories.
This idea is very powerful. Creating small compressed batches of transactions also allows you to write to disk in terms stated as "Transactions per block IO". When dealing with storage media that can wear out at the block grains, this can save substantial device lifetime.