by danking00
1131 days ago
This episode was fascinating. I had heard of LZ4 but not Zstd. It spurred me to make changes to our system at work that are reducing file sizes by as much as 25%. It’s great to have a podcast in which I learn practical stuff!
So, for example, take log files. You can train a dictionary on some sample log data. Then you can compress individual log rows, and all that actually gets stored per row is the compressed data plus, optionally, a small dictionary ID; the dictionary itself isn't modified or stored again. So you get very efficient compression of small pieces of data that belong to a very self-similar collection, with the option of decompressing any individual element at will. (Of course, you'd need to hold onto the original trained dictionary for both compression and decompression, for any row you want to be able to decompress in the future. And you might want to retrain the dictionary every so often for slowly changing types of data, to keep compression efficiency from drifting downward over time.)
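To make the idea concrete, here is a rough sketch of per-row compression against a shared dictionary. Note the swap: it uses the preset-dictionary support (`zdict`) in Python's standard-library `zlib` as a stand-in for Zstd, since that runs without extra packages. zlib has no dictionary trainer, so `build_dictionary` just concatenates sample rows, which is a much cruder thing than Zstd's training step. The sample rows and the function names are made up for illustration.

```python
import zlib

# Hypothetical sample log rows; in practice these would come from real logs.
SAMPLE_ROWS = [
    b"2024-01-01T00:00:00Z INFO request method=GET path=/api/users status=200",
    b"2024-01-01T00:00:01Z INFO request method=POST path=/api/orders status=201",
    b"2024-01-01T00:00:02Z WARN request method=GET path=/api/users status=404",
]


def build_dictionary(rows):
    """Crude stand-in for dictionary training: concatenate the samples."""
    return b"\n".join(rows)


def compress_row(row, dictionary):
    """Compress a single row against the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(row) + compressor.flush()


def decompress_row(blob, dictionary):
    """Decompress a single row; needs the exact same dictionary bytes."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    dictionary = build_dictionary(SAMPLE_ROWS)
    new_row = b"2024-01-01T00:05:09Z INFO request method=GET path=/api/orders status=200"

    with_dict = compress_row(new_row, dictionary)
    without_dict = zlib.compress(new_row, 9)

    # Each row is decompressible on its own, given the dictionary.
    assert decompress_row(with_dict, dictionary) == new_row
    print(len(new_row), len(without_dict), len(with_dict))
```

The same shape should carry over to Zstd: the dictionary is stored once, each row is compressed independently, and any row can be read back without touching its neighbours.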
I believe Postgres already uses this under the hood for some columnar data. It wouldn't take much to index it before compressing it and just decompress it at will. Or maybe it just got added? https://devm.io/databases/postgresql-release