by pmarreck
1134 days ago
Probably the most underrated feature of zstd (likely because it's so unusual) is the ability to train a separate compression dictionary. This lets you build a compact dictionary that is highly specific to one type of data, and then compress individual elements of that data without embedding a dictionary in every compressed output.

Take logfiles, for example. You can train a dictionary on some sample log data, then compress individual log rows against it. Each output stores only the compressed data plus, optionally, a small ID identifying which dictionary was used; the dictionary itself is kept separately. So you get very efficient compression of small pieces of data that belong to a highly self-similar collection, with the option of decompressing any individual element at will.

(Of course, you need to hold onto the exact dictionary used at compression time for any row you want to be able to decompress in the future. You might also want to train a new dictionary every so often for slowly changing data, so that compression efficiency doesn't "drift" downward over time; rows compressed with an older dictionary still need that older dictionary.)

I believe Postgres already uses this under the hood for some columnar data. It wouldn't take much to index the data before compressing it and then decompress individual rows at will. Or maybe it just got added? https://devm.io/databases/postgresql-release
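To make the idea concrete, here is a minimal sketch of per-row compression against a shared dictionary. It is not zstd: to stay within the Python standard library it uses zlib's preset-dictionary support (`zdict`) as a stand-in. zlib has no dictionary trainer, so the "dictionary" here is simply some concatenated sample rows. The log lines and the function names are made up for illustration.

```python
import zlib

# Hypothetical sample log rows standing in for a training corpus.
SAMPLE_LOGS = [
    b"2024-01-01T00:00:00Z INFO request method=GET path=/api/users status=200 duration_ms=12\n",
    b"2024-01-01T00:00:01Z INFO request method=POST path=/api/orders status=201 duration_ms=48\n",
    b"2024-01-01T00:00:02Z WARN request method=GET path=/api/users status=404 duration_ms=7\n",
]

# Stand-in for a trained dictionary: raw bytes the compressor may
# back-reference. zstd would train a more compact dictionary instead.
SHARED_DICT = b"".join(SAMPLE_LOGS)


def compress_row(row: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Compress one row against the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(row) + compressor.flush()


def decompress_row(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Decompress one row; requires the same dictionary used to compress it."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


def compress_plain(row: bytes) -> bytes:
    """Baseline: compress the row on its own, with no dictionary."""
    return zlib.compress(row, 9)


if __name__ == "__main__":
    new_row = b"2024-01-01T00:05:00Z INFO request method=GET path=/api/orders status=200 duration_ms=15\n"
    with_dict = compress_row(new_row)
    without_dict = compress_plain(new_row)
    print(len(new_row), len(without_dict), len(with_dict))
    assert decompress_row(with_dict) == new_row
```

Each row is compressed and decompressed independently, so any single row can be recovered without touching the others, as long as the same dictionary bytes are still available. With zstd the workflow has the same shape, but the dictionary is produced by a training step over sample data rather than by concatenation.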