|
|
|
|
|
by jakozaur
1969 days ago
|
|
LZ4 is so fast, that in make sense to use it everywhere over uncompressed data. Even storing items in-memory compressed sometimes is profitable as you can fit more items in memory. Still zstd offers way better compression and got variable difficulty factor: https://github.com/facebook/zstd Decompression is always fast, but you can trade off compression vs. ratio factor. In general if send data over network zstd is quite profitable. Even network attached disk AWS EBS or AWS S3 it can be a hugely profitable. |
|
This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in CPU cache and then perform data processing inside cache; if data is uncompressed, larger amount of data is read from memory.
Strictly speaking, LZ4 data decompression (typically 3 GB/sec) is slower than memcpy (typically 12 GB/sec). But when using e.g. 128 CPU cores, LZ4 decompression will scale up to memory bandwidth (typically 150 GB/sec) as well as memcpy. And memcpy is wasting more memory bandwidth by reading uncompressed data while LZ4 decompression reads compressed data.