by zX41ZdbW 1969 days ago
Using LZ4 can easily improve performance even if all data reside in memory.

This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in the CPU cache and then process the data inside the cache; if data is uncompressed, a larger amount of data has to be read from memory.
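A minimal sketch of the block-wise idea, not ClickHouse's actual code. It uses `zlib` from the Python standard library as a stand-in for LZ4 (which is not in the standard library), and the 64 KiB block size and the function names are assumptions chosen for illustration. Each block is decompressed and consumed immediately, so only one small decompressed buffer is live at a time.

```python
import zlib

# Assumed block size: small enough that a decompressed block could fit in a CPU cache.
BLOCK_SIZE = 64 * 1024


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and compress each block independently."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def sum_bytes_blockwise(blocks):
    """Decompress one block at a time and process it before moving to the next."""
    total = 0
    for block in blocks:
        chunk = zlib.decompress(block)  # small buffer, reused conceptually per block
        total += sum(chunk)             # the "data processing" step on the hot block
    return total


data = bytes(range(256)) * 4096         # 1 MiB of sample data
blocks = compress_blocks(data)
result = sum_bytes_blockwise(blocks)
```

Python itself will not show the cache effect; the sketch only illustrates the access pattern, where memory traffic is dominated by reading the compressed blocks.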

Strictly speaking, LZ4 decompression (typically 3 GB/sec per core) is slower than memcpy (typically 12 GB/sec per core). But with, e.g., 128 CPU cores, LZ4 decompression scales up to the memory bandwidth (typically 150 GB/sec), just as memcpy does. And memcpy wastes more memory bandwidth because it reads uncompressed data, while LZ4 decompression reads compressed data.
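The arithmetic behind this can be sketched with a deliberately simplified model: throughput of uncompressed bytes delivered to the CPU is the smaller of the compute limit (cores times per-core speed) and the memory limit (memory bandwidth times compression ratio, since only compressed bytes are read). The compression ratio of 3 is an assumption for illustration, not a figure from the comment, and the model ignores write traffic, NUMA effects, and cache misses.

```python
def effective_throughput_gb_s(cores, per_core_gb_s, mem_bw_gb_s, compression_ratio=1.0):
    """Uncompressed GB/sec delivered under a simple two-limit model."""
    compute_bound = cores * per_core_gb_s           # all cores working in parallel
    memory_bound = mem_bw_gb_s * compression_ratio  # each byte read yields `ratio` bytes
    return min(compute_bound, memory_bound)


# Figures from the comment: 128 cores, 12 GB/sec memcpy, 3 GB/sec LZ4, 150 GB/sec memory.
uncompressed = effective_throughput_gb_s(128, 12.0, 150.0)
# Assumed compression ratio of 3x (hypothetical; real ratios depend on the data).
lz4 = effective_throughput_gb_s(128, 3.0, 150.0, compression_ratio=3.0)

print(uncompressed)  # 150.0, limited by memory bandwidth
print(lz4)           # 384.0, limited by decompression speed
```

Under these assumptions the uncompressed path is capped by memory bandwidth, while the LZ4 path delivers more uncompressed bytes per second because each byte read from memory expands into several.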

1 comment

That's extremely interesting. Thanks for sharing.

I haven't been into C/C++ for years, though, so I wouldn't grok the code now, sadly.