by felixhandte
275 days ago
This is because Zstd's long-distance matcher looks for matching sequences of 64 bytes [0]. Long matching runs in the data will likely have the newlines inserted at different offsets within each run, so the framed copies no longer share identical 64-byte sequences, and this totally breaks Zstd's ability to find the long-distance match.

Ultimately, Zstd is a byte-oriented compressor that doesn't understand the semantics of the data it compresses. Improvements are certainly possible if you can recognize and separate that framing to recover a contiguous view of the underlying data.

[0] https://github.com/facebook/zstd/blob/v1.5.7/lib/compress/zs...

(I am one of the maintainers of Zstd.)
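To make the offset problem concrete, here is a rough sketch in Python. It is not how Zstd's matcher is implemented; it only counts identical 64-byte windows shared by two copies of the same payload. The helper names (`wrap`, `windows`, `shared_windows`), the 60-byte line width, and the two starting offsets are all made up for illustration.

```python
import random

WINDOW = 64  # match length cited in the comment; everything else is an assumption


def wrap(data: bytes, width: int, first: int) -> bytes:
    """Insert a newline after `first` bytes, then after every `width` bytes."""
    out = bytearray(data[:first])
    for i in range(first, len(data), width):
        out += b"\n" + data[i:i + width]
    return bytes(out)


def windows(data: bytes) -> set:
    """All distinct WINDOW-byte substrings of `data`."""
    return {data[i:i + WINDOW] for i in range(len(data) - WINDOW + 1)}


def shared_windows(a: bytes, b: bytes) -> int:
    """Number of distinct WINDOW-byte substrings present in both inputs."""
    return len(windows(a) & windows(b))


# Hypothetical payload: random base64-like text, so it contains no newlines itself.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
rng = random.Random(0)
payload = bytes(rng.choice(ALPHABET) for _ in range(4096))

# Same payload, same line width, but the line breaks fall at different offsets.
copy_a = wrap(payload, width=60, first=60)
copy_b = wrap(payload, width=60, first=23)

framed = shared_windows(copy_a, copy_b)
unframed = shared_windows(copy_a.replace(b"\n", b""), copy_b.replace(b"\n", b""))
print("shared 64-byte windows with newlines:", framed)
print("shared 64-byte windows after stripping newlines:", unframed)
```

Because the assumed line width is shorter than 64 bytes, every window in the framed copies contains at least one newline, and the newlines sit at different positions relative to the payload, so the framed copies should share no windows at all. Stripping the newlines first recovers the contiguous view, and the two copies become identical again.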
|
I absolutely adore ZSTD; it has worked so well for me compressing JSON metadata for a knowledge engine.