by ozgrakkurt
201 days ago
Working on improving and clarifying this! Right now it only does delta encoding and bitpacking. It should do fairly well on a bunch of zeroes because of the bitpacking. I'm working on adding RLE/FFOR, clarifying the strategy, and making the format flexible to modify internally so that changes won't break the API.
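To illustrate why delta plus bitpacking already handles zeroes well, here is a rough Python sketch. It is not the library's actual code or format: `encode` and `decode` are hypothetical names, and the sketch assumes non-decreasing integer input so that all deltas are non-negative.

```python
from itertools import accumulate

def encode(values):
    """Delta-encode a non-decreasing list of ints, then bitpack the deltas.

    Returns (first_value, bit_width, count, packed_bytes).
    """
    if not values:
        return 0, 0, 0, b""
    deltas = [b - a for a, b in zip(values, values[1:])]
    if any(d < 0 for d in deltas):
        raise ValueError("this sketch assumes non-decreasing input")
    # Every delta is stored with the width of the largest one.
    width = max((d.bit_length() for d in deltas), default=0)
    acc = 0
    for i, d in enumerate(deltas):
        acc |= d << (i * width)
    packed = acc.to_bytes((len(deltas) * width + 7) // 8, "little")
    return values[0], width, len(values), packed

def decode(first, width, count, packed):
    """Inverse of encode: unpack the deltas and take the running sum."""
    if count == 0:
        return []
    acc = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    deltas = [(acc >> (i * width)) & mask for i in range(count - 1)]
    return list(accumulate([first] + deltas))
```

For `[0] * 1000` every delta is zero, so the bit width is 0 and the packed payload is empty; only the small header (first value, width, count) remains.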
Good compression algorithms use effectively the same amount of storage for highly redundant data (not just all zeros or a single repeated word, though all zeros can sometimes be a bit smaller), whether it's 1 kiloword or 1 gigaword (there might be a couple of bytes' difference, since they need to encode a longer variable-size integer).
And this does not require giving up on random access, if you care about that: you can separately include an "extent table" (works for large regular repeats; you have to detect these anyway for other compression strategies, which normally give up on random access), or (for small repeats only) use strides, or ...
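A minimal Python sketch of what I mean by an extent table, under my own assumptions: long runs are stored as a single (value) entry, everything else is kept raw, and a sorted list of start offsets gives random access by binary search. The names `build_extents` and `lookup` and the `min_run` threshold are made up for illustration.

```python
from bisect import bisect_right
from itertools import groupby

def build_extents(values, min_run=8):
    """Split values into extents: ("run", value) or ("raw", [items]).

    Returns (starts, extents, total_length); starts[k] is the index of the
    first element covered by extents[k].
    """
    starts, extents = [], []
    pos = 0
    for value, group in groupby(values):
        n = sum(1 for _ in group)
        if n >= min_run:
            starts.append(pos)
            extents.append(("run", value))
        elif extents and extents[-1][0] == "raw":
            # Merge short groups into the preceding raw extent.
            extents[-1][1].extend([value] * n)
        else:
            starts.append(pos)
            extents.append(("raw", [value] * n))
        pos += n
    return starts, extents, pos

def lookup(starts, extents, total, i):
    """Random access: find the extent covering index i by binary search."""
    if not 0 <= i < total:
        raise IndexError(i)
    k = bisect_right(starts, i) - 1
    kind, payload = extents[k]
    return payload if kind == "run" else payload[i - starts[k]]
```

With this layout a run of a million zeros costs one table entry, the same as a run of a thousand, and a lookup is O(log number of extents).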
For reference, BTRFS uses 128KiB chunks for its compression to support mmap and seeking. Of course, the caller should make sure to keep decompressed chunks in cache.
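The same idea in a rough Python sketch: compress fixed-size chunks independently so a read only has to decompress the chunks it touches, and keep recently decompressed chunks in a small cache. This is only an illustration using `zlib`, not how BTRFS is implemented; `ChunkedBlob` and the cache size of 16 are my own choices.

```python
import zlib
from functools import lru_cache

CHUNK = 128 * 1024  # chunk size borrowed from the BTRFS figure above

class ChunkedBlob:
    """Compress data in independent chunks to allow seeking."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK):
        self.chunk_size = chunk_size
        self.length = len(data)
        self.chunks = [
            zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)
        ]
        # Cache of decompressed chunks, keyed by chunk index.
        self._load = lru_cache(maxsize=16)(self._decompress)

    def _decompress(self, k: int) -> bytes:
        return zlib.decompress(self.chunks[k])

    def read(self, offset: int, size: int) -> bytes:
        """Return up to size bytes starting at offset."""
        if offset < 0 or size < 0:
            raise ValueError("offset and size must be non-negative")
        end = min(offset + size, self.length)
        out = bytearray()
        while offset < end:
            k, within = divmod(offset, self.chunk_size)
            chunk = self._load(k)
            take = min(end - offset, len(chunk) - within)
            out += chunk[within:within + take]
            offset += take
        return bytes(out)
```

A read that straddles a chunk boundary decompresses two chunks; repeated reads in the same region hit the cache instead of decompressing again.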