Hacker News new | ask | show | jobs
by Dylan16807 1249 days ago
> why is this true of any reasonable compression scheme?

Any? I wouldn't say that. If you took LZ4 and made it even simpler by removing uncompressed chunks, you would only have half a percent of overhead on random data. A thousandth of a percent if you tweaked how it represents large numbers.

1 comments

TIL. IIUC, LZ4 doesn't care about the compression ratio (to which you are correct I had been alluding) but does strongly care about guaranteeing a block maximum size. (so still the same kind of concern, just on an absolute and not a relative basis)
Just simplify it further. Get rid of the implicit +4 to the match size. 0-15 instead of 4-19. Now you can guarantee any block size you want.

If you wanted to go even simpler, here's an entire compression format described in one line:

one byte literal length, one byte match length, two bytes match offset, 0-255 literal bytes, repeat