|
|
|
|
|
by forgotusername
4982 days ago
|
|
> but if you can put in any data you want, you could poison the compression enough to put it into a bad state -- one where effectively nothing compresses properly, and you end up with your own data completely in the clear. This sounds interesting, and completely counter to my understanding of how compression works.. I thought CRIME's innovation was to exploit compression ratios as a proxy for cleartext. Poking a compressor into emitting probabilistically assigned bit-aligned symbols that happen to correspond to its input seems unlikely. |
|
With careful control over the data, you could make it so no dictionary lookup would ever succeed. This would mean no strings would ever be eliminated through backreferences.
Most also have a minimum match length, making your life a bit easier here. Most also are outputting encoded streams that basically a little decompression VM (IE literal, 0, backreference, 255 bytes ago, size 30). Because of this, they will not eliminate duplicates where the match is too small (only 2 bytes).
This will get you past the duplicate elimination phase, but not the huffman phase.
Getting past the huffman phase is harder. To get it to output a stored block, you have to convince it the raw literal length of the block will end up less than the length of the block as encoded.
For zlib, we have opt_len = (sizeof (compressed data in bits) + sizeof (huffman tree in bits ) + 3 + 7) / 8
if (stored_len+4 <= opt_lenb) use stored block
So you do have some chance to beat it by messing with the probability distributions, and do get a little leeway.
On the plus side, you only need to mess with the probability for a single zlib block, not the whole shebang.