Hacker News
by yellow_lead 170 days ago
> This may have a nontrivial memory cost, especially at high compression levels. (Don't set the compression window any larger than it needs to be!)

It sounds like these contexts should be cleared when they reach a certain memory limit, or maybe reset periodically, e.g. every N messages. Is there another way to manage the memory cost?
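
To make the "reset every N messages" idea concrete, here is a rough sketch using Python's zlib. The class names, the reset_every parameter, and the reset flag sent alongside each message are all made up for illustration; a real protocol would need its own way to signal a reset to the other side.

```python
import zlib


class ResettingCompressor:
    """Hypothetical helper: one shared context, discarded every N messages."""

    def __init__(self, reset_every=100, level=6, wbits=-15):
        self.reset_every = reset_every
        self.level = level
        self.wbits = wbits  # negative wbits = raw DEFLATE, no header/trailer
        self.count = 0
        self.comp = zlib.compressobj(level, zlib.DEFLATED, wbits)

    def compress(self, message):
        # Start a fresh context on every Nth message (but not the first).
        reset = self.count > 0 and self.count % self.reset_every == 0
        if reset:
            self.comp = zlib.compressobj(self.level, zlib.DEFLATED, self.wbits)
        self.count += 1
        # Z_SYNC_FLUSH emits everything so far without ending the stream,
        # so the receiver can decode this message immediately.
        data = self.comp.compress(message) + self.comp.flush(zlib.Z_SYNC_FLUSH)
        return reset, data


class ResettingDecompressor:
    """Hypothetical peer: drops its context whenever the sender says so."""

    def __init__(self, wbits=-15):
        self.wbits = wbits
        self.decomp = zlib.decompressobj(wbits)

    def decompress(self, reset, data):
        if reset:
            self.decomp = zlib.decompressobj(self.wbits)
        return self.decomp.decompress(data)
```

Resetting bounds how long stale history is kept, but the per-context memory is still set by the window size and compression level, so shrinking wbits is the other knob to look at.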

2 comments

LZ77 compression (a key part of gzip and zip compression) uses a 'sliding window': the compressor can tell the decompressor 'repeat the n bytes that appeared in the output stream m bytes ago'. The most widely used implementation uses a 15-bit integer for m, so the decompressor never needs to look more than 32,768 bytes back in its output stream.
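
A toy decoder shows why the window bounds the decompressor's memory. This is a sketch of the general LZ77 idea, not the actual DEFLATE bit format; the token representation (ints for literal bytes, (distance, length) tuples for back-references) is an assumption chosen for readability.

```python
WINDOW = 32768  # 2**15: the furthest back a reference may point


def lz77_decode(tokens):
    """Decode a list of literal bytes (ints) and (distance, length) tuples."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)  # literal byte
            continue
        distance, length = tok
        if not 1 <= distance <= min(len(out), WINDOW):
            raise ValueError("back-reference points outside the window")
        # Copy one byte at a time so overlapping copies (length > distance)
        # repeat the pattern, as LZ77 allows.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)


# "abc" followed by "repeat 6 bytes starting 3 bytes back"
decoded = lz77_decode([ord("a"), ord("b"), ord("c"), (3, 6)])
```

Because no reference can reach further back than WINDOW bytes, a streaming decoder only has to keep the last 32,768 bytes of output around, however long the stream is.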

Many compression standards include memory limits to guarantee compatibility, and the older the standard, the lower that limit is likely to be. If the standards didn't dictate this stuff, DVD sellers could release a DVD that needed a 4MB decompression window, and it'd fail to play on players that only had 2MB of memory - setting a standard and following it prevents this.

That's a misunderstanding. Compression algorithms are typically designed with a tunable state size parameter. The issue is that if a large transfer has one side crash and resume, you need some way to persist the state so you can pick up where you left off.
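
As one example of a tunable state size, zlib exposes the window as the wbits parameter. The sketch below only covers that knob, not persisting state across a crash. The helper names are made up, and the memory formula is the rough estimate given in zlib's own documentation, not a measurement.

```python
import zlib


def approx_zlib_memory(wbits, mem_level=8):
    """Rough per-context memory in bytes, per zlib's documented estimate."""
    deflate_bytes = (1 << (wbits + 2)) + (1 << (mem_level + 9))
    inflate_bytes = 1 << wbits  # plus a few KB of fixed overhead
    return deflate_bytes, inflate_bytes


def roundtrip(data, wbits):
    """Compress and decompress with a given window size (2**wbits bytes)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    packed = comp.compress(data) + comp.flush()
    return packed, zlib.decompressobj(wbits).decompress(packed)


# Sample data whose repeats are spread far apart, so window size matters.
data = b"".join(b"record %d: status=ok\n" % (i % 5000) for i in range(20000))

for wbits in (9, 12, 15):
    packed, restored = roundtrip(data, wbits)
    assert restored == data
    print(wbits, len(packed), approx_zlib_memory(wbits))
```

A smaller window costs compression ratio when repeats are further apart than the window, which is the trade-off the parameter exists to expose.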