| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mistercow 4228 days ago

For lossless compression, start out by reading up on run length encoding, which is about as simple as it gets. Then there's DEFLATE, which is what's behind gzip. You'll also want to read up on Huffman coding and arithmetic coding. One of the newer ideas in lossless compression is "context mixing", which is definitely worth reading up on. The Burrows–Wheeler transform is pretty fascinating, albeit slow, so you might want to read about that as well.

Lossy compression (which RegPack is actually an example of), typically involves lossless compression plus a preprocessing step to discard information so that the lossless compressor can do a better job.

The techniques for discarding information vary based on application. For images and audio, the most successful techniques tend to involve working in frequency space, so you'll want to read up on the FFT and related transforms.

For code compression, you have different constraints when discarding information because you have to produce a program with identical output to the original. Simple techniques involve removing comments and renaming variables to have shorter names. More advanced techniques involve more complex transformations of the AST, including language-specific tricks like wrapping javascript in a "with(Math){}" block.

Hope some of that helps.