Hacker News
by dlbucci 1366 days ago
A year ago, I was making a game for JS13K, where the goal is to make a game that fits in a zip file under 13,312 bytes. I had two spots that were using the exact same copy-pasted chunk of code, so I figured I'd refactor it into a function to save space, but it actually _increased_ the size of the final zip when I did so. Turns out zip files compress repetitions of the same exact text _very_ well.
1 comment

Yes, the algorithm zip uses in this case relies on backrefs: "LZ77 algorithms achieve compression by replacing repeated occurrences of data with references to a single copy of that data existing earlier in the uncompressed data stream. A match is encoded by a pair of numbers called a length-distance pair, which is equivalent to the statement 'each of the next length characters is equal to the characters exactly distance characters behind it in the uncompressed stream'. (The distance is sometimes called the offset instead.)"
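To make the length-distance idea concrete, here is a minimal decoder sketch. The token format (plain strings for literals, `(length, distance)` tuples for backrefs) and the name `lz77_decode` are made up for illustration; this is not the bit-level format that zip or DEFLATE actually use.

```python
def lz77_decode(tokens):
    """Decode a hypothetical token list: str literals or (length, distance) pairs."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            length, distance = token
            if not 0 < distance <= len(out):
                raise ValueError("distance points before the start of the output")
            # Copy one character at a time so a match may overlap the text it
            # is producing (length > distance), which is how runs get encoded.
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.extend(token)  # literal characters are emitted as-is
    return "".join(out)


print(lz77_decode(["abc", (6, 3)]))
print(lz77_decode(["a", (4, 1)]))
```

The second call shows the overlapping case: a distance of 1 with a length of 4 just repeats the previous character four times.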

This is a different technique from entropy coding, which gets its improvements by allocating fewer bits to more frequent symbols. But most modern compressors use a mix. For example, gzip uses DEFLATE, which combines LZ77-style backrefs with Huffman coding, using either static or dynamic Huffman tables.
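A rough way to see the two techniques separately is Python's `zlib` module, which lets you switch off string matching with the `Z_HUFFMAN_ONLY` strategy. The sketch below uses seeded random bytes as a stand-in for the copy-pasted chunk (an assumption, not the original game code), since random bytes give Huffman coding almost nothing to work with and any savings on the second copy have to come from backrefs.

```python
import random
import zlib


def deflate_size(data: bytes, strategy: int = zlib.Z_DEFAULT_STRATEGY) -> int:
    """Return the size of `data` after DEFLATE at level 9 with the given strategy."""
    compressor = zlib.compressobj(
        9, zlib.DEFLATED, zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, strategy
    )
    return len(compressor.compress(data) + compressor.flush())


# Hypothetical stand-in for the duplicated chunk: every byte value is about
# equally frequent, so entropy coding alone cannot shrink it much.
rng = random.Random(0)
chunk = bytes(rng.randrange(256) for _ in range(2000))

once = deflate_size(chunk)
twice = deflate_size(chunk + chunk)
twice_huffman_only = deflate_size(chunk + chunk, zlib.Z_HUFFMAN_ONLY)

print("one copy:", once)
print("two copies, backrefs + Huffman:", twice)
print("two copies, Huffman only:", twice_huffman_only)
```

With backrefs enabled, the second copy should cost only a small fraction of its raw size, while with Huffman only it costs roughly as much as the first copy. That would be consistent with the refactor not paying off: the duplicate was already nearly free.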

And yes, JSON is usually absurdly compressible.
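A quick way to check this on your own data, sketched with Python's `json` and `zlib` modules. The records here are invented for illustration, so the exact ratio says nothing about real payloads, but the repeated keys and punctuation are typical of what backrefs pick up.

```python
import json
import zlib

# Made-up records purely for illustration; real payloads will compress differently.
records = [
    {"id": i, "name": f"user{i}", "active": i % 2 == 0, "score": i * 3}
    for i in range(200)
]

raw = json.dumps(records).encode("utf-8")
packed = zlib.compress(raw, 9)

print(len(raw), "bytes raw ->", len(packed), "bytes compressed")
```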