Hacker News new | ask | show | jobs
by csjh 645 days ago
I think the most surprising part here is the gzipped-base64'd-compressed data almost entirely removes the base64 overhead.
1 comments

It feels somewhat intuitive since (as the article notes) the Huffman encoding stage effectively "reverses" the original base64 overhead issue that an 8 bit (256 choices) index is used for 6 bits (64 choices) of actual characters. A useful compression algorithm which _didn't_ do this sort of thing would be very surprising as it would mean it doesn't notice simple patterns in the data but somehow compresses things anyways.
how does it affect error correction though?
Neither base64 nor the gzipped version have error correction as implemented. The extra overhead bits in base64 come from selecting only a subset of printable characters, not by adding redundancy to the useful bits.