by andjd 820 days ago
I would be curious how this compares to a more-or-less off-the-shelf text compression algorithm like gzip. My guess is that, over the entire database, gzip would be more efficient than the OP's ad hoc implementation or any alternative mentioned here.
2 comments

Unlikely. Gzip and the like will do well at getting rid of the inherent redundancy of ASCII encoding, but they are general-purpose algorithms and can't take advantage of known structure.
That was my first thought as well. Plain old gzip should do pretty well and provides a baseline to beat.
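
For anyone who wants to actually measure that baseline, here is a rough Python sketch using only the standard-library `gzip` module. The `gzip_baseline` helper and the `sample` records are made up for illustration; they are not the OP's data or format, so swap in the real database dump before drawing any conclusions.

```python
import gzip


def gzip_baseline(data: bytes, level: int = 9) -> dict:
    """Return raw size, gzip size, and compressed/raw ratio for a byte string."""
    compressed = gzip.compress(data, compresslevel=level)
    # Round-trip check: a baseline is only meaningful if it is lossless.
    assert gzip.decompress(compressed) == data
    return {
        "raw_bytes": len(data),
        "gzip_bytes": len(compressed),
        "ratio": len(compressed) / len(data) if data else 0.0,
    }


# Hypothetical stand-in for the database: ASCII records, one per line.
# Replace this with the bytes of the real dump.
sample = "\n".join(
    f"{i},{i * 7 % 1000},{'abc'[i % 3]}" for i in range(10_000)
).encode("ascii")

stats = gzip_baseline(sample)
print(stats)
```

Whatever number comes out is the one a structure-aware encoding has to beat. Running gzip on top of the custom encoding's output would also be worth a try, since the two approaches are not mutually exclusive.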