Hacker News new | ask | show | jobs
by quicktwo 1577 days ago
I tried this out and got 12,231 bytes Brotli encoded, and 12,493 bytes gzipped, with 16,311 bytes raw -- so the estimate is very close.

It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp.

My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.

1 comments

> It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp.

New lines give a usable context (namely the word boundary) to compression algorithms. If I give you an arbitrary unsorted list of 5-letter-long words with no delimiters you need to think harder to figure out that it is indeed a list of 5-letter-long words. Same for the compression algorithm.

> My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.

Yeah, the best solution depends on what you want to do with that. Your estimation is not too far from my experience: Roadroller tends to be on par with or slightly smaller than Brotli. Of course, Roadroller exists because web browsers generally don't provide a way to use Brotli in JS ;-)