Hacker News new | ask | show | jobs
by lifthrasiir 1581 days ago
> It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp.

New lines give a usable context (namely the word boundary) to compression algorithms. If I give you an arbitrary unsorted list of 5-letter-long words with no delimiters you need to think harder to figure out that it is indeed a list of 5-letter-long words. Same for the compression algorithm.

> My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.

Yeah, the best solution depends on what you want to do with that. Your estimation is not too far from my experience: Roadroller tends to be on par with or slightly smaller than Brotli. Of course, Roadroller exists because web browsers generally don't provide a way to use Brotli in JS ;-)