|
|
|
|
|
by quicktwo
1577 days ago
|
|
I tried this out and got 12,231 bytes Brotli encoded, and 12,493 bytes gzipped, with 16,311 bytes raw -- so the estimate is very close. It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp. My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk. |
|
New lines give a usable context (namely the word boundary) to compression algorithms. If I give you an arbitrary unsorted list of 5-letter-long words with no delimiters you need to think harder to figure out that it is indeed a list of 5-letter-long words. Same for the compression algorithm.
> My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.
Yeah, the best solution depends on what you want to do with that. Your estimation is not too far from my experience: Roadroller tends to be on par with or slightly smaller than Brotli. Of course, Roadroller exists because web browsers generally don't provide a way to use Brotli in JS ;-)