Hacker News new | ask | show | jobs
by lifthrasiir 1577 days ago
Roadroller [1] is probably a borderline general purpose compression algorithm, and with some automatic parameter tuning it results in 12,170 bytes estimated, at the expense of a lot of memory. "Estimated" because the algorithm was originally meant to be recompresssed in a ZIP file, so it doesn't bother to generate the smallest JS file in terms of uncompressed size (yet). But that estimation does include the decoder size so it is a good estimate for the Kolmogorov complexity of the generated code though.

[1] https://lifthrasiir.github.io/roadroller/ (the exact parameters: golf.horse dataset; input mode text; action write to document; # contexts 12 with 12,15,49,50,70,79,96,97,131,154,292,353; pollute the global scope; max memory usage 150 MB; precision 16; learning rate 1333; model max count 11; model base divisor 14; dynamic model flags -1; # abbreviations 64)

1 comments

I tried this out and got 12,231 bytes Brotli encoded, and 12,493 bytes gzipped, with 16,311 bytes raw -- so the estimate is very close.

It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp.

My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.

> It compresses better with new lines than if you remove them all (given you could just split on every 5 characters later), which is an odd quirk of compression algorithms that my brain will never quite grasp.

New lines give a usable context (namely the word boundary) to compression algorithms. If I give you an arbitrary unsorted list of 5-letter-long words with no delimiters you need to think harder to figure out that it is indeed a list of 5-letter-long words. Same for the compression algorithm.

> My best algorithm attempt + Brotli achieved 12,773 bytes, which is a painfully close 542 bytes away. It is 13,181 bytes raw though, and can technically be used in-memory, which is definitely a perk.

Yeah, the best solution depends on what you want to do with that. Your estimation is not too far from my experience: Roadroller tends to be on par with or slightly smaller than Brotli. Of course, Roadroller exists because web browsers generally don't provide a way to use Brotli in JS ;-)