by Dylan16807
205 days ago
> Presciently, Hutter appears to be absolutely right. His enwik8 and enwik9’s benchmark datasets are, today, best compressed by a 169M parameter LLM

Okay, that's not fair. There's a big advantage to having an external compressor and reference file whose bytes aren't counted, whether or not your compressor models knowledge. More importantly, even with that advantage it only wins on the much smaller enwik8. It loses pretty badly on enwik9.
|
There is no unfair advantage here. This was also achieved in the 2019-2021 period; it feels safe to say that Bellard could likely have pushed the frontier much further with modern compute and techniques.
[1] https://www.mattmahoney.net/dc/text.html