by grohan
202 days ago
Bellard has trained various models, so it may not be the specific 169M-parameter LLM, but his Transformer-based `nncp` is indeed #1 on the "Large Text Compression Benchmark" [1], which correctly ranks entries by the total of the compressed enwik9 size plus the (zipped) decompressor size. There is no unfair advantage here. This was also achieved in the 2019-2021 period; it feels safe to say that Bellard could likely have pushed the frontier much further with modern compute and techniques.

[1] https://www.mattmahoney.net/dc/text.html
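To make the "compressed size + zipped decompressor size" accounting concrete, here is a rough sketch of why it removes the obvious cheat of hiding the data inside the decompressor. Everything in it is an assumption for illustration: `benchmark_total` is a hypothetical helper, `zlib` stands in for the zip step, and a short repeated string stands in for enwik9. It is not the benchmark's actual tooling.

```python
import zlib


def benchmark_total(compressed: bytes, decompressor: bytes) -> int:
    """Hypothetical score: compressed data size plus the size of the
    decompressor after zipping it (zlib is used here as a stand-in for zip)."""
    return len(compressed) + len(zlib.compress(decompressor, 9))


# Tiny stand-in for enwik9 (assumption: any byte string works for the demo).
data = b"the quick brown fox jumps over the lazy dog " * 200

# Honest entry: real compressed output plus a small decompressor program.
honest_compressed = zlib.compress(data, 9)
honest_decompressor = (
    b"import sys, zlib; "
    b"sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
)

# "Cheating" entry: zero bytes of output, with the data embedded as a
# bytes literal in the decompressor source instead.
cheat_compressed = b""
cheat_decompressor = (
    b"import sys; sys.stdout.buffer.write(" + repr(data).encode() + b")"
)

honest_total = benchmark_total(honest_compressed, honest_decompressor)
cheat_total = benchmark_total(cheat_compressed, cheat_decompressor)

print("honest total:", honest_total)
print("cheat total: ", cheat_total)
```

The cheating entry's compressed file is empty, but the embedded data is still counted once the decompressor itself is zipped and added to the total, so moving bytes from one side to the other does not make them free.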