by willvarfar
1066 days ago
I would envisage the LLM is allowed to train on each and every input token. So, to begin with, it knows nothing; but to predict the very last token, it has internalised the whole preceding stream. Now I wouldn't expect it to be particularly competitive in enwik8 or enwik9, but the question would be: is there any max-model-size and input-length for which it would right now pull ahead and become the best known or at least competitive predictor?
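The quoted scheme, where the model predicts each token first and only then trains on it, is the adaptive (online) approach to modelling. Below is a minimal Python sketch of how its cost could be measured, with a simple byte-frequency model standing in for the LLM (an assumption purely for illustration; an LLM would replace the counts with a network updated after each token). Each byte costs -log2 of the probability assigned to it before the update, which is roughly the size an arithmetic coder would approach.

```python
import math


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length (in bits) of `data` under an adaptive order-0 model.

    Illustrative stand-in for the LLM: the model starts knowing nothing
    (every byte value has count 1, i.e. a uniform prior), predicts each
    byte, pays -log2(p) bits for it, and only then updates on that byte.
    A decoder that replays the same updates in the same order recovers
    the same predictions.
    """
    counts = [1] * 256  # uniform prior over all 256 byte values
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total   # prediction made before seeing b
        bits += -math.log2(p)   # cost an arithmetic coder would approach
        counts[b] += 1          # "train" on the byte just seen
        total += 1
    return bits


text = b"abracadabra " * 100
print(f"{adaptive_code_length_bits(text) / 8:.1f} bytes vs {len(text)} raw")
```

Early bytes are expensive because the model has seen nothing yet; later bytes get cheaper as the counts absorb the stream, which mirrors the "knows nothing at first, has internalised everything by the end" description.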
It is an interesting hypothesis, but my gut feeling is that an LLM would perform worse on average. Competitive? Yes, but still worse. I am sure someone will test it, though.
I say this from the hundreds of different compressor models I have built for myself over the years. Believe it or not, the compressed data is usually the best part. It is the decode tree/table/key/whatever that ends up crowding out the savings on the compressed data. In this case, that would be the LLM weights, or whatever the LLM spits out as its tree/decode table.
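A rough Python sketch of that trade-off, assuming a static order-0 (byte-frequency) model whose frequency table must be shipped with the output, and a made-up flat cost per table entry (`TABLE_ENTRY_BYTES` is hypothetical and not taken from any real format). In an LLM-based compressor, the weights would play the role of the table.

```python
import math
from collections import Counter

# Hypothetical cost of storing one (symbol, count) entry in the header.
# Real formats encode their tables far more compactly; this value is an
# assumption chosen only to show the shape of the trade-off.
TABLE_ENTRY_BYTES = 5


def static_total_bytes(data: bytes) -> tuple[float, int]:
    """Return (payload_bytes, table_bytes) for a static order-0 model.

    The payload is the ideal entropy-coded size of `data` under its own
    symbol frequencies; the table is what the decoder must receive to
    rebuild those frequencies before it can decode anything.
    """
    n = len(data)
    counts = Counter(data)
    payload_bits = -sum(c * math.log2(c / n) for c in counts.values())
    table_bytes = len(counts) * TABLE_ENTRY_BYTES
    return payload_bits / 8, table_bytes


for sample in (b"hello world", b"hello world " * 1000):
    payload, table = static_total_bytes(sample)
    print(f"raw={len(sample)} payload={payload:.1f} table={table}")
```

On the short input the table outweighs the payload (and even the raw data), while on the long input the same table becomes a small fraction of the total.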