by hyperpape
2 days ago
I think this is an analogy that's been taken far too far. The output of intelligence just isn't compression; that's memorization. The role of intelligence is to generate novelty. It's true that LLMs do something that looks very compression-like in their weights, but it is lossy, and it has to be: if you're not lossy, you've overfitted the corpus, and that's bad. Post-training takes this even further, because you're not doing anything that looks like training on a specific corpus; you're exploring a wider space of text. That text doesn't even concretely exist until you start exploring it. I'm sure there must be a serious attempt to pursue this analogy that isn't just handwaving, but I haven't seen it.
|
You can use the fact that LLMs predict P(next token | existing tokens) to losslessly and efficiently compress arbitrary token sequences. This idea is closely related to arithmetic coding.
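To make that concrete, here is a minimal sketch of the idea, not a production codec. The `predict` function is a hypothetical stand-in for an LLM (just Laplace-smoothed counts over the context); anything that returns P(next token | existing tokens) could be dropped in. It uses exact fractions instead of the fixed-precision arithmetic a real arithmetic coder would use, and it assumes the decoder is told the sequence length separately.

```python
from fractions import Fraction
from math import ceil, log2

def predict(context, alphabet):
    """Hypothetical stand-in for an LLM: P(next | context) from smoothed counts."""
    total = len(context) + len(alphabet)
    return {s: Fraction(context.count(s) + 1, total) for s in alphabet}

def encode(tokens, alphabet):
    """Narrow [low, low + width) by each token's predicted probability."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = predict(tokens[:i], alphabet)
        for s in alphabet:
            if s == tok:
                width *= probs[s]
                break
            low += width * probs[s]  # skip past the sub-intervals of earlier symbols
    # Emit the shortest binary fraction k / 2**nbits that lies inside the interval.
    nbits = 0
    while True:
        k = ceil(low * 2**nbits)
        if Fraction(k, 2**nbits) < low + width:
            return k, nbits
        nbits += 1

def decode(k, nbits, n, alphabet):
    """Replay the same predictions and follow the sub-interval containing the code."""
    x = Fraction(k, 2**nbits)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        probs = predict(out, alphabet)
        for s in alphabet:
            high = low + width * probs[s]
            if x < high:
                width *= probs[s]
                out.append(s)
                break
            low = high
    return out

def ideal_bits(tokens, alphabet):
    """Sum of -log2 P(token | context): the model's total surprisal in bits."""
    return sum(-log2(predict(tokens[:i], alphabet)[t]) for i, t in enumerate(tokens))

if __name__ == "__main__":
    tokens = list("abracadabra")
    alphabet = sorted(set(tokens))
    k, nbits = encode(tokens, alphabet)
    print(nbits, "bits used vs", round(ideal_bits(tokens, alphabet), 2), "bits of surprisal")
    print("".join(decode(k, nbits, len(tokens), alphabet)))
```

The round trip is exact, and the code length lands within about one bit of the model's total surprisal, so a better predictor directly means a shorter output. The lossiness of the weights doesn't matter here: the model only has to be the same on both ends.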