HN Mirror

by hyperpape 2 days ago

I think this is an analogy that's been taken far too far. The output of intelligence just isn't compression; that's memorization. The role of intelligence is to generate novelty.

It's true that LLMs do something that looks very compression-like in their weights, but it is lossy, and it has to be: if you're not lossy, you've overfitted the corpus, and that's bad. Post-training takes this even further, because you're not doing anything that looks like training on a specific corpus; you're exploring a wider space of text. That text doesn't even concretely exist until you start exploring it.

I'm sure there must be a serious attempt to pursue this analogy that isn't just handwaving, but I haven't seen it.

1 comments

miki123211 2 days ago

LLM compression doesn't necessarily have to be lossy.

You can use the fact that LLMs predict P(next token | existing tokens) to losslessly and efficiently compress arbitrary token sequences. This idea is closely related to arithmetic coding.
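
To make the idea concrete, here is a minimal sketch of arithmetic coding driven by a next-token predictor. Everything named here is an assumption for illustration: `toy_model` is a hypothetical stand-in for an LLM over a three-token vocabulary, and exact `Fraction` arithmetic replaces the fixed-precision integer arithmetic a practical coder would use. The point is only that the round trip is lossless as long as encoder and decoder see identical probabilities.

```python
import math
from fractions import Fraction

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary


def toy_model(context):
    """Hypothetical stand-in for an LLM: P(next token | context).

    Deterministic, and favors repeating the previous token.
    """
    if not context:
        return {t: Fraction(1, 3) for t in VOCAB}
    last = context[-1]
    return {t: Fraction(1, 2) if t == last else Fraction(1, 4) for t in VOCAB}


def encode(tokens):
    """Narrow [0, 1) once per token, then name a point in the final interval."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        probs = toy_model(tokens[:i])
        cum = Fraction(0)
        for t in VOCAB:
            if t == tok:
                low += width * cum
                width *= probs[t]
                break
            cum += probs[t]
    # Shortest binary fraction k / 2**n that lies in [low, low + width).
    n = 0
    while True:
        k = math.ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n  # the code is k written with n bits
        n += 1


def decode(k, n, length):
    """Replay the same model and pick the sub-interval containing the point."""
    point = Fraction(k, 2**n)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        probs = toy_model(out)
        cum = Fraction(0)
        for t in VOCAB:
            start = low + width * cum
            end = start + width * probs[t]
            if start <= point < end:
                out.append(t)
                low, width = start, end - start
                break
            cum += probs[t]
    return out


message = list("aaabbbccc")
code, bits = encode(message)
assert decode(code, bits, len(message)) == message
```

The better the model predicts the sequence, the wider the final interval stays and the fewer bits are needed to name a point inside it. The sequence length is passed to the decoder separately here; a real scheme would encode it or use an end-of-sequence token.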

hyperpape 2 days ago

True, but it's not relevant because that isn't how we actually train LLMs for use as quasi-intelligent tools. We specifically do not want the model to be able to just memorize its input, which is what your process requires.

Many things about the process are similar, so there's some analogy, but it just isn't the same.

mtdewcmu 2 days ago

When decompressing, you need to reproduce the output of the LLM exactly as it was during compression, otherwise the decompressor would output gibberish. Can you count on the LLM being that consistent?
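
One reason the question is not idle: floating-point addition is not associative, so the same numbers summed in a different order can differ in the last bit. Whether a given LLM runtime actually changes its summation order between runs or machines is an assumption here, not something established in this thread. The sketch below only illustrates the arithmetic fact in plain Python.

```python
def sum_in_order(values):
    """Add floats strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)
backward = sum_in_order(list(reversed(values)))

# Same inputs, different order, different result in the last bit.
print(forward == backward)  # False
print(forward, backward)
```

A discrepancy this small is harmless for ordinary text generation, but a decoder that must retrace the encoder's probability intervals exactly could pick a different token once, after which everything that follows would diverge.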