

by ACCount37 190 days ago

Words are the "simplistic" projection of an LLM's abstract thoughts.

An LLM has: words in its input plane, words in its output plane, and A LOT of cross-linked internals between the two.

Those internals aren't "words" at all - and it's where most of the "action" happens. It's how LLMs can do things like translate from language to language, or recall knowledge they only encountered in English in the training data while speaking German.

2 comments

Hendrikto 190 days ago

> It's how LLMs can do things like translate from language to language

The heavy lifting here is done by embeddings. This does not require a world model or “thought”.


traverseda 190 days ago

LLMs are compression and prediction. The most efficient way to (lossily) compress most things is by actually understanding them. Not saying LLMs are doing a good job of that, but that is the fundamental mechanism here.


emp17344 190 days ago

Where’s the proof that efficient compression results in “understanding”? Is there a rigorous model or theorem, or did you just make this up?


fc417fc802 189 days ago

It's the other way around. Human learning would appear to amount to very efficient compression. A world model would appear to be a particular sort of highly compressed data set that has particular properties.

This is a case where it's going to be next to impossible to provide proof that no counterexamples exist. Conversely, if what I've written there is wrong then a single counterexample will likely suffice to blow the entire thing out of the water.


traverseda 190 days ago

No answer I give will be satisfying to you until I can come up with a rigorous mathematical definition of understanding, which is de facto solving the hard AI problem. So there's not really a point in talking about it, is there?

If you're interested in why compression is like understanding in many ways, I'd suggest reading through the Wikipedia article on Kolmogorov complexity.

https://en.wikipedia.org/wiki/Kolmogorov_complexity
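
For a rough feel of the idea: Kolmogorov complexity itself is uncomputable, but the size of a compressed copy gives an upper bound on it. A minimal sketch, assuming zlib as a stand-in compressor (the helper name `compressed_size` and both sample inputs are made up for illustration):

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a crude upper bound on description length."""
    return len(zlib.compress(data, 9))


# Highly regular input: a short rule ("repeat 'abc' 1000 times") describes it fully.
structured = b"abc" * 1000

# Pseudo-random bytes of the same length: no short rule for zlib to find.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(3000))

print(len(structured), compressed_size(structured))
print(len(noisy), compressed_size(noisy))
```

The regular string shrinks to a tiny fraction of its size, while the noisy one barely shrinks at all. That only shows that finding structure and compressing go together; it says nothing by itself about "understanding".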


daveguy 190 days ago

The "cross-linked internals" only go in one direction and only produce one token at a time: slide the window and repeat. The RL layer then picks which few sequences of words are best based on human feedback in a single step. Even "thinking" is just doing this in a loop with a "think" token. It is such a ridiculously simplistic model that it is vastly closer to an adder than a human brain.
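
The "one token at a time, slide the window and repeat" loop can be sketched roughly like this. It is a toy under stated assumptions: `toy_model` is a made-up arithmetic stand-in for the forward pass, not a real LLM, and `generate` and `window` are hypothetical names.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    steps: int,
    window: int,
) -> List[int]:
    """Autoregressive loop: predict one token, append it, slide the context window."""
    tokens = list(prompt)
    for _ in range(steps):
        context = tokens[-window:]           # only the last `window` tokens are visible
        tokens.append(next_token(context))   # a single forward pass yields a single token
    return tokens


def toy_model(context: List[int]) -> int:
    """Hypothetical stand-in for a model: a fixed function of the visible context."""
    return (sum(context) + 1) % 10


result = generate(toy_model, [1, 2, 3], steps=4, window=3)
print(result)  # [1, 2, 3, 7, 3, 4, 5]
```

The loop really is this simple; whatever complexity there is sits inside the `next_token` call, which this sketch replaces with arithmetic.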
