|
|
|
|
|
by wongarsu
136 days ago
|
|
If we focus on base models and ignore the tuning steps after that, then LLMs are "just" a token predictor. But we know that pure statistical models aren't very good at this. After all we tried for decades to get Markov chains to generate text, and it always became a mess after a couple of words. If you tried to come up with the best way to actually predict the next token, a world model seems like an incredibly strong component. If you know what the sentence so far means, and how it relates to the world, human perception of the world and human knowledge, that makes guessing the next word/token much more reliable than just looking at statistical distributions. The bet OpenAI has made is that if this is the optimal final form, then given enough data and training, gradient descent will eventually build it. And I don't think that's entirely unreasonable, even if we haven't quite reached that point yet. The issues are more in how language is an imperfect description of the world. LLMs seems to be able to navigate the mistakes, contradictions and propaganda with some success, but fail at things like spatial awareness. That's why OpenAI is pushing image models and 3d world models, despite making very little money from them: they are working towards LLMs with more complete world models unchained by language I'm not sure if they are on the right track, but from a theoretical point I don't see an inherent fault |
|
First, the subjectivity of language.
1) People only speak or write down information that needs to be added to a base "world model" that a listener or receiver already has. This context is extremely important to any form of communication and is entirely missing when you train a pure language model. The subjective experience required to parse the text is missing.
2) When people produce text, there is always a motive to do so which influences the contents of the text. This subjective information component of producing the text is interpreted no different from any "world model" information.
A world model should be as objective as possible. Using language, the most subjective form of information is a bad fit.
The other issue in this argument is that you're inverting the implication. You say an accurate world model will produce the best word model, but then suddenly this is used to imply that any good word model is a useful world model. This does not compute.