by Ukv
563 days ago
> LLMs do have actual knowledge - the knowledge that was encoded in the words in the training data. That's not how they store the data internally, but the actual knowledge comes from there.

For non-multimodal models, and minus ephemeral context and what's encoded by the architecture (like the translational invariance of CNNs), I'd agree with that.

> And I wasn't saying that that's enough. I was saying that the LLM advocates think, or at least claim, that it's enough.

Most modern LLMs like GPT-4, LLaMA-3.2, Gemini, or Claude 3.5 are already multimodal (text, images, sometimes video, sometimes audio). If you primarily just meant that multimodality is a good pathway to building richer internal world representations (and thus to being better at answering questions involving 3D geometry, for instance), then I'd also agree there, though I don't see why it'd be a requirement for reasoning/etc. (as opposed to just beneficial).
|
Just LLMs aren't enough, and they aren't going to be enough.
You use words like "reasoning", but LLMs do not reason in the same way that an inference engine does. They can, at best, simulate it badly. I think we need more - not more of what we've got, but more of a different kind.