HN Mirror

by Ukv 563 days ago

> LLMs do have actual knowledge - the knowledge that was encoded in the words in the training data. That's not how they store the data internally, but the actual knowledge comes from there.

For non-multimodal models, and setting aside ephemeral context and what's encoded by the architecture (like the translational invariance of CNNs), I'd agree with that.

> And I wasn't saying that that's enough. I was saying that the LLM advocates think, or at least claim, that it's enough.

Most modern LLMs like GPT-4, LLaMA-3.2, Gemini, or Claude 3.5 are already multimodal (text, images, sometimes video, sometimes audio). If you primarily just meant that multimodality is a good pathway to building richer internal world representations (and thus to models that are better at answering questions involving 3D geometry, for instance), then I'd also agree there, though I don't see why it'd be a requirement for reasoning/etc. (as opposed to just beneficial).

AnimalMuppet 563 days ago

No, I would put text, images, video, and audio as one kind of "stuff" - NN training stuff. I would put knowledge graphs and rules for reasoning engines as another kind of stuff. If you use "modes" for text and images and so on, then I want something different from just "multimodal". I want left-brain vs right-brain, or slow vs fast, or something on that order. I want a different kind - not just fancier and larger LLMs. I want an LLM coupled to an inference engine with the Cyc encyclopedia available to it... or something in that direction. Maybe further than that.

Just LLMs aren't enough, and they aren't going to be enough.

You use words like "reasoning", but LLMs do not reason in the same way that an inference engine does. They can, at best, simulate it badly. I think we need more - not more of what we've got, but more of a different kind.

Ukv 563 days ago

> I want something different from just "multimodal". I want left-brain vs right-brain, or slow vs fast, or something on that order. I want a different kind - not just fancier and larger LLMs. I want an LLM coupled to an inference engine with the Cyc encyclopedia available to it...

So if I'm understanding, your objection isn't about the modalities that the model can work with (text, video, diagrams, ...), but about the kinds of processing it can do?

Many modern LLMs support tool calling (e.g., to look up entities in Google's knowledge graph, or evaluate code), mixture-of-experts architectures (specialized subnetworks that a router activates as needed, typically per token), and chain-of-thought inference (for questions requiring more complex reasoning). Would you consider those to be steps in the right direction?
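To make the "LLM coupled to an inference engine" idea concrete, here's a rough sketch of what tool calling could look like when the tool is a tiny forward-chaining rule engine. Everything in it is made up for illustration: the `RULES` list, the `infer` tool name, the JSON message shape, and the hard-coded `model_output` that stands in for what a model would emit. It isn't how any particular vendor's API or Cyc works.

```python
import json

def forward_chain(facts, rules):
    """Derive every fact reachable from `facts`.

    Each rule is a (premises, conclusion) pair; a rule fires once
    all of its premises are known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rule base, standing in for a real knowledge base.
RULES = [
    (("socrates is a man",), "socrates is mortal"),
    (("socrates is mortal", "mortals die"), "socrates dies"),
]

def infer_tool(facts):
    """Tool entry point: return all derivable facts, sorted."""
    return sorted(forward_chain(facts, RULES))

# Registry of tools the model is allowed to call (assumed names).
TOOLS = {"infer": infer_tool}

def handle_tool_call(message):
    """Dispatch a JSON tool call shaped like
    {"tool": <name>, "arguments": {...}} (an assumed format)."""
    call = json.loads(message)
    return TOOLS[call["tool"]](**call["arguments"])

# Stand-in for model output; a real model would generate this text.
model_output = json.dumps({
    "tool": "infer",
    "arguments": {"facts": ["socrates is a man", "mortals die"]},
})

result = handle_tool_call(model_output)
print(result)
```

The point of the split is that the model only decides which tool to call and with what facts, while the rule engine does the exact, deterministic deduction, and its result would be fed back into the model's context.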

> You use words like "reasoning", but LLMs do not reason in the same way that an inference engine does

If you view reasoning as something inference engines can do, then I don't think we disagree too much. The remaining difference may just be about error rate: I'm personally fine saying something can reason (at least "to some extent") even if it's a little fuzzy and not 100.0% accurate formal logic (else animals would also be excluded).

AnimalMuppet 562 days ago

I view reasoning as something that LLMs do a kind of, or a subset of, and inference engines do a different kind or subset of. And there may be different kinds or subsets than just those two.

And just as inference engines, by themselves, were not enough to be really able to "reason", neither are LLMs, by themselves. (I think "AI" has historically been quite reductionist: researchers reduce thinking to only one kind of thinking, and then try to automate that. The result can sometimes be impressive, but is always less than what human thinking is.)

Tool calling or mixture-of-experts are in the direction that I'm thinking.
