by AIPedant 423 days ago
I nominally agree with this point - AGI is theoretically possible according to the Church-Turing thesis, since we can “just” solve the Schrödinger equation for every atom in the human body.

The more salient point is that when a model reads “dog” it associates the word with a bunch of text and images vaguely related to dogs. But when a human reads “dog” they associate it with their experiences of dogs, or of other animals if they haven’t ever met a dog. In particular, cats who have met dogs also have some concept of “dog,” without using language at all. Humans share this intuitive form of understanding, and use it with text/speech/images to extend our understanding to things we haven’t encountered personally. But multimodal LLMs have no access to this form of intelligence, shared by all mammals, and in general they have no common sense. They can fake some common sense with huge amounts of text, but it is not reliable: the space of feline-level common sense deductions is not technically infinite, but it is incomprehensibly vast compared to the corpus of all human text and photographs.

1 comment

When a model reads "dog" it associates the patterns it gleaned from the text and images about dogs - its past 'experiences'. What is the difference in kind between that and animal understanding?

LLMs do have language-agnostic understandings in their latent space. "Dog" and "Perro" have largely the same representation (depending on the model; I believe more advanced ones show this more strongly), as does a picture of a dog. I'm not sure if that's exactly the form of understanding you're referring to.

I agree the human text/images corpus is very small compared to evolution's millions of years of learning from interacting with the environment, which is why I'm excited about RLing LLMs: it opens up the same data trove.