Hacker News new | ask | show | jobs
by txrx0000 260 days ago
A description of the taste of chocolate must contain some information about the actual experience of eating chocolate. Otherwise, it wouldn't be possible for both the reader and the author to understand what the description refers to in reality. The description wasn't conceived in a vacuum, it's a lossy encode of all of the physical processes that preceded it (the further away, the lossier). One of the common processes encoded in the dataset of human-written text is whatever's in the brain that produces consciousness for all humans. The model might not even try to recover this if it's not useful for predicting the next token. The SNR of the encode may not be high enough to recover this given the limited text we have. But what if it was useful, and the SNR was high enough? I can't outright dismiss this possibility, especially as these models are getting better and better at behaving like humans in increasingly non-trivial ways, so they're clearly recovering more and more of something.
1 comments

Imagine you've never tasted chocolate and someone gives you a very good description of what it is to eat chocolate. You'd be nowhere near the actual experience. Now imagine that you didn't know first hand what it was like to 'eat' or to have a skeleton or a jaw. You'd lose almost all the information. The only reason spoken language works is because both people have that shared experience already
True. The description encodes very little about the actual sensory experience besides its relationship to similar experiences (bitterness, crunchiness, etc) and how to retrieve the memories of those experiences. It probably contains a lot more information about the brain's memory retrieval and pattern relating circuits than the sensory processing circuits.

Text is probably not good enough for recovering the circuits responsible for awareness of the external environment, so I'll concede that you and ijk's claims are correct in a limited sense: LLMs don't know what chocolate tastes like. Multimodal LLMs probably don't know either because we don't have a dataset for taste, but they might know what chocolate looks and sounds like when you bite into it.

My original point still stands: it may be recovering the mental state of a person describing the taste of chocolate. If we cut off a human brain from all sensory organs, does that brain which receives no sensory input have an internal stream of consciousness? Perhaps the LLM has recovered the circuits responsible for this thought stream while missing the rest of the brain and the nervous system. That would explain why first-person chain-of-thought works better than direct prediction.