by kerkeslager
455 days ago
I think it's more fundamental than that. If you start saying "it thinks" in regard to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily. If the training data contained a bunch of answers to questions which were simply "I don't know", you could get an LLM to say "I don't know", but that's still not actually a concept of not knowing. That's just knowing that the answer to your question is "I don't know". It's essentially like an HTTP server that responds to requests for nonexistent documents with a "200 OK" containing "Not found". It's fundamentally missing the "404 Not Found" concept. LLMs just have a bunch of words--they don't understand what the words mean. There's no metacognition going on that would let it think "I don't know", or even think that you would want to know that.
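To make the HTTP analogy concrete, here is a minimal sketch in Python. The `DOCUMENTS` table and the function names are made up for illustration, and no real server is involved. Both handlers return the same body for a missing document, but only one of them signals "missing" in a channel the client can check without reading the text.

```python
from http import HTTPStatus

# Hypothetical document store, for illustration only.
DOCUMENTS = {"/index.html": "Hello"}


def soft_404(path):
    """Always answers 200 OK; absence shows up only as body text."""
    body = DOCUMENTS.get(path, "Not found")
    return HTTPStatus.OK, body


def real_404(path):
    """Signals absence in the status code itself."""
    if path in DOCUMENTS:
        return HTTPStatus.OK, DOCUMENTS[path]
    return HTTPStatus.NOT_FOUND, "Not found"


def client_knows_missing(status):
    """A client that inspects only the status code."""
    return status == HTTPStatus.NOT_FOUND
```

With `soft_404`, a client that checks only the status code treats "Not found" as just another successful document, which is the point of the analogy.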
|
I'm not sure this objection is terribly helpful. We use terms like 'think' and 'want' to describe processes that clearly do not involve any form of understanding. Electrons do not have motivations, but they 'want' to go to a lower energy level in an atom. You can hold down the trigger for the fridge light to make it 'think' that the door has not been opened. These are uncontentious phrases that convey useful ideas.
I understand that when people are working towards producing reasoning machines, the words might be working in similar spaces, but when someone is really making claims about machines having awareness, understanding, or thinking, they make the context they are talking about quite clear.
As to the rest of your comment, I simply disagree. If you take a concept to be an internal representation of a piece of information, then it has been shown that models do have such representations. In the Karpathy video I mentioned, he talks about how researchers found that models did have an internal representation of not knowing, but that the fine-tuning was restricting them to providing answers. The researchers gave the model fine-tuning examples where it said "I don't know" for information that they knew the model didn't know. This generalised, so the model also answered "I don't know" in cases that were not in the training data. For the fine-tuning examples to succeed in that, the model must already contain the concept.
I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I think the same applies to humans: most people are not philosophers.
I think the fact that models can express details about words shows that they do have detailed information about what each word means semantically. In many respects, because tokenisation turns text into indices into an embedding table, it would perhaps be more accurate to say that they have a better understanding of what words mean than of what the words actually are. This is why they are poor at spelling but can give you detailed information about the thing they can't spell.
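As a rough illustration of the tokenisation point, here is a toy sketch in Python. The vocabulary, the ids, and the two-dimensional embeddings are all invented for the example, and the greedy longest-match split is a stand-in for whatever a real tokenizer does (real ones are typically learned, e.g. byte-pair encoding). What it shows is that the model's input is a list of vectors looked up by token id, with no letters in sight.

```python
# Hypothetical toy vocabulary: subword pieces mapped to token ids.
VOCAB = {"straw": 0, "berry": 1, "black": 2}

# Hypothetical 2-d embeddings, one row per token id.
EMBEDDINGS = [
    [0.9, 0.1],  # straw
    [0.2, 0.8],  # berry
    [0.1, 0.3],  # black
]


def tokenize(word):
    """Greedy longest-match split of `word` into token ids."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


def model_input(word):
    """What the model receives: embedding vectors, not characters."""
    return [EMBEDDINGS[token_id] for token_id in tokenize(word)]
```

Here `tokenize("strawberry")` yields two ids, and `model_input("strawberry")` yields two vectors. Nothing in that input says how many letters the word has or which ones, so any spelling knowledge has to be learned indirectly.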