| >I think it's more fundamental than that. If you start saying "it thinks" in regards to an LLM, you're wrong. LLMs don't think, they pattern match fuzzily. I'm not sure this objection is terribly helpful. We use terms like "think" and "want" to describe processes that clearly do not involve any form of understanding. Electrons do not have motivations, but they 'want' to go to a lower energy level in an atom. You can hold down the trigger for the fridge light to make it 'think' that the door has not been opened. These are uncontentious phrases that convey useful ideas. I understand that, as people work toward producing reasoning machines, these words might be used in overlapping senses, but when someone makes claims about machines having awareness, understanding, or thinking, they make it quite clear which context they are talking about. As to the rest of your comment, I simply disagree. If you think of a concept as an internal representation of a piece of information, then it has been shown that they do have such representations. In the Karpathy video I mentioned, he talks about how researchers found that models did have an internal representation of not knowing, but that the fine-tuning was restricting it to providing answers. They gave it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know. This generalised to producing "I don't know" for examples that were not in the training data. For the fine-tuning examples to succeed in that, the model must already contain the concept. I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would also think that this applies to humans; most people are not philosophers. I think the fact that the models can express details about words shows that they do have detailed information about what each word means semantically.
In many respects, because tokens are indices into embeddings, it would perhaps be more accurate to say that they have a better understanding of what words mean semantically than of what the words actually are. This is why they are poor at spelling but can give you detailed information about the thing they can't spell. |
...and that's why so many people are confused about what's going on with LLMs: sloppy, ambiguous use of language.
> In the Karpathy video I mentioned, he talks about how researchers found that models did have an internal representation of not knowing, but that the fine-tuning was restricting it to providing answers. They gave it fine-tuning examples where it said "I don't know" for information that they knew the model didn't know.
This is why I included the HTTP example: this is simply telling it to parrot the phrase "I don't know"; it doesn't understand that it doesn't know. From the LLM's perspective, it "knows" that the answer is "I don't know". It's returning a 200 OK that says "I don't know" rather than returning a 404.
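To make the HTTP analogy concrete, here is a toy sketch. It models only the analogy, not how an LLM works internally; the `FACTS` table and the functions `tuned_model` and `knowing_server` are hypothetical names made up for illustration.

```python
from http import HTTPStatus

# Hypothetical knowledge base; keys and values are illustrative only.
FACTS = {"capital of France": "Paris"}

def tuned_model(question: str) -> tuple[HTTPStatus, str]:
    """The behaviour described above: "I don't know" is just another successful answer."""
    if question in FACTS:
        return HTTPStatus.OK, FACTS[question]
    # 200 OK whose body happens to say "I don't know"
    return HTTPStatus.OK, "I don't know"

def knowing_server(question: str) -> tuple[HTTPStatus, str]:
    """A responder that signals the absence of an answer separately from any answer."""
    if question in FACTS:
        return HTTPStatus.OK, FACTS[question]
    # 404: there is no answer to give
    return HTTPStatus.NOT_FOUND, ""
```

Both functions agree on questions they can answer; they differ only in whether "no answer" is reported as a status or disguised as a successful body.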
Do you understand the distinction I'm making here?
> I would agree that models do not have any in-depth understanding of what lack of knowledge actually is. On the other hand, I would also think that this applies to humans; most people are not philosophers.
The average (non-programmer) human, when asked to write a "Hello, world" program, can definitely say they don't know how to program. And unlike the LLM, the human knows that this is different from answering the question. The LLM, in contrast, thinks it is answering the question when it says "I don't know": it thinks "I don't know" is the correct answer.
Put another way, a human can distinguish between responses to these two questions, whereas an LLM can't:
1. What is my grandmother's maiden name?
2. What is the English translation of the Spanish phrase, "No sé."?
In the first case, you don't know the answer unless you are quite creepy; in the second, you do (or can find out easily). But an LLM tuned to answer "I don't know" thinks it knows the answer in both cases, and thinks the answer is the same.
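The same distinction can be sketched as a difference in return types: "no answer" (`None`) versus "the answer is the string `I don't know`". This is only an illustration of the argument, not a model of any real system; `ANSWERS`, `human_answer`, and `tuned_llm_answer` are hypothetical names.

```python
from typing import Optional

Q1 = "What is my grandmother's maiden name?"
Q2 = 'What is the English translation of the Spanish phrase, "No sé."?'

# Hypothetical answer table: Q2 has a real answer, which happens to be "I don't know".
ANSWERS = {Q2: "I don't know"}

def human_answer(question: str) -> Optional[str]:
    """None means 'I have no answer'; a string is an actual answer."""
    return ANSWERS.get(question)

def tuned_llm_answer(question: str) -> str:
    """Always returns a string, so the missing case collapses onto a real answer."""
    return ANSWERS.get(question, "I don't know")
```

Here `human_answer` keeps the two questions apart (`None` for Q1, a string for Q2), while `tuned_llm_answer` returns the identical string for both.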