Correct. You must perceive them as plausibility engines. The unstated hypothesis is that plausibility of output may converge towards correctness of output with increasing scale and sophistication. This hypothesis remains very far from proven.
I don't think it's that hard to understand what the hell is going on with LLMs under the hood. Ultimately it's a weighted sample of the training data. It has a relationship with reality insofar as one exists within the training data. RLHF makes it easier to believe something crazy is happening because the output is being weighted towards something that's believable to us.
Depending on what you mean by "weighted sample", that's either trivially true (the network is of course a function of its training data and nothing else) or trivially false (the network generalizes over the training data and has not memorized it). It is not a good intuition pump for why an LLM can hold up one end of a conversation, or follow novel instructions - it is not reading from a script, nor regurgitating chopped up pieces of text like a Markov chain. It is doing something very clever in a way that is not obvious.
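For anyone who hasn't seen one, here is roughly what "regurgitating chopped up pieces of text like a Markov chain" looks like. This is only a minimal word-level bigram sketch; the toy corpus and the function names are made up for illustration. Every adjacent word pair it emits already occurs in its input, which is the kind of behaviour being contrasted with an LLM here.

```python
import random
from collections import defaultdict

def build_bigram_table(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, rng):
    """Walk the table: each next word depends only on the current word."""
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return out

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
sample = generate(table, "the", 8, random.Random(0))
print(" ".join(sample))
```

The output can look locally fluent, but it cannot produce a word pair it has never seen, and it carries no state beyond the single previous word.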
>It has a relationship with reality insofar as one exists within the training data
Sure, but most things that learn have actual reality as a training set. LLMs have human curated data, which isn’t and can’t be perfectly representative of reality.
That is accurate to what they do. I think others need to imagine this as well. Far too many nontechnical people seem to treat them as some kind of Oracle.
Your understanding of how LLMs work is overly simplistic and incomplete.
Yes, doing probabilistic next-word prediction plays a role in how LLMs generate text output, but that's not the whole story.
LLMs "understand" (to a degree): They develop complex internal representations of concepts they've been trained on. This isn't just about word association; they develop an understanding of the relationships between objects, actions, and ideas.
They can reason, not just mimic: LLMs can perform logical reasoning, using their internal knowledge base to solve problems or answer questions. This might involve following multi-step instructions, drawing inferences from information provided, or adapting to new prompts in a way that requires a degree of abstract thinking.
Beyond simple probabilities: Yes, LLMs do consider the probability of certain word sequences, but their output is far more sophisticated than just picking the most likely next word. They weigh context, concepts, relationships, nuance, logic, and even the unstated but inferred purpose of the user when generating responses.
I just tried a similar question now with ChatGPT4:
"If a man and a goat are on one side of a river, what is the minimum amount of trips required to get the man and goat to the other side in a boat. Assume the boat can hold at most one animal and one human."
ChatGPT: 3 trips
The correct answer is one trip, since the man and the goat fit in the boat together. Answering 3 is very much closer to "trying to predict next word from examples" than "billion-dollar model with internal reasoning".
That sounds surprisingly close to how a toddler might reason. The only difference is that the toddler can eventually see the flaw in their reasoning if you press them long enough, while the LLM doesn't have the architecture for learning in real time yet.
I was repeatedly amazed at how smart my toddler was. You just feel the general intelligence.
She's a bit older now (5) but e.g. a few days ago I was talking about cleaning the whole house. She said "you didn't clean the WHOLE house, look there's something you didn't clean".
If you spoil it with your followup questions... which doesn't help because the point of these is that they're controlled experiments where you do know what the right answer and logic is. You can't test when you don't.
It's not spoiling anything. It's just an observation of the limits of current LLMs.
I tried a few chain of thought prompts for the original question and GPT-3.5 was sometimes (randomly) able to find the correct answer on the first attempt for this one
I tried "are you sure", which often triggers some reasoning, and it was pretty confident. I'm trying not to give it the answer, but run it as if I didn't have any special knowledge. GPT + human > GPT. I mean, we're treating these things like another kind of intelligence, not a hammer.
> their output is far more sophisticated than just picking the most likely next word
Picking the most likely next word is an extremely sophisticated algorithm, if you could do that you would almost sound human. Like an LLM!
However, focusing on just picking the most likely next word does cause some issues. For example, the model is biased towards words that were posted over those that were too boring to post, so there are still things such an algorithm can't understand, like real-world probabilities, since events are so unevenly posted about.
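To make "picking the most likely next word" concrete, here is a small sketch of the two usual ways to turn a next-word distribution into an actual word: always take the top one (greedy), or draw in proportion to the probabilities (sampling). The distribution below is invented for illustration and does not come from any real model.

```python
import random
from collections import Counter

# Hypothetical next-word distribution for the context "the cat sat on the".
next_word_probs = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}

def greedy_pick(probs):
    """Always return the single most likely word."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
draws = Counter(sample_pick(next_word_probs, rng) for _ in range(1000))

print(greedy_pick(next_word_probs))  # prints "mat"
print(draws.most_common())           # counts roughly track the probabilities
```

Either way, the hard part is producing the distribution in the first place. The selection step is a few lines.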
I think it's still more in the realm of philosophy. But I do have an argument that NNs demonstrate abstract, generalized learning: the transfer learning effect.
Neural networks pre-trained on data for a completely different task learn new tasks much faster. With a GPT-like transformer, you can feed it PCM audio samples encoded as uuencoded text, or paintings encoded in the same way, and it learns how to translate English <-> Russian when later trained on that, much faster than from a completely randomized model. There's something common to those seemingly disparate tasks that is learned. "Abstraction" may be the right word for this.
Abstraction seems too generous of an interpretation.
A more parsimonious hypothesis is that random networks start out broken, structurally incapable of computation because the structure has parts where information stops flowing or signal gain is so low at certain choke points that its presence is like a random coin flip.
Training the network to compute ANYTHING fixes this flow problem, making subsequent training easier, without introducing any kind of abstraction.
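The "signal gain is so low" part is easy to picture with a toy. The sketch below pushes a random vector through a stack of random linear layers and reports how much of its norm survives. It is only an illustration under strong assumptions (purely linear layers, no training, arbitrary width, depth and weight scales that I picked); it shows that a badly scaled random stack can choke a signal, not that this is what explains transfer learning.

```python
import math
import random

def random_layer(width, sigma, rng):
    """A width x width matrix of Gaussian weights with standard deviation sigma."""
    return [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(width)]

def apply_layer(layer, vec):
    """Plain matrix-vector product (no nonlinearity, to keep the toy simple)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in layer]

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def relative_output_norm(width, depth, sigma, seed):
    """Output norm divided by input norm after `depth` random layers."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(width)]
    start = norm(vec)
    for _ in range(depth):
        vec = apply_layer(random_layer(width, sigma, rng), vec)
    return norm(vec) / start

WIDTH, DEPTH = 32, 10  # arbitrary sizes chosen for the illustration
choked = relative_output_norm(WIDTH, DEPTH, sigma=0.05, seed=0)
balanced = relative_output_norm(WIDTH, DEPTH, sigma=1 / math.sqrt(WIDTH), seed=0)

print(f"small weights:        {choked:.2e}")
print(f"1/sqrt(width) scale:  {balanced:.2e}")
```

With the small weight scale the signal shrinks by a large factor at every layer, so almost nothing reaches the output. With weights scaled like 1/sqrt(width) the norm stays within the same order of magnitude.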
A typical mid-wit response is to say it's just a Markov chain doing naive next token prediction without any semantic model. That's not how deep learning works.
The LLM neural network contains a semantic model and it performs some type of reasoning over that model. The idiot and the genius both can see that ChatGPT has some reasoning capability.
"This 1990 paper demonstrated how neural networks could learn to represent and reason about part-whole hierarchical relationships, using family trees as the example domain.
By training on examples of family relations like parent-child and grandparent-grandchild, the neural network was able to capture the underlying logical patterns and reason about new family tree instances not seen during training.
This seminal work highlighted that neural networks can go beyond just memorizing training examples, and instead learn abstract representations that enable reasoning and generalization"
I would tend to agree with seizethecheese. I don't think anyone says "Markov chain doing naive next token prediction without any semantic model" emphasis on the last part of that sentence. The idiot only says the first part and disregards the last part. The genius says the first part with an understanding that, no matter how complex a semantic model is, the "reasoning" that occurs is completely a function of the training data. Change the data, change the weights. At no point in the model does anything guarantee consistency of logic or anything like that. Certainly, LLMs are not interested in any notion of truth in the philosophical sense.
Here is a succinct example using Claude:
Prompt:
Using the decimal system, redefine all symbols by shifting them one to right such that 0 becomes 1, 1 becomes 2, 2 becomes 3, etc. Now, let's do math. What is 2x3?
===
Okay, with the symbols shifted one to the right in the decimal system:
0 becomes 1
1 becomes 2
2 becomes 3
3 becomes 4
4 becomes 5
5 becomes 6
6 becomes 7
7 becomes 8
8 becomes 9
9 becomes 0 (looping back around)
So if 2 becomes 3, and 3 becomes 4, then:
2 x 3 = 3 x 4 = 12
Shifting those digits one to the right based on the new mapping: 12 becomes 23
Therefore, under this new symbol mapping, 2 x 3 = 23
===
Regardless, if you are interpreting 2x3 to be in the original space, then 2x3 = 6 which would be 7 in the new space. If you are interpreting 2x3 to be in the new space (which is how I would interpret it), then 2x3 is 1x2 = 2 which is 3.
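The two readings can be checked mechanically. This is just a sketch of the arithmetic; `encode` and `decode` are names I made up for writing a number in the shifted symbols and reading it back.

```python
def encode(n):
    """Write a non-negative integer using the shifted symbols (0->1, ..., 9->0)."""
    return "".join(str((int(d) + 1) % 10) for d in str(n))

def decode(symbols):
    """Read a string of shifted symbols back as an ordinary integer."""
    return int("".join(str((int(s) - 1) % 10) for s in symbols))

# Reading A: "2" and "3" are ordinary numbers; only the answer is rewritten.
reading_a = encode(2 * 3)                      # 6 is written as "7"

# Reading B: "2" and "3" are already shifted symbols, i.e. they mean 1 and 2.
reading_b = encode(decode("2") * decode("3"))  # 1 * 2 = 2, written as "3"

# The quoted response: shift the operands forward, multiply, then shift again.
quoted = encode(int(encode(2)) * int(encode(3)))  # 3 * 4 = 12, written as "23"

print(reading_a, reading_b, quoted)  # prints "7 3 23"
```

The quoted answer applies the shift in the wrong direction to the operands, so it matches neither reading.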
I think it's clear from this example that the LLM has 0 ability to reason.
> I think it's clear from this example that the LLM has 0 ability to reason.
It's not a 0 or 1. You are oversimplifying it. Obviously neural networks can learn to generalize patterns of reasoning inferred from their training data. We know that they are not using explicitly defined formal systems of reasoning, and they have some limitations compared to those systems. Anyone who seriously studied neural networks or machine learning understands this.
By the same logic, practically every human on Earth has "0 ability to reason" as their biological neural network will get confused and make mistakes.
Anyone who has studied neural networks also knows there's no comparison between computer neural networks and human biological neural networks. The name was picked because of a passing familiarity with the biological by someone who didn't have any experience in biological neural networks. It's been sufficiently proven they have no similarity by countless academics.
It actually is 0 or 1 in this case. You either have the power of deduction or you do not. You have either proved a theorem or you have not. If you got to a correct conclusion through incorrect means, you have incorrectly reasoned. There is no spectrum in reasoning. Perhaps a spectrum in abilities across humans but not in the logic itself.
> By the same logic, practically every human on Earth has "0 ability to reason" as their biological neural network will get confused and make mistakes.
And therein lies the problem with this whole debate. I think a huge part of the debate rests on a conflation: because most humans do not reason well (I wouldn't say they cannot reason) and make mistakes, people conclude that reasoning is something fuzzy and make statements like "LLMs reason about as well as humans". Very few humans outside of mathematicians practice logic on a daily basis. Most humans get by with muscle memory and pattern recognition of previous tasks. Just because LLMs are roughly as good as humans at this behavior does not make them able to reason. I would be totally fine if people just replaced "can reason" with "are useful" within their statements so they would look more like "LLMs are as useful as humans in answering MCAT tests." To imply there is a rational actor deriving responses from first-order logic is disingenuous in my opinion.
Yeah I think the word probabilistic has been broadly misconstrued, because we usually see it in contexts where probabilities are used as shortcuts to avoid comprehensive analysis. But a probability can also be the output of a detailed analysis with rich semantic representation of causes.
I mean, that's exactly what LLMs are. In the absence of any real understanding of cognition, we are just throwing shit at the wall and seeing what the probabilistic model does with the massive amount of data we give it. A definitive analysis, were one even possible with our current models of computing, would probably outperform an LLM.