by panarky 784 days ago
Your understanding of how LLMs work is overly simplistic and incomplete.

Yes, doing probabilistic next-word prediction plays a role in how LLMs generate text output, but that's not the whole story.

LLMs "understand" (to a degree): They develop complex internal representations of concepts they've been trained on. This isn't just about word association; they develop an understanding of the relationships between objects, actions, and ideas.

They can reason, not just mimic: LLMs can perform logical reasoning, using their internal knowledge base to solve problems or answer questions. This might involve following multi-step instructions, drawing inferences from information provided, or adapting to new prompts in a way that requires a degree of abstract thinking.

Beyond simple probabilities: Yes, LLMs do consider the probability of certain word sequences, but their output is far more sophisticated than just picking the most likely next word. They weigh context, concepts, relationships, nuance, logic, and even the unstated but inferred purpose of the user when generating responses.
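
For reference, the "picking the most likely next word" step under discussion can be sketched in a few lines. This is a toy illustration only: the three-word vocabulary and the logit values are made up, and a real model computes its logits from the whole context with a large neural network, which is exactly the part this sketch leaves out.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates for the next word, with made-up scores.
vocab = ["one", "three", "seven"]
logits = [0.5, 2.0, 1.0]

probs = softmax(logits)

# Greedy decoding: always take the single most likely word.
greedy = vocab[probs.index(max(probs))]

# Sampling: draw a word in proportion to its probability instead.
rng = random.Random(0)
sampled = rng.choices(vocab, weights=probs, k=1)[0]

print("greedy:", greedy)
print("sampled:", sampled)
```

Greedy decoding always returns the highest-scoring word, while sampling can return any word with nonzero probability. Either way, the interesting question in this thread is how the scores themselves get computed, not how one is picked.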

7 comments

I know that's the messaging, but the real link to reality is very tenuous. This was a great example from the last couple days:

https://twitter.com/colin_fraser/status/1785132544482226679

I just tried a similar question now with ChatGPT4:

"If a man and a goat are on one side of a river, what is the minimum amount of trips required to get the man and goat to the other side in a boat. Assume the boat can hold at most one animal and one human."

ChatGPT: 3 trips

That is very much closer to "trying to predict next word from examples" than "billion-dollar model with internal reasoning".
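
For anyone who wants to check the puzzle as worded, here is a minimal breadth-first search sketch. The state encoding (which bank the man and the goat are on) and the function name min_trips are my own choices for illustration; the only rules assumed are the ones in the prompt: the man rows the boat and may take the goat with him.

```python
from collections import deque


def min_trips(start=("left", "left"), goal=("right", "right")):
    """Breadth-first search over (man_side, goat_side) states.

    Returns the fewest boat crossings needed to reach the goal state,
    or None if the goal cannot be reached.
    """
    other = {"left": "right", "right": "left"}
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (man, goat), trips = queue.popleft()
        if (man, goat) == goal:
            return trips
        # The man always rows; he can cross alone...
        next_states = [(other[man], goat)]
        # ...or take the goat along if it is on the same bank as him.
        if goat == man:
            next_states.append((other[man], other[goat]))
        for state in next_states:
            if state not in seen:
                seen.add(state)
                queue.append((state, trips + 1))
    return None


print(min_trips())  # 1
```

With no wolf or cabbage in the problem there is nothing to shuttle back and forth, so the search finds the goal after a single crossing. The "3 trips" answer matches the shape of the classic puzzle, not the question that was asked.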

That sounds surprisingly close to how a toddler might reason. The only difference is that the toddler can eventually see the flaw in their reasoning if you press them long enough, while the LLM doesn't have the architecture for learning in real time yet.
I was repeatedly amazed at how smart my toddler was. You just feel the general intelligence.

She's a bit older now (5) but e.g. a few days ago I was talking about cleaning the whole house. She said "you didn't clean the WHOLE house, look there's something you didn't clean".

The LLM does figure it out if you ask further questions in the same chat. Here's GPT-3.5 https://chat.openai.com/share/a8669390-8eb0-46c2-b804-3aafc3...
If you spoil it with your followup questions... which doesn't help, because the point of these is that they're controlled experiments where you do know what the right answer and logic are. You can't test when you don't.
It's not spoiling anything. It's just an observation of the limits of current LLMs.

I tried a few chain-of-thought prompts for the original question, and GPT-3.5 was sometimes (randomly) able to find the correct answer on the first attempt for this one:

https://chat.openai.com/share/c144ba23-2f78-4cc8-a1c5-ca3106...

Take out this

  Instructions: 
  1. Do not include any assumptions that I have not mentioned here. 
  2. Before solving the problem, state the goal of the problem. 
  3. After each step of your reasoning, state where the man and the goat are now standing, and state if the goal has been achieved or not, if the goal has been achieved then stop. If the goal has not been achieved then explain why not.
Then tell me what happens

Spoilage is incredibly easy to do. It is about information leakage and you have to think very carefully about how information can leak through in subtle ways. Specifically #1 and #2 are strong hints that there is a trick to the problem (i.e. is this something you would use in a generic prompt?). #3 is a reiteration of the problem, that gives extra weight. You can decrease the weight by restating as "state where the man and any animals are located" (notice there's lower information gain here). " if the goal has been achieved then stop." is a big hint. To reason, it should know when to stop.

I posted some recent river crossing tweets in this comment that may be of interest to you https://news.ycombinator.com/item?id=40231409

I tried "are you sure", which often triggers some reasoning, and it was pretty confident. I'm trying not to give it the answer, but run it as if I didn't have any special knowledge. GPT + human > GPT. I mean, we're treating these things like another kind of intelligence, not a hammer.

GPT4: https://chat.openai.com/share/1beb5391-d321-4515-945e-38233f...

Another really oddly effective one is offering it a $500 tip
> their output is far more sophisticated than just picking the most likely next word

Picking the most likely next word is an extremely sophisticated algorithm, if you could do that you would almost sound human. Like an LLM!

However, focusing on just picking the most likely next word does cause some issues. The model is biased towards things that were posted over things that were too boring to post, so there are still things such an algorithm can't understand, like real-world probabilities, since events are so unevenly represented in what gets posted.

That's a lot of anthropomorphizing, but I don't see anything to back up your claims.
I think it's still more in the realm of philosophy. But I do have an argument that NNs demonstrate abstract, generalized learning: the transfer learning effect.

Neural networks pre-trained on data for a completely different task learn new tasks much faster. With a GPT-like transformer, you can feed it PCM audio samples encoded as uuencoded text, or paintings encoded in the same way, and it learns how to translate English <-> Russian when later trained on that, much faster than a completely randomized model does. There's something common to those seemingly disparate tasks that is learned. "Abstraction" may be the right word for this.

Abstraction seems too generous of an interpretation.

A more parsimonious hypothesis is that random networks start out broken, structurally incapable of computation, because the structure has parts where information stops flowing, or signal gain is so low at certain choke points that the signal's presence is like a random coin flip.

Training the network to compute ANYTHING fixes this flow problem, making subsequent training easier, without introducing any kind of abstraction.

Neural Networks have been studied for a long time. We learned this in 1990.

https://news.ycombinator.com/item?id=40230764

> They can reason, not just mimic

I'm still waiting for this. Here are some recent tweets from people trying variations of the river crossing puzzle. All LLMs seem to fail pretty badly.

LLMs are impressive enough as they are (compression systems with human language interface), you don't need to hype them up to something they're not.

https://twitter.com/jeremyphoward/status/1783712611126964627

https://twitter.com/WaltonStevenj/status/1785145923771011215

https://twitter.com/colin_fraser/status/1785132544482226679

I feel like this is the midwit meme response, and the idiot and genius are just saying “it’s next token probabilistic generation”.
A typical mid-wit response is to say it's just a Markov chain doing naive next token prediction without any semantic model. That's not how deep learning works.

The LLM neural network contains a semantic model and it performs some type of reasoning over that model. The idiot and the genius both can see that ChatGPT has some reasoning capability.

https://www.cs.toronto.edu/~hinton/absps/AIJmapping.pdf

"This 1990 paper demonstrated how neural networks could learn to represent and reason about part-whole hierarchical relationships, using family trees as the example domain.

By training on examples of family relations like parent-child and grandparent-grandchild, the neural network was able to capture the underlying logical patterns and reason about new family tree instances not seen during training.

This seminal work highlighted that neural networks can go beyond just memorizing training examples, and instead learn abstract representations that enable reasoning and generalization"

I would tend to agree with seizethecheese. I don't think anyone says "Markov chain doing naive next token prediction without any semantic model", with the emphasis on the last part of that sentence. The idiot only says the first part and disregards the last part. The genius says the first part with an understanding that, no matter how complex a semantic model is, the "reasoning" that occurs is completely a function of the training data. Change the data, change the weights. At no point in the model does anything guarantee consistency of logic or anything like that. Certainly, LLMs are not interested in any notion of truth in the philosophical sense.

Here is a succinct example using Claude:

Prompt:

Using the decimal system, redefine all symbols by shifting them one to right such that 0 becomes 1, 1 becomes 2, 2 becomes 3, etc. Now, let's do math. What is 2x3?

===

Okay, with the symbols shifted one to the right in the decimal system:

0 becomes 1

1 becomes 2

2 becomes 3

3 becomes 4

4 becomes 5

5 becomes 6

6 becomes 7

7 becomes 8

8 becomes 9

9 becomes 0 (looping back around)

So if 2 becomes 3, and 3 becomes 4, then:

2 x 3 = 3 x 4 = 12

Shifting those digits one to the right based on the new mapping: 12 becomes 23

Therefore, under this new symbol mapping, 2 x 3 = 23

===

Regardless, if you are interpreting 2x3 to be in the original space, then 2x3 = 6 which would be 7 in the new space. If you are interpreting 2x3 to be in the new space (which is how I would interpret it), then 2x3 is 1x2 = 2 which is 3.
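
Both readings can be worked through mechanically. The sketch below is mine, not anything Claude produced: encode and decode are hypothetical helper names, the 9 -> 0 wraparound is taken from the mapping Claude listed above, and applying the shift digit by digit to multi-digit numbers is an assumption.

```python
def encode(value: int) -> str:
    """Write an ordinary integer using the shifted symbols (0->1, ..., 9->0)."""
    return "".join(str((int(d) + 1) % 10) for d in str(value))


def decode(symbols: str) -> int:
    """Read a string of shifted symbols back into an ordinary integer."""
    return int("".join(str((int(s) - 1) % 10) for s in symbols))


# Reading 1: "2x3" is written in the original notation,
# and only the result is rewritten with the shifted symbols.
original_space = encode(2 * 3)

# Reading 2: "2" and "3" are already shifted symbols,
# so they stand for the values 1 and 2.
new_space = encode(decode("2") * decode("3"))

print(original_space, new_space)  # 7 3
```

Claude's "23" corresponds to neither reading: it is what you get from encode(3 * 4), i.e. shifting the operands in the wrong direction and then shifting the product again.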

I think it's clear from this example that the LLM has 0 ability to reason.

> I think it's clear from this example that the LLM has 0 ability to reason.

It's not a 0 or 1. You are oversimplifying it. Obviously neural networks can learn to generalize patterns of reasoning inferred from their training data. We know that they are not using explicitly defined formal systems of reasoning, and they have some limitations compared to those systems. Anyone who seriously studied neural networks or machine learning understands this.

By the same logic, practically every human on Earth has "0 ability to reason" as their biological neural network will get confused and make mistakes.

Anyone who has studied neural networks also knows there's no comparison between computer neural networks and human biological neural networks. The name was picked because of a passing familiarity with the biological by someone who didn't have any experience in biological neural networks. It's been sufficiently proven they have no similarity by countless academics.
That is a blatant oversimplification and not true. There are both similarities and differences. New ANN training methods are inspired by studies of biological neural networks (dropout regularization is one example).
It actually is 0 or 1 in this case. You either have the power of deduction or you do not. You have either proved a theorem or you have not. If you got to a correct conclusion through incorrect means, you have incorrectly reasoned. There is no spectrum in reasoning. Perhaps a spectrum in abilities across humans but not in the logic itself.

> By the same logic, practically every human on Earth has "0 ability to reason" as their biological neural network will get confused and make mistakes.

And therein lies the problem with this whole debate. I think a huge part of it comes from a conflation: because most humans do not reason well (I wouldn't say they cannot reason) and make mistakes, people conclude that reasoning itself is something fuzzy, and then make statements like "LLMs reason about as well as humans". Very few humans outside of mathematicians practice logic on a daily basis. Most humans get by with muscle memory and pattern recognition of previous tasks. Just because LLMs are roughly as good as humans at this behavior does not make them able to reason. I would be totally fine if people just replaced "can reason" with "are useful" within their statements, so they would look more like "LLMs are as useful as humans in answering MCAT tests." To imply there is a rational actor deriving responses from first-order logic is disingenuous in my opinion.

Only if you define reasoning ability as exactly equivalent in capability to a formal theorem prover. But that is a difference in tribe or philosophy. Your Symbolic/Classical rule-based AI tribe versus the Connectionist AI tribe. No point discussing further as it's like arguing Democrat vs Republican. Both approaches have their strengths and weaknesses.
Yeah I think the word probabilistic has been broadly misconstrued, because we usually see it in contexts where probabilities are used as shortcuts to avoid comprehensive analysis. But a probability can also be the output of a detailed analysis with rich semantic representation of causes.
I mean, that's exactly what LLMs are. In the absence of any real understanding of cognition, we are just throwing shit at the wall and seeing what the probabilistic model does with the massive amount of data we give it. A definitive analysis, were one even possible with our current models of computing, would probably outperform an LLM.
I hate this industry so much