For logical reasoning tasks you should use pen and paper if necessary, not just say the first thing that comes to mind.
Comparing one-shot LLM responses with what a human can do in their head doesn’t make much sense. If you ask a person, they would try to work out the answer using a logical process but fail due to a shortage of working memory.
An LLM will fail at the task because it is trying to generate a response token by token, which doesn’t make any sense for this kind of problem. The next digit in the number can only be determined by following a sequence of logical steps, not by sampling from a probability distribution of next tokens. If the model were really reasoning, the probability for each incorrect digit would be zero.
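To put a rough number on that, here is a toy calculation. It is only a sketch: the per-digit accuracies are made up, not measured on any real model, and treating digits as independent is a simplifying assumption. It shows how any non-zero chance of a wrong digit compounds over the length of the answer.

```python
def prob_all_digits_correct(per_digit_accuracy: float, num_digits: int) -> float:
    """Chance that every digit is right if each one is sampled independently.

    Independence between digits is an assumption made for illustration only.
    """
    return per_digit_accuracy ** num_digits


# Length of the exact product of the two numbers discussed in this thread.
num_digits = len(str(1682671 * 168363))

# Hypothetical per-digit accuracies, not measurements of any model.
for accuracy in (1.0, 0.99, 0.9, 0.5):
    chance = prob_all_digits_correct(accuracy, num_digits)
    print(f"per-digit accuracy {accuracy}: whole answer correct with p = {chance:.4f}")
```

Only a per-digit accuracy of exactly 1.0 keeps the whole answer reliably correct, which is the "probability zero for each incorrect digit" case.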
And that's why OpenAI o1 will use chain of thought for this particular question rather than hallucinate an approximate answer. And it still works just like before, by generating token by token.
No, but you can say "I don't know", "I can't do this in my head", "Why is this important?", "Let me get my calculator" or any other thing that is categorically more useful than just making up a result.
Claude 3.5 just... does the multiplication correctly by independently deciding to go step-by-step (don't see a convenient way to share conversations, but the prompt was just "What is 1682671* 168363?").
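For anyone who wants to see what "step-by-step" amounts to for that prompt, here is a minimal sketch of schoolbook long multiplication. It is not a transcript of what any model wrote; the function name `partial_products` is made up for illustration. It produces one partial product per digit of the second number and then sums them.

```python
def partial_products(a: int, b: int) -> list[int]:
    """One row per digit of b, as in schoolbook long multiplication."""
    rows = []
    # Walk the digits of b from least to most significant.
    for shift, digit_char in enumerate(reversed(str(b))):
        rows.append(a * int(digit_char) * 10 ** shift)
    return rows


a, b = 1682671, 168363  # the operands from the prompt
rows = partial_products(a, b)

for row in rows:
    print(f"{row:>15}")
print("-" * 15)
print(f"{sum(rows):>15}")

# The sum of the rows must agree with direct multiplication.
assert sum(rows) == a * b
```

Each row is easy on its own (a number times a single digit, shifted), which is why writing the rows down helps both people and models.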
It's a weird distinction. Part of how they do that is by reading back what they have already said. Someone trained to do so could exploit the same trick themselves to do the math in a simplified, step-by-step way, provided they had perfect recall of what they had said or written.
In other words, don't the LLMs that do this kind of thing well, like gpt-o1, essentially also use 'a pen and paper'?
And this is a very good comparison, because o1 does indeed multiply these numbers correctly...
Asking an LLM without built-in chain of thought is the same as asking people to multiply these numbers without pen and paper. And LLMs with chain of thought actually are capable of doing this math.
LLMs have pen and paper: it's their output buffer, capped to a few KBs, which is far longer than necessary to multiply the two numbers.
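A rough way to check the size claim: the sketch below writes out every single-digit step of long multiplication as text and measures the result. The trace format and the name `scratchpad` are invented for illustration, and how a real model would lay out its working is an open assumption. A terser format would be smaller still.

```python
def scratchpad(a: int, b: int) -> tuple[str, int]:
    """Write out every single-digit step of long multiplication as text.

    Returns the full trace and the final product.
    """
    lines = []
    total = 0
    a_digits = [int(c) for c in reversed(str(a))]  # least significant first

    for shift, b_char in enumerate(reversed(str(b))):
        d = int(b_char)
        carry = 0
        row_digits = []
        for x in a_digits:
            carry_in = carry
            value = x * d + carry_in
            carry, r = divmod(value, 10)
            lines.append(f"{x}*{d}+{carry_in}={value}: write {r}, carry {carry}")
            row_digits.append(r)
        if carry:
            row_digits.append(carry)
        # Reassemble the row and shift it into place.
        row = int("".join(str(x) for x in reversed(row_digits))) * 10 ** shift
        total += row
        lines.append(f"row {shift}: {row}; running total {total}")

    return "\n".join(lines), total


trace, product = scratchpad(1682671, 168363)
assert product == 1682671 * 168363
print(f"trace uses {len(trace.encode('utf-8'))} bytes over {trace.count(chr(10)) + 1} lines")
```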
If you tell an LLM to explain how to multiply two numbers it will give a flawless textbook answer. However when you ask it to actually multiply the numbers it will fail. LLMs have all the knowledge in the world in their memory, but they can't connect that knowledge into a coherent picture.
They have codified human knowledge in human language, represented by arrays of numbers. They can't access that knowledge in any meaningful way, they can just shuffle numbers to give the illusion of cogency.
Do you think your inner monologue is any different? Because it sure as hell isn’t the same system as the one doing math, or recognising faces, or storing or retrieving memories, to name a few.
The comparison makes sense though. We're trying to build a simulated brain. We want to create a brain that can think about math.
And chain of thought is kind of like giving that brain some scratch space to figure out the problem.
This simulated brain can't access multiplication instructions on the CPU directly. It has to do the computation via its simulated neurons interacting.
This is why it's not so surprising that this is an issue.
LLMs are not simulating brains in any capacity. The words 'neural network' shouldn't be taken at face value. A single human neuron can take quite a few 'neurons' and layers to simulate as a 'neural network'.
Sure, but the basic idea of firing neurons is there, and the connection of these "neurons" to a neural network like an LLM does not allow the network to perform computations directly.
The level of detail of the simulation has little bearing on this. And in fact whether you call it a simulation or something else doesn't matter either. Understanding that the LLM does not compute by using the CPU or GPU directly is what's necessary to understand why computation is hard for LLMs.
Does it have an understanding of the strict rules that govern the problem, and that it needs to produce a result that is in total accordance with them? (Accordance here is not a percentage but a boolean: the result either follows the rules or it doesn't.)
i.e., can it apply a function over a sentence?
The answer is sometimes. Typically it'll forget rules you've given it by the time it might be useful because of the memory limit of LLMs. Either way, you basically need to know it's hallucinating to you so you can keep applying more rules.