| HN Mirror


	by bubble12345 612 days ago
	I mean so far LLMs can't even do addition and multiplication of integers accurately. So we can't really expect too much in terms of logical reasoning.

boroboro4 612 days ago

Can you multiply 1682671 and 168363 without pen and paper? I can’t. LLMs can if you force them to do it step by step, but can’t in one shot.
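For reference, the "step by step" route amounts to schoolbook long multiplication: one shifted partial product per digit, then a sum. A minimal sketch in Python, assuming non-negative integers; the name `long_multiply` is made up for illustration and says nothing about what a model does internally:

```python
def long_multiply(a: int, b: int) -> tuple[int, list[int]]:
    """Multiply two non-negative integers the schoolbook way.

    Returns the product and the shifted partial products,
    one per digit of b, least significant digit first.
    """
    partials = []
    for position, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        # One row of the pen-and-paper layout:
        # a times a single digit of b, shifted left by that digit's position.
        partials.append(a * digit * 10 ** position)
    return sum(partials), partials


product, steps = long_multiply(1682671, 168363)
for row in steps:
    print(f"{row:>15}")   # the intermediate rows you would write on paper
print(f"{product:>15}")   # the final sum of those rows
```

Each row is easy on its own; the hard part without scratch space is holding all six rows at once.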

janalsncm 612 days ago

For logical reasoning tasks you should use pen and paper if necessary, not just say the first thing that comes to mind.

Comparing one-shot LLM responses with what a human can do in their head doesn’t make much sense. If you ask a person, they would try to work out the answer using a logical process but fail due to a shortage of working memory.

An LLM will fail at the task because it is trying to generate a response token by token, which doesn’t make any sense. The next digit in the number can only be determined by following a sequence of logical steps, not by sampling from a probability distribution of next tokens. If the model were really reasoning, the probability for each incorrect digit would be zero.

boroboro4 612 days ago

And that's why OpenAI o1 will use chain of thought for this particular question rather than hallucinate an approximate answer. And it still works just like before, by generating token by token.

janalsncm 611 days ago

Here are some actual performance metrics:

https://x.com/yuntiandeng/status/1836114401213989366

If chain of thought really worked we should see no difference between 1 digit and 20 digit multiplication.

Tainnor 612 days ago

No, but you can say "I don't know", "I can't do this in my head", "Why is this important?", "Let me get my calculator" or any other thing that is categorically more useful than just making up a result.

solveit 612 days ago

It's relatively trivial to get an LLM that does that and every big lab has one, even if they're not selling them.

ChatGPT 4o as of right now just runs Python code, which I guess is "Let me get my calculator" (see https://chatgpt.com/share/670df313-9f88-8004-a137-22c302f8bf...).

Claude 3.5 just... does the multiplication correctly by independently deciding to go step-by-step (don't see a convenient way to share conversations, but the prompt was just "What is 1682671* 168363?").

serf 612 days ago

it's a weird differentiation. part of how they do that is by reading back what they said; someone trained in doing so could essentially exploit this characteristic themselves to do the math in a simplified step-by-step way, if they had perfect recall of what they said or wrote.

in other words, for the LLMs that do that kind of thing well, like gpt-o1, don't they essentially also use 'a pen and paper'?

boroboro4 612 days ago

And this is a very good comparison, because o1 indeed does multiply these numbers correctly...

Asking LLMs without built-in chain of thought is the same as asking people to multiply these numbers without pen and paper. And LLMs with chain of thought actually are capable of doing this math.

akomtu 612 days ago

LLMs have pen and paper: it's their output buffer, capped to a few KBs, which is far longer than necessary to multiply the two numbers.

If you tell an LLM to explain how to multiply two numbers it will give a flawless textbook answer. However when you ask it to actually multiply the numbers it will fail. LLMs have all the knowledge in the world in their memory, but they can't connect that knowledge into a coherent picture.

namaria 612 days ago

They have codified human knowledge in human language, represented by arrays of numbers. They can't access that knowledge in any meaningful way, they can just shuffle numbers to give the illusion of cogency.

auggierose 612 days ago

Does that make an LLM the perfect academic?

_1tem 612 days ago

Pen and paper? LLMs are literally a computer program that cannot compute.

moi2388 612 days ago

But it can call into systems that can do compute.

Do you think your inner monologue is any different? Because it sure as hell isn’t the same system as the one doing math, or recognising faces, or storing or retrieving memories, to name a few
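A rough sketch of what "calling into a system that can do compute" might look like on the receiving end, assuming the model emits a plain arithmetic expression as text. The `calculator_tool` name and the restriction to `+`, `-`, `*` on integer literals are assumptions for illustration, not any vendor's actual tool API:

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def calculator_tool(expression: str) -> int:
    """Hypothetical calculator tool: evaluates +, -, * over integer literals."""

    def evaluate(node: ast.AST) -> int:
        # Integer literal (bool is excluded because it is a subclass of int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operation with a whitelisted operator.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    # Parse without executing, then walk the tree ourselves.
    return evaluate(ast.parse(expression, mode="eval").body)


answer = calculator_tool("1682671 * 168363")
print(answer)
```

Walking the parsed tree instead of calling `eval` keeps the tool from running arbitrary code that happens to show up in the model's output.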

carlmr 612 days ago

The comparison makes sense though. We're trying to build a simulated brain. We want to create a brain that can think about math.

And chain of thought is kind of like giving that brain some scratch space to figure out the problem.

This simulated brain can't access multiplication instructions on the CPU directly. It has to do the computation via its simulated neurons interacting.

This is why it's not so surprising that this is an issue.

namaria 612 days ago

LLMs are not simulating brains in any capacity. The words 'neural network' shouldn't be taken at face value. A single human neuron can take quite a few 'neurons' and layers to simulate as a 'neural network'.

carlmr 612 days ago

Sure, but the basic idea of firing neurons is there, and the connection of these "neurons" to a neural network like an LLM does not allow the network to perform computations directly.

The level of detail of the simulation has little bearing on this. And in fact whether you call it a simulation or something else doesn't matter either. Understanding that the LLM does not compute by using the CPU or GPU directly is what's necessary to understand why computation is hard for LLMs.

ulbu 612 days ago

Does it have an understanding of the strict rules that govern the problem, and that it needs to produce a result that is in total accordance with them? (Accordance that is not a percentage, but boolean.) I.e., can it apply a function over a sentence?

I don’t know, that’s why I ask.

ThunderSizzle 612 days ago

The answer is sometimes. Typically it'll forget rules you've given it by the time it might be useful because of the memory limit of LLMs. Either way, you basically need to know it's hallucinating to you so you can keep applying more rules.

blitzar 612 days ago

282399355737 - My answer is not wrong, I was hallucinating.

tanduv 612 days ago

yea, but I'm able to count the number of r's in 'strawberry' without second guessing myself

mewpmewp2 612 days ago

Except o1 can do that and previously gpt could also do it if you asked it to count character by character while keeping count.
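The "count character by character while keeping count" procedure is simple enough to write down. A minimal sketch, with `count_letter` being a made-up helper name; it returns the running tally after every character, which is roughly the trace such a prompt asks the model to spell out:

```python
def count_letter(word: str, letter: str) -> list[tuple[str, int]]:
    """Walk the word one character at a time, recording the running count."""
    running = 0
    trace = []
    for ch in word:
        if ch == letter:
            running += 1
        # Record the character together with the tally so far.
        trace.append((ch, running))
    return trace


trace = count_letter("strawberry", "r")
for ch, tally in trace:
    print(ch, tally)
```

The last tally in the trace is the answer; the point of writing out every step is that no single step requires more than looking at one character.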
