by janalsncm 612 days ago
For logical reasoning tasks you should use pen and paper if necessary, not just say the first thing that comes to mind.

Comparing one-shot LLM responses with what a human can do in their head doesn’t make much sense. If you ask a person, they will try to work out the answer using a logical process, but they may fail because they run out of working memory.

An LLM will fail at the task because it is trying to generate a response token by token, which doesn’t make any sense. The next digit in the number can only be determined by following a sequence of logical steps, not by sampling from a probability distribution of next tokens. If the model were really reasoning, the probability for each incorrect digit would be zero.
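
To make "a sequence of logical steps" concrete, here is a minimal sketch of the pen-and-paper (schoolbook) multiplication procedure in Python. The function name `long_multiply` and the choice to pass numbers as decimal strings are my own assumptions for illustration; nothing here comes from any model or benchmark. The point is only that every output digit is fully determined by the digits and carries that came before it.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication on non-negative decimal strings.

    Hypothetical helper for illustration: it mimics pen-and-paper
    multiplication, one digit product and one carry at a time.
    """
    # Store digits least-significant first so index == power of ten.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            # Existing digit + new partial product + carry from the previous column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        # The leftover carry goes into the next column of this row.
        result[i + len(y)] += carry

    # Leading zeros sit at the end of the reversed list; drop them.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))


if __name__ == "__main__":
    a, b = "12345678901234567890", "98765432109876543210"
    # Cross-check against Python's built-in arbitrary-precision integers.
    print(long_multiply(a, b) == str(int(a) * int(b)))
```

The work grows with the number of digits, but the procedure is the same for 1-digit and 20-digit inputs and there is no step at which a wrong digit is a possible outcome.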

1 comment

And that's why OpenAI o1 will use chain of thought for this particular question rather than hallucinate an approximate answer. And it still works just as before, generating token by token.
Here are some actual performance metrics:

https://x.com/yuntiandeng/status/1836114401213989366

If chain of thought really worked, we should see no difference in accuracy between 1-digit and 20-digit multiplication.