Hacker News
by imtringued 751 days ago
No, it is because supervised and self-supervised learning happen to produce reasoning as a byproduct. For some reason people think that telling a model to recite a trillion tokens will somehow improve it beyond the recitation of those tokens. I mean, in theory you can select the training data so that it will learn what you want, but then again you are limited to what you taught it directly.

The problem is that these models weren't trained to reason. For the task of reasoning, they are overfitting to the dataset. If you want a machine to reason, then build and train it to reason, don't train it to do something else and then expect it to do the thing you didn't train it for.


> The problem is that these models weren't trained to reason.

Except they kind of were. Specifically, they were trained to predict next tokens based on text input, with the optimization function effectively being: does the result make sense to a human? That's embedded in the training data: it's not random strings, it's the output of human reasoning, both basic and sophisticated. That's also what RLHF selects for later on. The models are indeed forced to simulate reasoning.

> don't train it to do something else and then expect it to do the thing you didn't train it for.

That's the difference between AGI and specialized AI: AGI is supposed to do the things you didn't train it to do.

I think people don’t recognize that it’s currently doing single-turn reasoning and demonstrating the building blocks of real-time reasoning with continuous input.

If we tested humans on first-thought answers, given in 5 seconds or less, on half the problems we test LLMs on, we might prove humans can’t reason either.