by imtringued
751 days ago
No, it is because supervised and self-supervised learning happen to produce reasoning as a byproduct. For some reason, people think that telling a model to recite a trillion tokens will somehow improve it beyond the recitation of those tokens. In theory, you can select the training data so that the model learns what you want, but then you are limited to what you taught it directly. The problem is that these models weren't trained to reason. For the task of reasoning, they are overfitting to the dataset. If you want a machine to reason, then build and train it to reason; don't train it to do something else and then expect it to do the thing you didn't train it for.
Except they kind of were. Specifically, they were trained to predict the next token from text input, with the optimization target effectively being "does the result make sense to a human?" That's embedded in the training data: it's not random strings, it's the output of human reasoning, both basic and sophisticated. That's also what RLHF selects for later on. The models are indeed forced to simulate reasoning.
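For anyone unfamiliar with the pretraining objective, here is a minimal sketch of the next-token loss, assuming a toy "model" that simply hands us a probability for each candidate next token (the function name and data are hypothetical, for illustration only). The loss just measures how well the model predicted the text that actually came next; the "makes sense to a human" part enters indirectly, through the human-written training text, and more directly later through RLHF.

```python
import math


def next_token_loss(probs_per_step, targets):
    """Average negative log-likelihood of the tokens that actually came next.

    probs_per_step[t] is a hypothetical model's predicted distribution
    (dict: token -> probability) at position t; targets[t] is the token
    that really followed in the training text.
    """
    total = 0.0
    for probs, target in zip(probs_per_step, targets):
        # Penalize low probability on the true next token.
        total += -math.log(probs[target])
    return total / len(targets)


# Toy example: predicting "cat" after "the", then "sat" after "the cat".
predictions = [
    {"cat": 0.5, "dog": 0.5},
    {"sat": 0.9, "ran": 0.1},
]
print(next_token_loss(predictions, ["cat", "sat"]))
```

A perfectly confident, correct prediction contributes zero loss; nothing in this objective rewards reasoning directly, only predicting human text well.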
> don't train it to do something else and then expect it to do the thing you didn't train it for.
That's the difference between AGI and specialized AI: AGI is supposed to do the things you didn't train it to do.