by sharemywin
637 days ago
Wouldn't this apply to all prediction machines that make errors? Humans make bad predictions all the time, but we still seem to manage to do some cool stuff here and there. Part of an agent's architecture will be for it to minimize e and then ground the prediction loop against a reality check. Making LLMs bigger gets you a lower e with scale of data and compute, but you will still need it to check against reality. Test-time compute will also play a role, as it can run through multiple scenarios and "search" for an answer.
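A rough sketch of the arithmetic behind this point. It assumes e is an independent per-step error probability, so an n-step output is fully correct with probability (1 - e)^n, and that a perfect reality check can recognise a correct attempt among k tries. Both assumptions, and both function names, are illustrative and not taken from the paper.

```python
def sequence_success(e: float, n: int) -> float:
    """Chance that all n steps are right.

    Assumption: each step fails independently with probability e.
    """
    return (1.0 - e) ** n


def success_with_retries(e: float, n: int, k: int) -> float:
    """Chance that at least one of k independent attempts is fully right.

    Assumption: a perfect checker (the "reality check") can tell which
    attempt is correct, so one good attempt out of k is enough.
    """
    p = sequence_success(e, n)
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Lowering e (scale) and raising k (test-time search) both help,
    # but the retry gain only exists if something can verify the answers.
    for e in (0.01, 0.001):
        for k in (1, 8):
            print(e, k, round(success_with_retries(e, n=100, k=k), 3))
```

Under these assumptions, scale lowers e and search raises k, but the retries only pay off if the checker really exists.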
>> part of an agent's architecture will be for it to minimize e and then ground the prediction loop against a reality check.
The problem is that web-scale LLMs can only realistically be trained to maximise the probability of the next token in a sequence, not the factuality, correctness, truthfulness, etc. of the entire sequence. That's because web-scale data is not annotated with such properties. So they can't do a "reality check", because they don't know what "reality" is, only what text looks like.
The paper above uses an "oracle" instead, meaning they have a labelled dataset of correct answers. They can only train their RL approach because they have this source of truth. This kind of approach just doesn't scale as well as predicting the next token. It's really a supervised learning approach hiding behind RL.
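To make the dependence on labels concrete, here is a minimal sketch of an oracle-style reward. The names `oracle_reward` and `rewardable` and the exact-match scoring are hypothetical, chosen for illustration, and are not the paper's actual setup.

```python
from typing import Dict, List


def oracle_reward(question: str, answer: str, labels: Dict[str, str]) -> float:
    """Return 1.0 if the answer matches the labelled truth, else 0.0.

    Hypothetical exact-match oracle. It raises KeyError for an unlabelled
    question: without a label there is no reward signal at all.
    """
    return 1.0 if answer.strip() == labels[question].strip() else 0.0


def rewardable(questions: List[str], labels: Dict[str, str]) -> List[str]:
    """Keep only the questions the oracle can actually score."""
    return [q for q in questions if q in labels]


if __name__ == "__main__":
    labels = {"2+2": "4"}  # the labelled dataset is the source of truth
    print(oracle_reward("2+2", "4", labels))
    print(rewardable(["2+2", "some unlabelled web text"], labels))
```

The reward covers only what has been labelled, whereas next-token prediction can train on any text at all.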