by TimPC
696 days ago
The red and blue agents are effectively unlimited sources of true and false examples, so you can scale far more efficiently than you can by pretraining with labelled inputs. It’s also far more targeted at correct vs. incorrect, rather than at a notion of answer quality, which doesn’t directly get at hallucination vs. reality.