by jprete
957 days ago
> I've tested both cases: correcting it when it was really wrong, and correcting it confidently when it was actually right. Both times it agreed that it was wrong and regenerated the answer it gave me.

This is the peril of using what really is fundamentally an autocomplete engine, albeit an extremely powerful one, as a knowledge engine. In fact, RLHF favors this outcome strongly; if the human says "this is right", the human doing the rating is very unlikely to uprate responses where the neural net insists they're still wrong. The network weights are absolutely going to get pushed in the direction of responses that agree with the human.