Hacker News
by RicDan 1101 days ago
The problem with this is that it leads to the algorithm targeting outputs that sound good to humans. That's why it's bad and won't help us. It should also incorporate "Sorry, I don't know that", but for that it needs to actually be smart.
2 comments

Honesty/truthfulness is indeed a difficult problem with any kind of fine-tuning. There is no way to incentivize the model to say what it believes to be true rather than what human raters would regard as true. Future models could become actively deceptive.
It can be weighted toward being honest about what it doesn't know, as long as the labelers pick those answers.
Need smarter labelers