HN Mirror



	by RicDan 1101 days ago
	The problem with this is that it leads the algorithm to target outputs that sound good to humans. That's why it's bad and won't help us. It should also be able to say "sorry, I don't know that", but for that it needs to actually be smart.


cubefox 1101 days ago

Honesty/truthfulness is indeed a difficult problem with any kind of fine-tuning. There is no way to incentivize the model to say what it believes to be true rather than what human raters would regard as true. Future models could become actively deceptive.


m00x 1101 days ago

It can be weighted toward answering honestly when it doesn't know, as long as the labelers pick those answers.


dr_dshiv 1101 days ago

Need smarter labelers
