| HN Mirror



	by jaidhyani 1024 days ago
	Alternatively, the prior on "this is not possible" is very low because RLHF & Friends have targeted metrics that, inadvertently or not, discourage that outcome.


robertlagrant 1024 days ago

I think that's the right answer: human trainers prefer an answer, even a made-up one, to "I don't know".


Jensson 1024 days ago

The dataset matters as well. On a forum, if you don't know the answer, you simply don't post; only people who think they know will post an answer. In a dialogue you see a lot more "I don't know", since people there are expected to respond, but there is far less dialogue data on the internet than open-forum data.


SAI_Peregrinus 1024 days ago

Amazon product Q&A has a lot of "I don't know" answers, unlike just about everywhere else on the internet.
