HN Mirror



	by abscind 1101 days ago
	Any reason RLHF isn't just a band-aid on "not having enough data?"

2 comments

trade_monkey 1101 days ago

RLHF is a band-aid for not having enough data that fits your own biases and the answers you want the model to give.


astrange 1100 days ago

It won't give answers at all if you don't train it to. It will output more questions because that's a more obvious completion to an incoming question.


astrange 1100 days ago

Less data can be better if the data is good: https://arxiv.org/abs/2306.11644

A language model won't develop question-answering behavior unless you train it to, though.
