Hacker News
by abscind 1101 days ago
Any reason RLHF isn't just a band-aid on "not having enough data"?
2 comments

RLHF is a band-aid on not having enough data that fits your own biases and the answers you want the model to give.
A base model won't give answers at all if you don't train it to. It will output more questions, because to a model trained only to continue text, that's a more obvious completion to an incoming question.
Less data can be better if the data is good: https://arxiv.org/abs/2306.11644

A language model won't develop question-answering behavior unless you train it to, though.
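
To make that concrete, here's a toy sketch (not RLHF, and not a real language model): a counter that only learns whether a question or a statement tends to follow a question. The corpora, `kind`, `train`, and `predict_next_kind` are all made up for illustration. Trained on text where questions come in lists, it continues a question with another question; add some question/answer pairs and the most likely continuation flips to an answer.

```python
from collections import Counter, defaultdict


def kind(line):
    """Crude label: a line ending in '?' is a question, anything else a statement."""
    return "question" if line.rstrip().endswith("?") else "statement"


def train(docs):
    """Count which kind of line follows which kind of line across all docs."""
    follow = defaultdict(Counter)
    for doc in docs:
        lines = doc.strip().split("\n")
        for prev, nxt in zip(lines, lines[1:]):
            follow[kind(prev)][kind(nxt)] += 1
    return follow


def predict_next_kind(follow, prompt):
    """Return the most frequently observed kind of line after a prompt of this kind."""
    return follow[kind(prompt)].most_common(1)[0][0]


# Hypothetical "pretraining" text: questions mostly appear in lists of questions.
pretrain_docs = [
    "What is RLHF?\nWhy is it used?\nDoes it need human labels?",
    "Is more data always better?\nHow is data quality measured?",
]

# Hypothetical "fine-tuning" text: each question is followed by an answer.
finetune_docs = [
    "What is RLHF?\nIt fine-tunes a model using human preference feedback.",
    "Why is it used?\nTo steer the model toward preferred outputs.",
    "Is more data always better?\nNot if the extra data is low quality.",
    "Does it need human labels?\nIt needs human preference judgments.",
]

base = train(pretrain_docs)
tuned = train(pretrain_docs + finetune_docs)

prompt = "Any reason RLHF isn't just a band-aid?"
print("base :", predict_next_kind(base, prompt))
print("tuned:", predict_next_kind(tuned, prompt))
```

The only thing that changes between `base` and `tuned` is the data, which is the point both comments are making: answering is a behavior you have to put in the training distribution.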