Hacker News
by
abscind
1101 days ago
Any reason RLHF isn't just a band-aid on "not having enough data?"
2 comments
trade_monkey
1101 days ago
RLHF is a band-aid for not having enough data that fits your own biases and the answers you want the model to give.
astrange
1100 days ago
A model won't give answers at all if you don't train it to. It will output more questions, because that's a more obvious completion of an incoming question.
astrange
1100 days ago
Less data can be better if the data is good:
https://arxiv.org/abs/2306.11644
A language model won't develop question-answering behavior unless you train it to, though.