


	by ziaowang 467 days ago
	I think this understanding is incomplete. LLMs do more than emulate observed behavior. In the pre-training phase, tasks like masked language modeling do train the model to mimic what it reads (which of course contains lots of bias); but in the RLHF phase, the model tries to generate the response that human evaluators judge best (and those evaluators try to eliminate as much bias as possible in the process). In other words, in this later phase the model is trained to meet human expectations. But human expectations are not bias-free either (e.g. the tendency to prefer the first choice presented).

1 comment

I don't understand what you are saying.

How can the RLHF phase eliminate bias if it uses a process (human input) that has the same biases as the pre-training data (also human input)?

Texts in the wild used during pre-training contain lots of biases, such as racial and sexual biases, which are picked up by the model.

During RLHF, the human evaluators are aware of such biases and are instructed to down-vote the model responses that incorporate such biases.
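
To make the down-voting step concrete, here is a toy sketch of how pairwise human preferences could be turned into a reward signal. It assumes a Bradley-Terry style preference model and a single hypothetical feature, `biased_flag` (1 if a response contains the bias, 0 otherwise). The function names and the simulated raters are illustrative assumptions, not how any particular lab implements RLHF.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_preferences(n_pairs, p_prefer_unbiased, seed=0):
    """Simulate raters comparing an unbiased response (flag 0) with a biased one (flag 1).

    Returns a list of (chosen_flag, rejected_flag) tuples. The rater picks the
    unbiased response with probability p_prefer_unbiased (a made-up parameter).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < p_prefer_unbiased:
            pairs.append((0, 1))  # unbiased response chosen
        else:
            pairs.append((1, 0))  # biased response chosen
    return pairs


def fit_reward_weight(pairs, lr=0.5, steps=200):
    """Fit reward r(x) = w * biased_flag by gradient ascent on the
    Bradley-Terry log-likelihood: P(chosen beats rejected) = sigmoid(r(chosen) - r(rejected)).
    """
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for chosen, rejected in pairs:
            diff = chosen - rejected
            # d/dw log(sigmoid(w * diff)) = (1 - sigmoid(w * diff)) * diff
            grad += (1.0 - sigmoid(w * diff)) * diff
        w += lr * grad / len(pairs)
    return w


# Raters instructed to down-vote biased responses: the learned reward penalizes bias.
w_careful = fit_reward_weight(simulate_preferences(1000, p_prefer_unbiased=0.9))

# Raters who mostly prefer the biased response: the learned reward favors bias.
w_biased = fit_reward_weight(simulate_preferences(1000, p_prefer_unbiased=0.1))

print(f"careful raters: w = {w_careful:.2f}")
print(f"biased raters:  w = {w_biased:.2f}")
```

In this sketch the sign of `w` simply follows whatever the raters prefer. That is consistent with both comments above: instructed raters can push the reward against a known bias, but any bias the raters themselves carry ends up in the reward too.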