by ziaowang
467 days ago
This understanding is incomplete, in my opinion. LLMs do more than emulate observed behavior. In the pre-training phase, tasks like masked language modeling do train the model to mimic what it reads (which of course contains a lot of bias); but in the RLHF phase, the model tries to generate the response that human evaluators judge best (and those evaluators try to eliminate as much bias as possible in the process). In other words, in this later phase the model is trained to meet human expectations. But human expectations are not bias-free either (e.g. the tendency to prefer the first choice presented).
|
How can the RLHF phase eliminate bias if it relies on a process (human input) that has the same biases as the pre-training (human input)?