How can the RLHF phase eliminate bias if it relies on a process (human input) that has the same biases as the pre-training data (also human input)?
During RLHF, human evaluators are aware of such biases and are instructed to down-vote model responses that exhibit them.
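To make the down-voting mechanism concrete, here is a minimal sketch of how such a judgment could turn into a training signal. It assumes a common setup that the answer above does not spell out: each evaluator judgment is treated as a pairwise preference (an unbiased response chosen over a biased one), and a reward model is trained with a Bradley-Terry style loss. The function name and the reward scores are hypothetical, chosen only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: the evaluator's down-vote is recorded as "rejected" and the
    alternative response as "chosen". The loss is small when the reward model
    already scores the chosen response higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for an unbiased (chosen) and biased (rejected) reply.
agrees_with_evaluator = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_evaluator = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

# The reward model is penalized more when it prefers the down-voted response.
print(agrees_with_evaluator < disagrees_with_evaluator)
```

Under these assumptions, the bias in the pre-training text and the evaluator's judgment enter training through different channels: the first as raw text to imitate, the second as an explicit penalty on responses that were flagged.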