by viscanti
850 days ago
But the base model, when it's trained on the whole internet, will have some extreme biases on topics where there's a large and vocal group on one side and the other side is very silent. So RLHF is the attempt to correct for the biases on the internet.

...or it can be used to reinforce a specific ideology. That depends entirely on who does the RLHF and what their motivations are.