by tehsauce
459 days ago
There has been some good research published on how RLHF (i.e., aligning a model to human preferences) can easily introduce mode collapse and bias into models. For example, given a prompt like "Choose a random number", the base pretrained model gives fairly random answers. After it is fine-tuned to produce responses humans like, it becomes strongly biased toward answering with numbers like "7" or "42".
https://en.wikipedia.org/wiki/42_(number)
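One simple way to put a number on this kind of mode collapse is the Shannon entropy of the answers you get when sampling the same prompt many times. Answers spread across many numbers give high entropy, while answers piled onto "7" and "42" give low entropy. Here's a minimal sketch, assuming you already have a list of sampled answers as strings. The helper name `answer_entropy` is hypothetical, and both sample lists are made-up toy data, not measurements from any real base or RLHF-tuned model.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy, made-up samples (NOT data from any real model).
# "spread": every number from 1 to 10 appears equally often.
spread = [str(n) for n in range(1, 11)] * 10
# "collapsed": most answers are "7" or "42", with a thin tail of other numbers.
collapsed = ["7"] * 70 + ["42"] * 20 + [str(n) for n in range(1, 11)]

print(f"spread:    {answer_entropy(spread):.2f} bits")
print(f"collapsed: {answer_entropy(collapsed):.2f} bits")
```

A perfectly uniform choice among 10 options has entropy log2(10), so a collapsed model shows up as a large drop below that ceiling.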