by groceryheist
899 days ago
I agree that this sketch comes closer to working in practice than simple RLHF. In my earlier comment I was imagining bringing in some auxiliary data, like you describe, to detect plagiarism and then using RL to teach the model not to do it.