by npollock
501 days ago
A quote I found helpful: "reinforcement learning from human feedback ... is designed to optimize machine learning models in domains where specifically designing a reward function is hard" (https://rlhfbook.com/c/05-preferences.html)