Hacker News
by langcss 697 days ago
I think this is how RLHF actually works.
https://huyenchip.com/2023/05/02/rlhf.html#3_1_reward_model