by langcss 697 days ago
I think this is how RLHF actually works. https://huyenchip.com/2023/05/02/rlhf.html#3_1_reward_model
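For context, the linked section covers reward-model training. Below is a minimal sketch of the pairwise comparison loss commonly used for RLHF reward models (the InstructGPT-style setup). It assumes each human comparison yields a scalar reward for the preferred ("chosen") response and one for the "rejected" response. The loss is -log σ(r_chosen − r_rejected), which pushes the model to score the preferred answer higher. The function names are hypothetical, and plain floats stand in for the reward model's outputs. This is not code from the article.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)) for a single human comparison.

    Hypothetical helper: r_chosen / r_rejected stand in for the scalar
    scores a reward model would assign to the two responses.
    """
    diff = r_chosen - r_rejected
    # -log(sigmoid(d)) == softplus(-d) == log(1 + exp(-d));
    # branch on the sign so exp() never overflows for large |d|.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_reward_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over a batch of (r_chosen, r_rejected) pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Equal scores: the model can't tell the pair apart, loss = log 2.
    print(pairwise_reward_loss(0.0, 0.0))
    # Preferred answer scored higher gives a small loss; reversed gives a large one.
    print(pairwise_reward_loss(2.0, 0.0), pairwise_reward_loss(0.0, 2.0))
    print(mean_reward_loss([(2.0, 0.0), (0.0, 2.0)]))
```

The loss only depends on the difference between the two scores. That fits the data, since labelers rank responses relative to each other rather than giving absolute scores.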