by cubefox 1062 days ago
Specifically about RLHF, I find this video by Rob Miles still the best presentation of the ingenious original 2017(!) paper: https://youtube.com/watch?v=PYylPRX6z4Q

RLHF is actually older than GPT-1, which came out in 2018. It only became prominent for language models in 2022 with InstructGPT, which combined supervised instruction fine-tuning with RLHF, although OpenAI had already applied it to GPT-2 in 2019 and to summarization models in 2020.