	by cubefox 1062 days ago
	Specifically about RLHF, I still find this video by Rob Miles the best presentation of the ingenious original 2017(!) paper: https://youtube.com/watch?v=PYylPRX6z4Q . RLHF is actually older than GPT-1, which came out in 2018. It was first tried on language models in 2019, but it didn't become widely known there until 2022 with InstructGPT, an approach which combined supervised instruction fine-tuning with RLHF.