RLHF: Reinforcement Learning from Human Feedback

	RLHF: Reinforcement Learning from Human Feedback (huyenchip.com)
	4 points by madisonmay 1183 days ago

1 comment

heliophobicdude 1183 days ago

This is a very well-written article. It isn't covered in the article, but can we still call models like Alpaca RLHF? What do we call models that are fine-tuned on demonstrations created by other chatbots?
