HN Mirror

	by bigyabai 197 days ago
	RLHF is basically a fancy, overengineered GAN. Most of the industry could see that DPO was more efficient for fitting to human behavior.