


	by researchers 509 days ago
	Tuning for qualitative outcomes is pretty much solved via RLHF/DPO (what this post calls "preference tuning"). Right?