	by kawin 804 days ago
	This is great advice! I'd like to add that if you don't have pairwise preference data (A > B) but do have binary data (A is good for x_1, B is good for x_2, etc.), then Kahneman-Tversky Optimization (KTO) might be a better fit. Despite learning from a weaker signal, it works as well as or better than DPO in practice.
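
	To make the two data shapes concrete, here is a rough sketch of the difference. The class and field names (`PairwiseExample`, `BinaryExample`, `unpair`) are made up for illustration and are not taken from any library. The only point is that a pairwise record needs two completions for the same prompt, while a binary record needs one completion plus a good/bad label, so any pairwise dataset can be flattened into binary form but not the other way around.

```python
from dataclasses import dataclass


@dataclass
class PairwiseExample:
    """DPO-style record: two completions for the SAME prompt, one preferred."""
    prompt: str
    chosen: str
    rejected: str


@dataclass
class BinaryExample:
    """KTO-style record: one completion with a desirable/undesirable label."""
    prompt: str
    completion: str
    desirable: bool


def unpair(pairs):
    """Flatten pairwise records into binary ones (hypothetical helper)."""
    out = []
    for p in pairs:
        out.append(BinaryExample(p.prompt, p.chosen, True))
        out.append(BinaryExample(p.prompt, p.rejected, False))
    return out


# A pairwise record (A > B for the same prompt) ...
pairs = [PairwiseExample("What is 2 + 2?", "4", "5")]

# ... and a binary-only record that has no paired counterpart.
binary = unpair(pairs) + [BinaryExample("Name a prime number.", "7", True)]

for ex in binary:
    print(ex)
```

	The last record is the case described above: a label for a single completion, with no second completion to compare it against.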