Hacker News new | ask | show | jobs
by kawin 804 days ago
This is great advice!

I'd like to add that if you don't have pairwise preference data (A > B) but do have binary data (A is good for x_1, B is good for x_2, etc.), then Kahneman-Tversky Optimization (KTO) might be a better fit. Despite learning with a weaker signal, it works as well or better than dpo in practice.