Hacker News
DPO: Direct Preference Optimization (github.com)
3 points by Garcia98 1086 days ago