Hacker News
by hackernewds 705 days ago
DPO essentially does still involve human feedback; it depends on what preferences are being optimized.