Hacker News
by cztomsik 860 days ago
DPO is not RLHF.