by researchers 509 days ago
Tuning for qualitative outcomes is pretty much solved via RLHF/DPO (what this post calls "preference tuning"). Right?