Y
Hacker News
new
|
ask
|
show
|
jobs
by
hackernewds
705 days ago
DPO most essentially has human feedback, depends on what the preference optimizations are