by gbickford 869 days ago
Have you tried generating two sets of Q&A pairs, one with good answers and one with bad answers, and training on them with DPO (Direct Preference Optimization)?
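
For anyone unfamiliar with the idea, here is a rough sketch of what that could look like. It pairs each question with a good and a bad answer, then computes the standard DPO loss for one pair from sequence log-probabilities. The function names, the `prompt`/`chosen`/`rejected` field names, and `beta=0.1` are my own assumptions for illustration, not anything from the original project, and the log-probabilities would have to come from your own policy and reference models.

```python
import math


def build_preference_pairs(questions, good_answers, bad_answers):
    """Pair each question with a preferred and a dispreferred answer.

    The field names are an assumed convention for preference data.
    """
    if not (len(questions) == len(good_answers) == len(bad_answers)):
        raise ValueError("questions, good_answers and bad_answers must have equal length")
    return [
        {"prompt": q, "chosen": good, "rejected": bad}
        for q, good, bad in zip(questions, good_answers, bad_answers)
    ]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single pair: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full answer under the
    policy model or the frozen reference model. beta=0.1 is an assumed default.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


pairs = build_preference_pairs(
    ["What is 2 + 2?"],
    ["4"],
    ["5"],
)
```

The loss goes down as the policy assigns relatively more probability to the chosen answer than the reference model does, which is why the deliberately bad answers are useful as the rejected side.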
1 comment

Not yet, sounds promising!