Hacker News
by gbickford, 869 days ago
Have you tried generating two sets of Q&A pairs, one with good answers and one with bad answers, and then training on them with DPO (Direct Preference Optimization)?
1 comment
lewq, 869 days ago
Not yet, sounds promising!
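For anyone wondering what the suggestion might look like in practice, here is a minimal sketch, not anything either poster actually ran. It assumes the two sets of Q&A pairs are plain dicts keyed by question, and it joins them into `prompt` / `chosen` / `rejected` records, a layout that DPO training tools commonly expect. The function names `build_preference_pairs` and `dpo_loss`, the dict layout, and `beta=0.1` are all hypothetical choices for illustration. The loss is the standard per-example DPO objective written in pure Python; a real run would compute the log-probabilities from a policy model and a frozen reference model.

```python
import math


def build_preference_pairs(good_answers, bad_answers):
    """Join two {question: answer} dicts into DPO-style preference records.

    Hypothetical helper: the dict inputs and record keys are assumptions.
    """
    pairs = []
    for question, chosen in good_answers.items():
        rejected = bad_answers.get(question)
        # Skip questions with no bad answer, or where both answers are identical
        # (an identical pair carries no preference signal).
        if rejected is None or rejected == chosen:
            continue
        pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    margin = how much more the policy prefers the chosen answer over the
    rejected one, relative to the reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


good = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
bad = {"What is 2 + 2?": "5"}  # no bad answer for the second question

pairs = build_preference_pairs(good, bad)
print(pairs)
```

When the policy and the reference model assign the same log-probabilities, the margin is zero and the loss is log 2; it drops as the policy shifts probability toward the chosen answer and away from the rejected one.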