Hacker News
by changoplatanero 856 days ago
I can second that. From what I’ve heard from people at leading labs, it’s not clear that switching from RLHF to DPO is worth it.