Hacker News
by brucethemoose2 929 days ago
In addition to what was said, if it's anything like DPO you don't need a lot of data, just a high-quality set. For instance, DPO trains on preference pairs: each prompt needs a "good" (preferred) response and a "bad" (rejected) one.
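To make the "good"/"bad" pairing concrete, here is a rough sketch of what one record in such a set might look like, plus the per-pair DPO loss. The field names (`prompt`, `chosen`, `rejected`), the example text, the `dpo_loss` helper, and `beta=0.1` are all my own assumptions for illustration, not something from a specific library. The log-probabilities would normally come from the model being trained and a frozen reference model; here they are just plain numbers.

```python
import math

# Hypothetical preference data: one prompt with a preferred and a rejected response.
pairs = [
    {
        "prompt": "Explain what a mutex is.",
        "chosen": "A mutex is a lock that lets only one thread into a critical section at a time.",
        "rejected": "It's a thing in programming.",
    },
]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single pair.

    Each argument is the summed log-probability of a whole response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy favors "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

The loss drops as the policy shifts probability toward the "good" response and away from the "bad" one relative to the reference, which is why the quality of each pair tends to matter more than the raw number of pairs.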