Hacker News
by brucethemoose2 929 days ago
In addition to what was said, if it's anything like DPO you don't need a lot of data, just a good set. For instance, DPO requires "good" and "bad" responses for each given prompt.
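
To make the "good" and "bad" pairing concrete, here is a rough sketch of what a small DPO-style preference set could look like, plus the standard per-pair DPO loss. The field names (`prompt`, `chosen`, `rejected`), the example rows, the `validate_pairs` helper, and the `beta=0.1` default are all assumptions made for illustration, not something from a specific library or from the comment above.

```python
import math

# Hypothetical preference pairs: each prompt gets one preferred ("chosen")
# and one dispreferred ("rejected") response. The rows are made up.
preference_data = [
    {
        "prompt": "Explain what a mutex is.",
        "chosen": "A mutex is a lock that lets only one thread enter a critical section at a time.",
        "rejected": "It's a computer thing.",
    },
    {
        "prompt": "What does HTTP status 404 mean?",
        "chosen": "The server could not find the requested resource.",
        "rejected": "The server crashed.",
    },
]


def validate_pairs(rows):
    """Keep rows with a non-empty prompt and two distinct, non-empty responses."""
    required = ("prompt", "chosen", "rejected")
    valid = []
    for row in rows:
        has_fields = all(
            isinstance(row.get(key), str) and row[key].strip() for key in required
        )
        if has_fields and row["chosen"] != row["rejected"]:
            valid.append(row)
    return valid


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed log-probabilities of each response.

    margin is how much more the policy prefers "chosen" over "rejected"
    than the frozen reference model does.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))
```

The loss only looks at the gap between the two responses for the same prompt, which is one reason a small but clean set of pairs can go a long way: a mislabeled or near-identical pair gives a misleading signal rather than just a weak one.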