Hacker News
by ilaksh 860 days ago
How do you normally do DPO? Is that built into PyTorch or something?

In theory, the hard part is collecting the examples, i.e. prompts paired with chosen and rejected responses.
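DPO isn't a single PyTorch built-in, but the loss itself is short. Below is a minimal plain-Python sketch for one preference pair. It assumes you already have the summed log-probability of the chosen and rejected responses under both the model being trained (the policy) and a frozen reference model. The function names and the beta value are illustrative and don't come from any library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response
    given the prompt, under the trainable policy or the frozen reference.
    """
    # Margin: how much more the policy favors the chosen response over the
    # rejected one, measured relative to the reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # Push the scaled margin through -log(sigmoid(.)).
    return -log_sigmoid(beta * margin)
```

When the policy still equals the reference, the margin is 0 and the loss is log 2. The loss falls as the policy learns to prefer the chosen response. In practice you would compute those log-probs with your model in PyTorch and average the loss over a batch. Libraries such as Hugging Face TRL (`DPOTrainer`) package this training loop for you.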

1 comments

Collecting data is hard, but the library is also a synthetic data generation library, so you can, for example, create the data for DPO fully synthetically. Check out the self-rewarding LLMs example: https://datadreamer.dev/docs/latest/pages/get_started/quick_...
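The self-rewarding idea works roughly like this: sample several responses per prompt, have the model score its own responses, and keep the highest- and lowest-scored ones as the chosen/rejected pair. The sketch below covers only that pair-building step and is library-agnostic. `generate` and `judge` are hypothetical stand-ins for model calls, not DataDreamer's API. The toy versions exist only so the snippet runs.

```python
from typing import Callable, Optional, TypedDict


class PreferencePair(TypedDict):
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt: str,
               generate: Callable[[str, int], list[str]],
               judge: Callable[[str, str], float],
               n: int = 4) -> Optional[PreferencePair]:
    # Sample n candidate responses (hypothetical model call).
    candidates = generate(prompt, n)
    # Score each candidate with the model acting as its own judge (hypothetical).
    scored = [(judge(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    worst_score, worst = min(scored, key=lambda sc: sc[0])
    # If every candidate ties, there is no preference signal: skip the prompt.
    if best_score == worst_score:
        return None
    return {"prompt": prompt, "chosen": best, "rejected": worst}


# Toy stand-ins (hypothetical) so the sketch runs without a model.
def toy_generate(prompt: str, n: int) -> list[str]:
    return [prompt + "!" * i for i in range(1, n + 1)]


def toy_judge(prompt: str, response: str) -> float:
    return float(len(response))


print(build_pair("hi", toy_generate, toy_judge))
# → {'prompt': 'hi', 'chosen': 'hi!!!!', 'rejected': 'hi!'}
```

Running this over many prompts yields (prompt, chosen, rejected) records, which are the inputs a DPO trainer consumes.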