by duchenne 308 days ago
I have done that at Meta/FAIR, and it is published in the Llama 3 paper. You usually start from a seed. It can be a randomly picked piece of a website, code, an image, a table of contents, or user-generated data, and you prompt the model to generate data related to that seed. Afterward, you need to pass the generated data through a series of verifiers to ensure quality.
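To make the seed, generate, verify loop concrete, here is a rough sketch. It is not the Llama 3 pipeline: `build_prompt`, `synthesize`, the two toy verifiers, and `echo_model` are all hypothetical names made up for illustration, and `echo_model` only stands in for a real model call. Real verifiers would be much stronger (execution checks, model-based graders, dedup, and so on).

```python
import random
from typing import Callable, List, Optional, Sequence

Generator = Callable[[str], str]        # prompt -> generated text
Verifier = Callable[[str, str], bool]   # (seed, sample) -> keep?


def build_prompt(seed: str) -> str:
    # Hypothetical prompt template; a real one depends on the task.
    return f"Write a question and answer grounded in the following text:\n\n{seed}\n"


def not_empty(seed: str, sample: str) -> bool:
    # Toy verifier: reject blank generations.
    return bool(sample.strip())


def not_copy_of_seed(seed: str, sample: str) -> bool:
    # Toy verifier: reject samples that just repeat the seed.
    return sample.strip() != seed.strip()


def synthesize(
    seeds: Sequence[str],
    generate: Generator,
    verifiers: Sequence[Verifier],
    n: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    rng = rng or random.Random(0)
    kept: List[str] = []
    for _ in range(n):
        seed = rng.choice(seeds)                 # randomly picked seed
        sample = generate(build_prompt(seed))    # data related to that seed
        if all(v(seed, sample) for v in verifiers):  # series of verifiers
            kept.append(sample)
    return kept


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption, illustration only).
    return "Q: What is this about? A: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    seeds = ["def add(a, b): return a + b", "Table of contents: 1. Intro"]
    for s in synthesize(seeds, echo_model, [not_empty, not_copy_of_seed], n=3):
        print(s)
```

Keeping verifiers as plain `(seed, sample) -> bool` functions makes it easy to stack more of them without touching the loop.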