Hacker News (HN Mirror)

by visarga 947 days ago

> it just feels a bit wrong to train a model on model outputs

If you have a small student model and a large teacher model, it makes sense: the student is better off after this distillation.
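Training the student directly on text the teacher generated is sometimes called sequence-level distillation. A closely related and very common form matches the teacher's softened output distribution instead. Here is a minimal pure-Python sketch of that classic soft-target loss: KL(teacher || student) at temperature T, scaled by T² as in Hinton et al. (2015). The function names and the single-prediction setup are illustrative assumptions, not anyone's actual training code.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's current prediction
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

A student whose logits already resemble the teacher's (say `[1.0, 2.0, 2.9]` against `[1.0, 2.0, 3.0]`) gets a much smaller loss than one that ranks the classes in reverse order. The soft targets carry more signal per example than a single hard label. That is one reason the small student can come out ahead.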

If you have a way to filter out low-quality synthetic examples, then it is useful to generate many more candidates than you need and keep only the best.
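Here is a rough sketch of that generate-then-filter loop (best-of-n with a quality threshold). The `generate` and `score` callables are hypothetical stand-ins for a sampler and a quality filter, such as a verifier, unit tests, or a reward model. The toy versions below only exist to make it runnable.

```python
from typing import Callable, List, Tuple

def best_of_n_filter(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # hypothetical sampler: prompt, n -> n candidates
    score: Callable[[str, str], float],         # hypothetical quality filter: prompt, candidate -> score
    n: int = 8,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """For each prompt, sample n candidates and keep the top-scoring one only if it clears the threshold."""
    kept = []
    for prompt in prompts:
        scored = [(score(prompt, c), c) for c in generate(prompt, n)]
        best_score, best = max(scored, key=lambda sc: sc[0])
        if best_score >= threshold:
            kept.append((prompt, best))
    return kept

# Toy stand-ins (hypothetical): a "model" whose samples include the right sum only
# when the sum is small, and an exact-match scorer.
def toy_generate(prompt: str, n: int) -> List[str]:
    a, b = map(int, prompt.split("+"))
    offsets = range(-(n // 2), n - n // 2) if a + b <= 100 else range(1, n + 1)
    return [str(a + b + d) for d in offsets]

def toy_score(prompt: str, candidate: str) -> float:
    a, b = map(int, prompt.split("+"))
    return 1.0 if candidate == str(a + b) else 0.0
```

With these toys, `best_of_n_filter(["2+3", "60+70"], toy_generate, toy_score)` keeps `("2+3", "5")` and drops the second prompt, because none of its samples pass the filter. The dataset can only be as good as the filter.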

If your LLM is an agent, then it can collect feedback signals from its environment. Even a human-AI chat is a form of environment for the model: every human response can be scored as a positive or negative reward for the model's previous turn.
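Here is a minimal sketch of turning a chat log into reward labels. Each assistant turn is scored by the human reply that follows it. The keyword lists are a hypothetical, deliberately crude heuristic. A real setup would more likely use a learned classifier or reward model, but the data flow would be the same.

```python
from typing import List, Tuple

# Hypothetical cue lists; a crude stand-in for a learned reward model.
NEGATIVE_CUES = ("wrong", "doesn't work", "not what i asked", "error")
POSITIVE_CUES = ("thanks", "perfect", "great", "that works")

def reply_reward(reply: str) -> float:
    """Map a human reply to +1 / -1 / 0. Negative cues are checked first so they win."""
    text = reply.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        return -1.0
    if any(cue in text for cue in POSITIVE_CUES):
        return 1.0
    return 0.0

def label_assistant_turns(turns: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Pair each assistant turn with the reward implied by the next user turn.
    A final assistant turn with no reply gets no label."""
    labeled = []
    for (role, text), (next_role, next_text) in zip(turns, turns[1:]):
        if role == "assistant" and next_role == "user":
            labeled.append((text, reply_reward(next_text)))
    return labeled
```

For example, if "Use sorted(xs)." is followed by "Perfect, thanks!", that turn gets +1.0. If an assistant turn is followed by "Not what I asked", it gets -1.0. The resulting (response, reward) pairs are the kind of signal an RL or preference-tuning step could consume.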

More fundamentally, organic datasets are very unbalanced: LLMs need more complex reasoning chains than are usually available. There are some exceptions (scientific papers, manuals, and code contain very complex reasoning chains), but not in general. This imbalance can be fixed with synthetic data.
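As a toy illustration of rebalancing by construction, here is a sketch that generates multi-step arithmetic word problems. The chain length is an explicit knob, so long chains can be as common as you like, and every step is mechanically checkable. The task family and all names here are hypothetical stand-ins for "complex reasoning chains," not a claim about how any real dataset is built.

```python
import random
from typing import Any, Dict, List

# Operator symbol -> (wording used in the question, function that applies it).
OPS = {"+": ("Add", lambda a, b: a + b),
       "-": ("Subtract", lambda a, b: a - b),
       "*": ("Multiply by", lambda a, b: a * b)}

def make_chain_example(num_steps: int, rng: random.Random) -> Dict[str, Any]:
    """Build one synthetic problem whose solution takes exactly num_steps steps."""
    value = rng.randint(1, 9)
    question = [f"Start with {value}."]
    steps: List[str] = []
    for _ in range(num_steps):
        op = rng.choice(sorted(OPS))
        verb, fn = OPS[op]
        operand = rng.randint(2, 9)
        new_value = fn(value, operand)
        question.append(f"{verb} {operand}.")
        steps.append(f"{value} {op} {operand} = {new_value}")
        value = new_value
    return {"question": " ".join(question) + " What is the result?",
            "steps": steps, "answer": value}

def make_dataset(n: int, min_steps: int, max_steps: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Sample chain lengths uniformly, so long chains are as common as short ones by design."""
    rng = random.Random(seed)
    return [make_chain_example(rng.randint(min_steps, max_steps), rng) for _ in range(n)]

def verify_example(ex: Dict[str, Any]) -> bool:
    """Re-check every step and the final answer, so bad examples can be filtered out."""
    prev = None
    for step in ex["steps"]:
        a_str, op, b_str, _, c_str = step.split()
        a, b, c = int(a_str), int(b_str), int(c_str)
        if prev is not None and a != prev:
            return False  # chain is broken: this step doesn't continue from the last result
        if OPS[op][1](a, b) != c:
            return False  # arithmetic in this step is wrong
        prev = c
    return prev == ex["answer"]
```

Unlike scraped text, here the length distribution of the reasoning chains is chosen rather than inherited. The verifier also makes the data compatible with the filtering idea above.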

And even in principle, if you have a model at level N and want to make a dataset at level N+1, then you need to boost your model. You can give it more tokens, more attempts or more tools.
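The "more attempts" lever can be quantified with a back-of-the-envelope model. Assume each attempt is independently correct with probability p. If you have a verifier that can recognize a correct answer, k attempts succeed with probability 1 - (1 - p)^k. Without a verifier, majority voting over k attempts (treating answers as simply right or wrong, with k odd) succeeds when more than half are correct. Both are simplifying assumptions. Real attempts are correlated, and plurality voting over many distinct wrong answers behaves differently.

```python
from math import comb

def pass_at_k(p: float, k: int) -> float:
    """P(at least one of k independent attempts is correct); needs a verifier to pick it out."""
    return 1.0 - (1.0 - p) ** k

def majority_vote_accuracy(p: float, k: int) -> float:
    """P(a strict majority of k independent right/wrong attempts is correct)."""
    if k % 2 == 0:
        raise ValueError("use an odd k to avoid ties")
    return sum(comb(k, i) * p ** i * (1 - p) ** (k - i) for i in range(k // 2 + 1, k + 1))
```

With p = 0.6 and three attempts, a verifier gets you to 0.936, while majority voting only reaches 0.648. With p = 0.4, majority voting actually drops to 0.352. In this model, extra attempts lift a level-N model toward N+1 most reliably when there is a good way to check the outputs.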