Hacker News
by valine 947 days ago
I'd think of it less like teaching the model something new, and more like enforcing a behavior the model can already output. Any decent raw model can output function names and parameters with prompt engineering. To do function calling, you need the model to output function names reliably for a wide variety of prompts. That's where the fine-tuning comes in.
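
To make "output function names reliably" concrete, here is a rough Python sketch of how one might measure that reliability over a batch of model outputs. It assumes the model is prompted to emit a JSON object with `name` and `arguments` keys. The `TOOLS` registry, that JSON shape, and the helper names are hypothetical illustrations, not any particular vendor's function-calling API.

```python
import json

# Hypothetical tool registry: function name -> set of required parameter names.
TOOLS = {
    "get_weather": {"city"},
    "send_email": {"to", "subject", "body"},
}

def is_valid_call(raw_output: str) -> bool:
    """True if raw_output is a JSON object naming a known function
    and supplying every required parameter for it."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # free-form text instead of a call
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS or not isinstance(args, dict):
        return False
    return TOOLS[name].issubset(args.keys())

def reliability(outputs: list[str]) -> float:
    """Fraction of model outputs that are well-formed function calls."""
    if not outputs:
        return 0.0
    return sum(is_valid_call(o) for o in outputs) / len(outputs)
```

A raw model with a good prompt might score well on easy prompts and poorly on odd ones; the point of fine-tuning, as described above, is to push this fraction up across a wide variety of prompts.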

I could very easily believe that if I saw proof, but it just feels a bit wrong to train a model on model outputs.

Even in the main article here, the model did better with fewer fine-tuned examples. To us, the auto-generated examples might look varied enough and good enough, but they were all generated algorithmically. Feeding in more of them could easily lead the model to focus on some artifact of the embeddings or of the generating model that we just don't perceive.

> it just feels a bit wrong to train a model on model outputs

If you have a small student model and a large teacher, it makes sense: the student is better off after this distillation.
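
For reference, the classic knowledge-distillation objective trains the student to match the teacher's temperature-softened output distribution. Below is a minimal pure-Python sketch of that loss for a single example; the temperature value and function names are illustrative assumptions, and in practice this term is usually mixed with an ordinary cross-entropy loss on the real labels.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    # Divide by the temperature, then subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: list[float],
                      teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student's logits already match the teacher's and grows as their softened distributions diverge; the softening exposes the teacher's relative preferences among wrong answers, which is extra signal a hard label does not carry.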

If you have a way to filter out low quality synthetic examples then it would be useful to generate a bunch more and take the best.
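
"Generate a bunch more and take the best" is essentially best-of-N (rejection) sampling. A minimal sketch, assuming you already have some quality scorer such as a reward model, a unit test, or a heuristic; `generate` and `score` are hypothetical stand-ins passed in as callables.

```python
import random
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int = 16,
              keep: int = 4,
              min_score: float = 0.0) -> list[str]:
    """Sample n candidates, drop those scoring below min_score,
    and return up to `keep` survivors, highest score first."""
    candidates = [generate() for _ in range(n)]
    scored = [(score(c), c) for c in candidates]  # score each candidate once
    passed = [(s, c) for s, c in scored if s >= min_score]
    passed.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in passed[:keep]]

# Toy usage: "examples" are random numbers and the "quality" is their value.
rng = random.Random(0)
picked = best_of_n(lambda: str(rng.randint(0, 100)), float, n=20, keep=3, min_score=50)
```

The whole approach is only as good as the filter: if the scorer shares the generator's blind spots, this loop amplifies exactly the kind of hidden artifact the parent comment worries about.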

If your LLM is an agent, then it can generate feedback signals from the environment. Even a human-AI chat is a form of environment for the model. Every human response can be evaluated as a positive or negative reward.
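
One way to read "every human response can be evaluated as reward": pair each model turn with a score derived from the user message that follows it. A rough sketch; the `(role, text)` log format is an assumption, and `keyword_judge` is a deliberately crude, hypothetical placeholder for a trained reward model or preference classifier.

```python
from typing import Callable

def chat_to_rewards(turns: list[tuple[str, str]],
                    judge: Callable[[str], float]) -> list[tuple[str, float]]:
    """Pair each assistant turn with a reward computed from the user turn
    that immediately follows it. `turns` is a list of (role, text)."""
    pairs = []
    for (role, text), (next_role, next_text) in zip(turns, turns[1:]):
        if role == "assistant" and next_role == "user":
            pairs.append((text, judge(next_text)))
    return pairs

def keyword_judge(reply: str) -> float:
    # Placeholder only: a real system would use a learned reward model.
    lowered = reply.lower()
    if any(w in lowered for w in ("thanks", "perfect", "that works")):
        return 1.0
    if any(w in lowered for w in ("wrong", "doesn't work", "no,")):
        return -1.0
    return 0.0  # neutral or ambiguous follow-up
```

The resulting `(model_output, reward)` pairs are the kind of data that preference tuning or RL from feedback consumes.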

More fundamentally, organic datasets are very unbalanced: LLMs need more complex reasoning chains than are usually available. There are some exceptions; scientific papers, manuals and code do contain very complex reasoning chains. But not in general. This gap can be filled with synthetic data.

And even in principle, if you have a model at level N and want to make a dataset at level N+1, then you need to boost your model. You can give it more tokens, more attempts or more tools.
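
The "more attempts" lever can be put in numbers. If a level-N model solves a task with probability p per attempt, and you can verify which attempts are correct, the chance that at least one of k independent attempts succeeds is 1 - (1 - p)^k. Independence and a perfect verifier are simplifying assumptions; real samples are correlated, so this tends to overstate the gain.

```python
def success_within_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    given a per-attempt success probability p."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return 1.0 - (1.0 - p) ** k

single = success_within_k(0.2, 1)    # 0.2
boosted = success_within_k(0.2, 10)  # 1 - 0.8**10, roughly 0.89
```

So a model that gets a hard problem right only 20% of the time can, with ten verified attempts, produce a correct solution for most such problems, and those solutions become training data one level above what it produces in a single shot.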

There's a whole literature on distillation and student-teacher networks.