by alecco
919 days ago
Note that the model is trained on data generated by GPT-4. At current API prices, generating that data is probably orders of magnitude more expensive than the training itself. The whole point of these papers is that training data quality is key. I would much rather these companies release the training data than the weights, but that will never happen.

"We speculate that the creation of synthetic datasets will become, in the near future, an important technical skill and a central topic of research in AI."
I.e., master teaches apprentice, or an LLM trains an SLM (small language model):
https://arxiv.org/abs/2305.02301 (May '23)