by rybosome
345 days ago
Agreed, especially in this context of training a smaller model on a larger model’s outputs. Distillation is generally accepted as an effective technique. This is exactly what I did in a previous role: fine-tuning Llama and Mistral models on a mix of human and GPT-4 data for a domain-specific task. Adding (good) synthetic data definitely increased the output quality for our tasks.