Hard to outperform the model you distill from...
Distillation helps with world knowledge and things like that.
They do use it for synthetic data/judging though, so yes, hard to outperform.
Not that they need to, if they can basically match it for a fifth of the price.