by SknCode 88 days ago
How?
1 comment

The same way you distill any model. Training-data efficiency matters only while you train the source model or ensemble. Once you have that, distillation is purely compute-bound.
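To make "the same way you distill any model" concrete, here is a rough sketch of the usual soft-target objective: the student is trained to match the source model's temperature-softened output distribution. The function names, the temperature value, and the toy logits are my own assumptions for illustration, not anything specific to the setup being discussed.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: a real training loop would average this over a
    batch and backpropagate through the student only.
    """
    log_p = log_softmax(teacher_logits, temperature)  # soft targets
    log_q = log_softmax(student_logits, temperature)  # student prediction
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy logits (made up): the teacher's outputs are the only "labels" needed,
# so any unlabeled input the teacher can be run on yields a training example.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
untrained_student = [0.0, 0.0, 0.0]

loss_match = distillation_loss(teacher, matching_student)
loss_untrained = distillation_loss(teacher, untrained_student)
```

Since the targets come from running the source model rather than from collected labels, the cost of producing more training signal is forward passes, which is the sense in which this step is limited by compute rather than data.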