| HN Mirror



	by SknCode 88 days ago
	How?

1 comment

sigmoid10 88 days ago

Same way you distill any model. Training-data efficiency matters only while you train the source model or ensemble. Once you have that, you are purely compute-bound during distillation.
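For anyone wondering what "distill" means mechanically, here is a minimal sketch of the usual soft-target loss, assuming the common setup where the student is trained to match the teacher's temperature-softened output distribution. The names `softmax`, `distillation_loss`, and the `temperature` value are illustrative assumptions, not anything specified in the thread. The point it shows: the targets come from the teacher's logits, so any input you can run the teacher on gives you a training signal, and the cost is compute rather than labeled data.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper for illustration: the teacher's logits act as the
    target, so no ground-truth labels are needed for this term.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The teacher can score arbitrary inputs, so targets are limited by compute only.
teacher = [3.0, 1.0, 0.0]
student = [0.0, 1.0, 3.0]
print(distillation_loss(teacher, student))
```

In a real setup the student's parameters would be updated by gradient descent on this loss, often mixed with a standard label loss when labels exist.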
