by rao-v
20 days ago
First, these are terrific papers, and I hadn't seen some of them before. That said, I don't think they are classic student-teacher distillation from a random initialization (which was my point). In fact, the "Embarrassingly Simple Self-Distillation" paper uses exactly what I was talking about: "fine-tune on those samples with standard supervised fine-tuning".