|
|
|
|
|
by sota_pop
374 days ago
|
|
My understanding of model distillation is quite different in that it trains another (typically smaller) model using the error between the new model’s output and that of the existing - effectively capturing the existing model’s embedded knowledge and encoding it (ideally more densely) into the new. |
|