|
|
|
|
|
by spmurrayzzz
793 days ago
|
|
I don't think we can call it distillation, at least not in the conventional ML use of the word as you're not interacting with any of the actual model architecture, specifically — not computing the loss between the predictions of the parent model and the distilled target model. This is an important distinction when it comes to assess model collapse risk, which is a risk I think has probably been overstated enough to this point where now its being understated. |
|