|
|
|
|
|
by DoctorOetker
1382 days ago
|
|
suppose one has an idea for a different architecture / functional form etc, assuming the receiving model is substantially smaller so that the dominant computational cost is in the SD model, how long would effective knowledge distillation take on say a CPU? |
|