by mayukhdeb 495 days ago
In this paper, we don't zero out the weights. We remove them.
1 comment

Thanks for the correction! Can it be retrofitted into existing models through distillation, or do you have to train the model from scratch?