Hacker News new | ask | show | jobs
by hidenotslide 3116 days ago
I saw a talk on this paper a couple years ago. https://arxiv.org/abs/1503.02531 The method is to train a smaller model on the predictions of a large model or ensemble. I'd be interested in knowing other techniques as well.