by nsthorat 3114 days ago
Oftentimes researchers train huge models but don't think about model size (because they don't have to). We've seen ~200MB production models get down to ~4MB without losing much precision. I'm quite confident we'll continue that trend.

Don't forget that folks were saying this about the web when images / rich media were becoming prevalent!

2 comments

200MB is still a small model, and 4MB is almost double the size of an average web page (including images). 10MB web pages are really bad, especially for countries that are still developing their infrastructure.
>> We've seen ~200MB production models get down to ~4MB and not lose much precision.

Details please. What techniques are used to reduce the model size?

I saw a talk on this paper a couple years ago. https://arxiv.org/abs/1503.02531 The method is to train a smaller model on the predictions of a large model or ensemble. I'd be interested in knowing other techniques as well.
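
For anyone who wants the gist of that paper in code, here is a minimal sketch of the soft-target loss, assuming plain Python lists of logits. The function names, the example logits, and the temperature of 2.0 are made up for illustration and are not taken from the paper or from the production models mentioned above. The idea is that the teacher's logits are softened with a temperature, and the student is penalized by the cross-entropy against that softened distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened outputs and the student's.

    Both sets of logits are softened with the same temperature (an assumed
    value here; in practice it is a hyperparameter to tune).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Hypothetical logits for one 3-class example.
teacher_logits = [5.0, 2.0, 0.5]
close_student = [4.8, 2.1, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 2.0, 5.0]     # ranks the classes the other way around

print(distillation_loss(close_student, teacher_logits))
print(distillation_loss(far_student, teacher_logits))
```

The student that agrees with the teacher gets the lower loss. In a real training loop this term would be minimized by gradient descent over the small model's weights, usually mixed with the ordinary loss on the true labels.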