by nsthorat 3122 days ago
There is a lot of work being done in model compression (quantization, simple factorization tricks, better conv kernels like depthwise separable convs, etc.). We won’t let that happen!
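To make the quantization item concrete, here is a rough sketch of plain linear 8-bit quantization. It assumes weights are stored as 32-bit floats to begin with, so one byte per weight is roughly a 4x size reduction. The names `quantize` and `dequantize` are made up for illustration and are not from any particular library.

```python
def quantize(weights, bits=8):
    """Map floats onto integer codes in [0, 2**bits - 1] (illustrative helper)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Guard against a constant weight vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


weights = [-0.73, -0.12, 0.0, 0.08, 0.41, 0.95]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Rounding to the nearest level bounds the error by about half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Real toolchains typically quantize per layer or per channel and sometimes fine-tune afterwards, so treat this only as the basic idea.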

I am aware of that research, but even with a 20x decrease in size some models are still too big for the web (think about the World Wide Web, not just the internet in the US).
Researchers often train huge models without thinking about model size (because they don't have to). We've seen ~200MB production models get down to ~4MB and not lose much precision. I'm quite confident we'll continue that trend.

Don't forget that folks were saying this about the web when images / rich media were becoming prevalent!

200MB is still a small model, and 4MB is almost double the size of an average web page (including images). 10MB web pages are really bad, even more so for countries that are still developing their infrastructure.
>> We've seen ~200MB production models get down to ~4MB and not lose much precision.

Details please. What techniques are used to reduce the model size?

I saw a talk on this paper a couple years ago. https://arxiv.org/abs/1503.02531 The method is to train a smaller model on the predictions of a large model or ensemble. I'd be interested in knowing other techniques as well.
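A minimal sketch of that idea, as I understand it: the small model is trained to match the large model's softened output distribution. The temperature value and the toy logits below are assumptions for illustration, and a real setup would minimize this loss with a training framework (often mixed with the usual loss on the true labels) rather than just evaluate it.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Toy logits for a 3-class problem (made-up numbers).
teacher = [5.0, 2.0, -1.0]
good_student = [4.8, 2.1, -0.9]   # roughly agrees with the teacher
bad_student = [-1.0, 2.0, 5.0]    # ranks the classes the other way round

print(distillation_loss(good_student, teacher))
print(distillation_loss(bad_student, teacher))
```

The student that agrees with the teacher gets the lower loss, which is the signal the smaller model is trained on.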