by murbard2 3973 days ago
I see no mention of it, but I'd be surprised if they didn't use some form of knowledge distillation [1] (which Hinton came up with, so there's really no excuse) to condense a large neural network into a much smaller one.

[1] http://arxiv.org/abs/1503.02531
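For anyone who hasn't read [1], here is a minimal plain-Python sketch of the distillation loss for a single example. The small "student" net is trained on a weighted average of two terms. The first is cross-entropy against the big "teacher" net's softened predictions, using a softmax at temperature T and scaled by T^2. The second is ordinary cross-entropy against the true label. The function names and the defaults (temperature=4.0, alpha=0.5) are illustrative assumptions, not values taken from the paper or from whatever the article's authors did. A real setup would compute this inside an autodiff framework and backprop through the student only.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets: Sequence[float], probs: Sequence[float]) -> float:
    """H(targets, probs) = -sum_i targets_i * log(probs_i)."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_label: int,
    temperature: float = 4.0,  # assumed value, for illustration only
    alpha: float = 0.5,        # assumed weight on the soft-target term
) -> float:
    # Soft targets: the teacher's predictions, softened by a high temperature.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    # Multiply by T^2 so the soft-target term keeps roughly the same gradient
    # scale when the temperature is changed.
    soft_loss = cross_entropy(soft_targets, soft_preds) * temperature ** 2
    # Hard loss: ordinary cross-entropy against the true label at T = 1.
    hard_probs = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_probs[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [5.0, 2.0, -1.0]  # hypothetical logits from the large net
    student = [3.0, 1.5, 0.0]   # hypothetical logits from the small net
    print(distillation_loss(student, teacher, hard_label=0))
```

The point of the high temperature is that the teacher's small probabilities on the wrong classes (which wrong answers it considers "close") become visible in the targets. That extra information is what lets a much smaller student get close to the teacher's accuracy.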