by murbard2
3973 days ago
I see no mention of it, but I'd be surprised if they didn't use some form of knowledge distillation [1] (which Hinton came up with, so really no excuse) to condense a large neural network into a much smaller one.

[1] http://arxiv.org/abs/1503.02531
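
For anyone unfamiliar with the idea, here is a rough sketch of the soft-target loss described in [1], not anything the authors are known to have used. The small network is trained to match the large network's temperature-softened output distribution. The function names, the example logits and the temperature value are my own assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Dividing logits by a temperature > 1 yields a softer distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's and the student's softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    ce = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # [1] suggests scaling by T^2 so the gradient magnitude stays comparable
    # when this term is combined with a hard-label loss.
    return ce * temperature ** 2

# Hypothetical logits for a single example with three classes.
teacher = [5.0, 1.0, 0.0]
good_student = [5.0, 1.0, 0.0]
bad_student = [0.0, 1.0, 5.0]
loss_good = distillation_loss(good_student, teacher)
loss_bad = distillation_loss(bad_student, teacher)
```

In practice this term would usually be mixed with the ordinary cross-entropy on the true labels, and the student would be trained by gradient descent on the combined loss.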