by jmmcd 3326 days ago
The point about infinite training data is potentially useful. The other one I still don't agree with. Your goal is only to understand the NN insofar as it models the original data. Any errors the NN is making are not worth learning about. So it would be better to train the understandable method (DT) on the original data.

>Any errors the NN is making are not worth learning about.

But that's the whole point of this method! To understand what errors the NN might be making. It's also quite possible the NN's errors aren't really errors, if there are mistakes or noise in the labels.
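To make that concrete, here is a minimal sketch of the surrogate idea in plain Python. Everything in it is made up for illustration: `black_box` is a hypothetical stand-in for the trained NN, `fit_stump` is a depth-1 "decision tree", and the 0.5 boundary for the original labels is an assumption, not anything from the article. It fits the stump to the black box's own predictions, then checks where the black box and the original labels disagree.

```python
import random

def black_box(x):
    """Hypothetical stand-in for a trained NN: predicts 1 when x > 0.6."""
    return 1 if x > 0.6 else 0

def fit_stump(xs, ys):
    """Fit a depth-1 decision tree (a single threshold) by minimizing misclassifications."""
    best_threshold, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

rng = random.Random(0)

# The black box can label any input, so we can sample as many points as we like.
xs = [rng.random() for _ in range(2000)]
ys = [black_box(x) for x in xs]

# The surrogate's threshold is a readable summary of what the black box does.
threshold = fit_stump(xs, ys)

# Assumed original labels with a boundary at 0.5 (hypothetical): the points
# where the black box disagrees with them are the "errors" under discussion.
original = [(x, 1 if x > 0.5 else 0) for x in xs]
disagreements = [x for x, y in original if black_box(x) != y]

print("surrogate threshold:", threshold)
print("disagreements with original labels:", len(disagreements))
```

A tree fit to the original labels would recover the 0.5 boundary and say nothing about the model. The one fit to the model's predictions recovers roughly 0.6, which is exactly the gap you'd want to know about.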

This technique has been called "dark knowledge" and is really interesting. See http://www.kdnuggets.com/2015/05/dark-knowledge-neural-netwo... for an overview. They train much simpler models to get close to the accuracy of much bigger models, just by training them to copy the bigger model's predictions on the same data. In fact you can get crazy results like this:

>When they omitted all examples of the digit 3 during the transfer training, the distilled net gets 98.6% of the test 3s correct even though 3 is a mythical digit it has never seen.
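A rough sketch of why that can work, assuming the soft targets come from a temperature-scaled softmax as in Hinton et al.'s distillation setup. The logits below are invented for illustration (a hypothetical teacher looking at one image of a "2", scored over classes 2, 3 and 7), and the helper names are mine. Raising the temperature moves probability mass onto the non-target classes, so the student still gets a signal about "3" from examples that aren't 3s.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy H(target, predicted): the loss a student would minimize
    against the teacher's soft targets."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

# Hypothetical teacher logits for one image of a "2", over classes [2, 3, 7].
teacher_logits = [8.0, 4.0, 1.0]

hard = softmax(teacher_logits, temperature=1.0)  # nearly one-hot
soft = softmax(teacher_logits, temperature=4.0)  # spreads mass onto "3" and "7"

print("T=1:", [round(p, 4) for p in hard])
print("T=4:", [round(p, 4) for p in soft])
```

At T=1 the "3" class gets under 2% of the mass. At T=4 it gets a much larger share, and that relative similarity between classes is the "dark knowledge" the student is trained on.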

Ah, very interesting! I agree that would be useful. But I think this thread has ended up with a proposal very different from the one I started replying to.