by blueblob 3300 days ago
Define "meaningfully better." Perhaps you mean statistically significantly better? It may have better accuracy, but it has significantly less interpretability. What does it capture that regression couldn't? At least with regression you can interpret the relationship between each variable and the outcome, and their relative importance, by looking at the coefficients. With deep learning, the best approaches to explanation involve training a second model alongside the first and using that second model for the explanation. Additionally, it has been proven that a multilayer perceptron with a single hidden layer can approximate any continuous function (the universal approximation theorem), so in some sense the "deep" part of deep learning comes from people being lazy, since you could at least get a better interpretation of the shallow network. I don't mean to imply that there's no place for deep learning, but I don't think this is a great refutation of the argument that fitting a deep model to a small dataset is somewhat inappropriate.
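To make the point about reading coefficients concrete, here is a minimal stdlib-only sketch, assuming ordinary least squares on a small made-up dataset (the data and the helper names `fit_ols` and `standardize` are hypothetical, not from the thread). Standardizing each feature first puts the coefficients on a common scale, so their magnitudes can be compared as a rough measure of relative importance.

```python
import statistics

def standardize(column):
    """Z-score one feature column (population standard deviation)."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mu) / sd for v in column]

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of rows; a column of ones is prepended for the intercept,
    so the result is [intercept, b_1, ..., b_p].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: y depends strongly on x1 and weakly on x2.
X = [(i % 5, i // 5) for i in range(20)]
y = [1.0 + 3.0 * x1 - 0.5 * x2 for x1, x2 in X]

raw = fit_ols(X, y)  # approximately [1.0, 3.0, -0.5]

# Refit on z-scored features so coefficient magnitudes are comparable.
cols = [standardize(c) for c in zip(*X)]
std = fit_ols(list(zip(*cols)), y)
print("raw:", [round(b, 3) for b in raw])
print("standardized:", [round(b, 3) for b in std])
```

Each standardized coefficient reads as "change in y per one standard deviation of that feature", which is the kind of direct reading a deep network does not offer.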

The model we are comparing against makes 10X as many errors.

I hadn't imagined someone would argue that's not a meaningful difference.

Though the difference is statistically significant too.
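When both models are scored on the same holdout samples, one way to check significance is an exact McNemar test, which only looks at the samples where the two models disagree. This is a minimal sketch; the discordant counts below are made up for illustration and are not the figures from the analysis being discussed.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant pairs.

    b: samples model A gets wrong and model B gets right.
    c: samples model A gets right and model B gets wrong.
    Under the null hypothesis each discordant sample is equally likely to
    fall either way, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: the baseline is wrong on 19 samples the other model
# gets right, and the reverse happens once.
print(mcnemar_exact(19, 1))  # small p-value: unlikely under the null
print(mcnemar_exact(5, 5))   # 1.0: no evidence of a difference
```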

Not sure what kind of argument that is. A model that overfits will have less error on the data it was fit to; does that make it better? It may generalize much worse when run on more data. Whether or not something is meaningful depends on what you take the meaning to be.
Not the OP, but I wanted to point out that it has one-tenth the error on the holdout sample, so it is not simply overfitting.
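The difference between error on the training data and error on a holdout sample can be sketched with a deliberately overfit model: a 1-nearest-neighbour classifier memorizes its training set, so its training error is zero even when the labels are noisy, while its holdout error is not. The synthetic data, the 20% label noise, and the helper names are assumptions made for illustration.

```python
import random

def make_data(n, rng, noise=0.2):
    """1-D points in [0, 1); true label is x > 0.5, flipped with prob `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the closest training point (memorizes the data)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, test):
    wrong = sum(predict_1nn(train, x) != y for x, y in test)
    return wrong / len(test)

rng = random.Random(0)
train = make_data(200, rng)
holdout = make_data(200, rng)

train_err = error_rate(train, train)      # 0.0: each point is its own neighbour
holdout_err = error_rate(train, holdout)  # well above zero because of label noise
print(train_err, holdout_err)
```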
It doesn't matter that it's on the holdout; he's partitioning an already small dataset into 5 folds and reporting the accuracy of using 80 points to predict 20. The usual argument is that, by the law of large numbers, with enough data you can establish a statistically significant difference in accuracy. When you're predicting 20 points with each of 5 (potentially different) models, you likely don't have enough data to talk about statistical significance.
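To put numbers on the sample-size worry, here is a minimal sketch assuming a 100-point dataset split into 5 folds, with a Wilson score interval (one standard choice, swapped in here for illustration) for the accuracy measured on a single fold. The function names are hypothetical.

```python
from math import sqrt

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of (nearly) equal size.

    Contiguous folds assume the data are already in random order;
    shuffle first otherwise.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

folds = k_fold_indices(100, 5)
test_fold = folds[0]
train_idx = [i for f in folds[1:] for i in f]
print(len(train_idx), len(test_fold))  # 80 20

print(wilson_interval(20, 20))      # lower bound about 0.84
print(wilson_interval(1000, 1000))  # lower bound about 0.996
```

Even a perfect 20/20 on one fold is consistent with a true accuracy in the mid-80s, while the same perfect score on 1,000 points pins accuracy down far more tightly.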
We tried to mirror the original analysis as closely as possible - we did 5-fold cross validation but used the standard MNIST test set for evaluation (about 2,000 validation samples for 0s and 1s). We split the test set into 2 pieces. The first half was used to assess convergence of the training procedure while the second half was used to measure out of sample predictive accuracy.

Predictive accuracy is measured on 1000 samples, not 20.
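The protocol in the quoted passage, where one half of the test set only guides training and the other half only measures accuracy, can be sketched as follows. This is a minimal sketch under assumptions: the shuffling seed, the per-epoch scores, and the function names are hypothetical, and the original analysis may have split the data differently.

```python
import random

def split_test_set(indices, seed=0):
    """Shuffle test-set indices and split them into two disjoint halves.

    The first half is for monitoring convergence (e.g. choosing when to
    stop); the second half is used only for the final accuracy estimate.
    """
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def pick_epoch(convergence_scores):
    """Index of the epoch with the best score on the convergence half."""
    return max(range(len(convergence_scores)), key=convergence_scores.__getitem__)

convergence_half, evaluation_half = split_test_set(range(2000))
print(len(convergence_half), len(evaluation_half))  # 1000 1000

# Hypothetical per-epoch accuracies measured on the convergence half.
best = pick_epoch([0.95, 0.98, 0.99, 0.985])
print(best)  # 2
```

Because the evaluation half plays no part in choosing the stopping point, accuracy measured on it is an out-of-sample estimate on roughly 1,000 points.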

Honest question: who cares about interpretability if you're optimizing for predictive power?

Also, DL can be interpretable in different domains, much like any non-linear classifier (are you hating on random forests too for the same reason?). It just takes more work than looking at linear coefficients.

This is an area that fades in and out of focus, with venues such as the Workshop on Human Interpretability in Machine Learning (WHI) [1]. It's becoming increasingly important for auditability and for understanding what an algorithm has actually learned: for example, preventing classifiers from learning to discriminate based on age, race, etc. [2], or in domains such as medicine where it's important to know what the algorithm is doing. Work on understanding DL doesn't really make it interpretable in any domain; typically, researchers train another (simpler, less accurate) model and use it to explain what the deep model is doing, or use perturbation analysis to try to tease out what it is learning. If all you care about is getting the right answer and not why you get that answer, maybe it doesn't matter.
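The perturbation approach mentioned above can be sketched with permutation importance, one common perturbation technique (not necessarily the method used in the work cited here): shuffle one input feature at a time and measure how much the model's accuracy drops. The `black_box` function and the synthetic data are hypothetical stand-ins for a trained deep model.

```python
import random

def black_box(x):
    """Stand-in for an opaque model: relies heavily on x[0], barely on x[1]."""
    return int(2.0 * x[0] + 0.1 * x[1] > 1.05)

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, seed=0):
    """Drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [x[j] for x in X]
        rng.shuffle(column)
        # Rebuild each row with feature j replaced by its shuffled value.
        X_perm = [x[:j] + (v,) + x[j + 1:] for x, v in zip(X, column)]
        importances.append(base - accuracy(model, X_perm, y))
    return importances

rng = random.Random(1)
X = [(rng.random(), rng.random()) for _ in range(500)]
y = [black_box(x) for x in X]  # labels the black box fits perfectly

imp = permutation_importance(black_box, X, y)
print(imp)  # first feature matters far more than the second
```

Like a surrogate model, this only probes the model from the outside; it says which inputs matter, not how they are combined.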

I wouldn't say I'm hating on DL, nor on random forests, ensembles, etc., but when you have very little data, fitting an uninterpretable, high-dimensional model might not be the right answer, in my opinion; see [3].

[1] https://arxiv.org/html/1607.02531v2
[2] https://arxiv.org/abs/1606.08813
[3] https://arxiv.org/abs/1601.04650