by therajiv
3261 days ago
The author argues that linear models are generally more interpretable than deep learning methods, but I'd argue that's changing pretty quickly. Especially for large image/sequence inputs (which covers most of the applications getting hyped up), linear regressions don't perform very well, and that performance gap often prevents them from picking out the important features. Given that fast, scalable methods for feature importance are on the rise (e.g. https://arxiv.org/abs/1704.02685, which the author mentions), you can often get feature scores from deep models that are just as interpretable as, and more accurate than, the analogous scores from linear models. Basically, my point is that model interpretation depends strongly on how accurate your model is, and because deep learning models are so much better than linear models for some tasks, it makes sense to use them even if your primary goal is interpretability. That said, I do believe that if you care at all about interpretation, you should almost never use multilayer perceptrons (which have recently become part of the widening umbrella term "deep learning"), because they rarely work better than decision tree models or basic linear models, and MLPs are at best equally interpretable compared to those traditional methods.
Random forests can give feature importances, but those scores do not account for interactions between features. So, in the end, you don't know how a model made a decision (it could be because there is a feature with high importance, but it could also be because there is an informative interaction between lower-importance features).
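To make the interaction point concrete, here is a toy sketch in plain Python. It is not a random forest: the `model` function is a hypothetical stand-in for a fitted model that has learned a pure interaction (XOR), and the score used is permutation importance rather than a forest's impurity-based importance. All names are made up for illustration. Each feature gets a sizeable importance score, yet neither feature carries any information on its own, and the per-feature score gives no hint of that.

```python
import random


def model(x0, x1):
    # Hypothetical stand-in for a fitted model that learned a pure interaction.
    return x0 ^ x1


def accuracy(rows, labels):
    hits = sum(model(a, b) == y for (a, b), y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(rows, labels, column, rng):
    # Drop in accuracy after shuffling one column across all rows.
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [
        (v, r[1]) if column == 0 else (r[0], v)
        for r, v in zip(rows, shuffled)
    ]
    return accuracy(rows, labels) - accuracy(permuted, labels)


def best_single_feature_accuracy(rows, labels, column):
    # Best accuracy any rule can reach when it sees only this one feature.
    total = 0
    for value in (0, 1):
        ys = [y for r, y in zip(rows, labels) if r[column] == value]
        total += max(ys.count(0), ys.count(1))
    return total / len(labels)


rng = random.Random(0)
rows = [(a, b) for a in (0, 1) for b in (0, 1)] * 250  # balanced toy data
labels = [a ^ b for a, b in rows]

importances = [permutation_importance(rows, labels, c, rng) for c in (0, 1)]
alone = [best_single_feature_accuracy(rows, labels, c) for c in (0, 1)]

print("permutation importance per feature:", importances)
print("best accuracy using one feature alone:", alone)
```

Both importances come out near 0.5, while the best single-feature rule is at chance, so a ranked list of scores would not tell you the signal lives entirely in the interaction.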
If you want to compare deep learning with linear models, you should leave image data out of it. Compare them on structured data and bag of words.
MLPs and boosted decision trees, in my experience, definitely beat decision tree and linear models on structured data. But they lack long-term robustness (complex forecasting models need constant retraining, which can hamper their adoption by business units) and don't pass regulatory scrutiny (it is not enough to say "has_asthma" is a high-importance feature).
In finance and health care, interpretability is enormously valued. It is a constant trade-off between accuracy and interpretability.
A long time ago, Caruana and colleagues built hospital triage models for pneumonia patients, with neural networks being the clear winner in generalization performance. Yet they opted for a simple logistic regression when productionizing. Why?
> [...] patients with pneumonia who have a history of asthma have lower risk of dying from pneumonia than the general population. Needless to say, this rule is counterintuitive. But it reflected a true pattern in the training data: patients with a history of asthma who presented with pneumonia usually were admitted not only to the hospital but directly to the ICU (Intensive Care Unit). The good news is that the aggressive care received by asthmatic pneumonia patients was so effective that it lowered their risk of dying from pneumonia compared to the general population. The bad news is that because the prognosis for these patients is better than average, models trained on the data incorrectly learn that asthma lowers risk, when in fact asthmatics have much higher risk (if not hospitalized).
http://people.dbmi.columbia.edu/noemie/papers/15kdd.pdf
That said, nothing is holding you back from using both a simple linear model and a complex non-linear model at the same time: use the complex model by default, and only when the two severely disagree do you fall back to the interpretable one. Or use the linear model to find data issues, like the one mentioned above, that are heavily obscured (if not impossible to identify) when only using deep learning in a train-test framework.
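A minimal sketch of that "fall back on severe disagreement" idea, in plain Python. Everything here is an assumption for illustration: the two toy models, the name `predict_with_fallback`, and the `max_gap` threshold of 0.3 are all hypothetical, and in practice you would pick the threshold from validation data.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]  # features -> probability in [0, 1]


@dataclass
class Decision:
    probability: float
    source: str          # "complex" or "interpretable"
    disagreement: float  # absolute gap between the two predictions


def predict_with_fallback(
    x: Sequence[float],
    interpretable: Model,
    complex_model: Model,
    max_gap: float = 0.3,  # hypothetical threshold; tune on held-out data
) -> Decision:
    p_simple = interpretable(x)
    p_complex = complex_model(x)
    gap = abs(p_simple - p_complex)
    if gap > max_gap:
        # Severe disagreement: trust the model you can explain, and log the
        # case, since it may point at a data issue worth investigating.
        return Decision(p_simple, "interpretable", gap)
    return Decision(p_complex, "complex", gap)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins: a linear model on x[0], and a model with an interaction term.
def simple_model(x: Sequence[float]) -> float:
    return sigmoid(x[0])


def interaction_model(x: Sequence[float]) -> float:
    return sigmoid(x[0] + 4.0 * x[0] * x[1])


agree = predict_with_fallback([0.5, 0.0], simple_model, interaction_model)
clash = predict_with_fallback([0.5, -2.0], simple_model, interaction_model)
print(agree.source, round(agree.disagreement, 3))
print(clash.source, round(clash.disagreement, 3))
```

The rows routed to the interpretable model are also a natural review queue: they are exactly the cases where the complex model has learned something the linear one cannot express, which may be real signal or an artifact like the asthma pattern.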