| Feature importance is not quite the same as interpretability. Random Forests can give feature importance, but that does not account for interactions between features. So, in the end, you don't know how a model made a decision (it could be because there is a feature with high importance, but it could also be because there is an informative interaction between lower importance features). If you want to compare deep learning with linear models, you should leave image data out of it. Compare them on structured data and bag of words. MLP's and boosted decision trees, in my experience, definitely beat decision tree and linear models, on structured data. But they lack longterm robustness (complex forecasting models need constant retraining, which can hamper their adoption by business units) and don't pass regulation (it is not enough to say "has_asthma" is a high-importance feature). In finance and health care, interpretability is enormously valued. It is a constant trade-off between accuracy and interpretability. A long time ago, Caruana made hospital triage models, with neural networks being the clear winner in generalization performance. Instead, they opted for a simple logistic regression when productionizing. Why? > [...] patients with pneumonia who have a history of asthma have lower risk of dying from pneumonia than the general population. Needless to say, this rule is counterintuitive. But it reflected a true pattern in the training data: patients with a history of asthma who presented with pneumonia usually were admitted not only to the hospital but directly to the ICU (Intensive Care Unit). The good news is that the aggressive care received by asthmatic pneumonia patients was so effective that it lowered their risk of dying from pneumonia compared to the general population. The bad news is that because the prognosis for these patients is better than average, models trained on the data incorrectly learn that asthma lowers risk, when in fact asthmatics have much higher risk (if not hospitalized). http://people.dbmi.columbia.edu/noemie/papers/15kdd.pdf Though there is nothing holding you back from using both simple linear, and complex non-linear models at the same time: Only when the models severely disagree do you pick the interpretable model. Or use the linear model to find data issues, like those mentioned above, that are tremendously obscured (if not impossible to identify) when only using deep learning in a train-test framework. |
So what they wanted to know is the POD|"No hospital" but they clearly collected data about POD|"Hospital" (since it included ICU admission, etc).
The problem is they measured the wrong thing and then misinterpreted their results. Worse, it looks like the study was designed to be this way!