by mellavora 1476 days ago

> These techniques almost always tell the researchers things that they already know.

Yes, but sometimes in surprising ways.

I built a simple decision-tree model for a medical study looking at outcomes for acute pneumonia. I went with a single tree over a forest because the model had to be interpretable. Statistically it was almost as good as the forest; I built it using fields with high feature importance values. So there is a chance that any 'improvement' from the forest was overfitting. But I digress.

The tree said that blood CO2 levels were the most important factor. The doctors weren't surprised by this (though they had some internal debate about whether it was more or less important than some other factors). What did surprise them was the cutoff level.

They said they would be concerned if CO2 was above 7. My model had the cutoff at 9.5. Sorry, I forget the units.

Point is, it confirmed what they knew (CO2 levels matter when assessing lung function), but still surprised them (CO2 levels have to be much higher than normal before they discriminate better than other factors, such as age).
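
For anyone who wants to try this kind of thing, here is a rough sketch of the workflow in scikit-learn. To be clear, this is not the original study code: the data is synthetic, the column names (`co2`, `age`, `noise`) are made up, and the 9.5 cutoff is planted in the fake data purely so there is something for the tree to recover. I am also assuming the feature importances come from a random forest, which the comment does not actually say.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the real study data).
rng = np.random.default_rng(0)
n = 2000
co2 = rng.uniform(4.0, 12.0, n)    # hypothetical blood CO2 values
age = rng.uniform(20.0, 90.0, n)   # hypothetical patient age
noise = rng.normal(size=n)         # an irrelevant field

# Plant a cutoff at 9.5, then flip 5% of the labels to add noise.
y = (co2 > 9.5).astype(int)
flip = rng.random(n) < 0.05
y = np.where(flip, 1 - y, y)

X = np.column_stack([co2, age, noise])
names = ["co2", "age", "noise"]

# Step 1: fit a forest and rank the fields by feature importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:2]

# Step 2: fit one shallow, interpretable tree on the top fields only.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[:, top], y)

# Step 3: read the root split, i.e. the most important field and its cutoff.
root_feature = names[top[tree.tree_.feature[0]]]
root_threshold = float(tree.tree_.threshold[0])
print(root_feature, round(root_threshold, 2))
```

The interesting part is step 3: the root node of the fitted tree gives you both which field matters most and the cutoff the data supports, which is the number you can then compare against what the domain experts expected.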