by perturbation 2908 days ago
Overall, I agree that ML is needed over 'just SQL' in a lot of cases (though SQL + good visualizations / exploratory analysis can answer a lot of those questions qualitatively). I would also be careful with the linear model approach. If you use coefficients to interpret importance, multicollinearity can hide how important a feature is (or even reverse the sign of its coefficient), so using a linear model like that isn't as straightforward as it seems.
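To make the sign issue concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption chosen for illustration (the variable names, the seed, the 0.1 noise scale, and the coefficients 3 and -1), and y is left noise-free so the fit is exact. x2 is nearly a copy of x1, so on its own it moves with y, but its coefficient in the joint model comes out negative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 500

x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # almost collinear with x1
y = 3.0 * x1 - 1.0 * x2             # hypothetical, noise-free response

# Looked at alone, x2 is strongly positively related to y.
marginal_corr = np.corrcoef(x2, y)[0, 1]

# Ordinary least squares for y ~ intercept + x1 + x2.
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef

print(f"corr(x2, y) = {marginal_corr:.3f}")
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Reading b2 as "x2 hurts y" would be misleading here, since most of what x2 carries is already in x1. With noisy real data the coefficients of near-duplicate features also become unstable, which is the other half of the problem.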

As a workaround, you could look for high VIF to detect multicollinearity, use some sort of stepwise selection / penalized regression, or use something like relaimpo (https://cran.r-project.org/web/packages/relaimpo/index.html) (not sure of a Python equivalent) to judge overall feature importance in the model.
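For the VIF check, a rough sketch in Python using only NumPy follows. The helper name `vif` and the synthetic columns are made up for illustration. It uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all the other columns plus an intercept.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        ss_res = np.sum((target - A @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic example: x1 and x2 are near-duplicates, x3 is independent.
rng = np.random.default_rng(0)  # arbitrary seed
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

A common rule of thumb (not a hard threshold) treats VIF above roughly 5 or 10 as a warning sign. If statsmodels is available, `statsmodels.stats.outliers_influence.variance_inflation_factor` does the same per-column computation; note that it expects the intercept column to already be part of the design matrix you pass in.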