by jonathan_landy
633 days ago
>> Some of the problems don’t matter as much if your goal for the model is just prediction, not interpretation of the model and its coefficients. But most of the time that I see the method used (including recent examples being distributed by so-called experts as part of their online teaching), the end model is indeed used for interpretation, and I have no doubt this is also the case with much published science. Further, even when the goal is only prediction, there are better methods like the Lasso, of dealing with a problem of a high number of variables.

I use this method often for prediction applications. First, it’s a sort of hyperparameter selection, so you should obviously use a holdout and test set to help you make a good choice. Second, I often see the method dogmatically shut down like this in favor of lasso. Yet every time I have compared the two, they give similar selections, so how can one be “evil” and the other so glorified? I prefer the stepwise method, though, as you can visualize the benefit of adding each additional feature. That can help to guide further feature development, a point that I’ve seen significantly lift the bottom line of enterprise-scale companies.
|
Frequentist and Bayesian approaches often yield similar results but are philosophically different. In general I favor and recommend lasso because I see it perform as well as or better than stepwise at variable selection, and it doesn't come with all the baggage.
Lasso avoids the multiple comparisons problem by applying a regularization penalty instead of sequentially fitting multiple models and performing hypothesis tests. This also helps to prevent overfitting. If you want to see which variables would be included or excluded, you can turn the regularization up or down (it is pretty easy to spit out an automated report).
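To make the "turn the regularization up or down" point concrete, here is a rough sketch of such a report. Everything in it is an assumption for illustration: the data are simulated, the names `lasso_cd`, `report`, and `x0`..`x5` are made up, and the solver is a bare-bones coordinate descent in NumPy rather than any particular library's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; anything within [-t, t] becomes exactly 0.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1
    by cyclic coordinate descent (illustrative, not optimized)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n        # correlation with partial residual
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * b[j]          # add the updated contribution back
    return b

# Simulated data (assumption): only x0, x2, and x4 truly matter.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the columns
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(size=n)
y = y - y.mean()                             # center y so no intercept is needed

names = [f"x{j}" for j in range(p)]
report = {}
for alpha in [1.0, 0.3, 0.1, 0.01]:
    b = lasso_cd(X, y, alpha)
    report[alpha] = [names[j] for j in range(p) if b[j] != 0.0]
    print(f"alpha={alpha:<5} selected: {report[alpha]}")
```

Reading the printout from the largest penalty to the smallest shows the order in which variables enter the model, which is the same kind of picture people usually want from a stepwise run.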
Stepwise selection comes in different flavors: forward, backward, or bidirectional search, scored by R-squared, adjusted R-squared, AIC, BIC, etc. These often lead to different models, so the choices must be justified, and I rarely see any defense of them.
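As a small illustration of how much rides on those choices, here is a sketch of forward selection where the criterion is just a swappable penalty. It is an assumption-laden toy, not a reference implementation: the data are simulated, `forward_stepwise` and `rss` are hypothetical helper names, and the scores use the Gaussian forms of AIC and BIC up to an additive constant.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def forward_stepwise(X, y, penalty):
    """Greedy forward selection minimizing n*log(RSS/n) + penalty*k,
    where k counts fitted coefficients. penalty=2 gives AIC,
    penalty=log(n) gives BIC (both up to a constant)."""
    n, p = X.shape

    def score(cols):
        k = len(cols) + 1                    # +1 for the intercept
        return n * np.log(rss(X, y, cols) / n) + penalty * k

    selected, best = [], score([])
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: score(selected + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:           # no candidate improves the criterion
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected

# Simulated data (assumption): a few strong signals and several noise columns.
rng = np.random.default_rng(1)
n, p = 150, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.2 * X[:, 5] + rng.normal(size=n)

aic_sel = forward_stepwise(X, y, penalty=2.0)
bic_sel = forward_stepwise(X, y, penalty=np.log(n))
print("AIC order of entry:", aic_sel)
print("BIC order of entry:", bic_sel)
```

In this forward-only setup the two criteria walk the same path and differ only in where they stop, with BIC stopping no later than AIC once log(n) exceeds 2. Switching the search direction or the scoring rule can change the path itself, which is why the choice deserves a justification.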
Of course, if the point is prediction rather than coefficient estimation and interpretability, then neither of these is a great choice.