by ellisv 680 days ago

> Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified?

Frequentist and Bayesian approaches often yield similar results but are philosophically different. In general I favor and recommend lasso because I see it perform as well as or better than stepwise at variable selection, and it doesn't come with all the baggage.

Lasso avoids the multiple comparisons problem by applying a regularization penalty instead of sequentially fitting multiple models and running a hypothesis test at each step. This also helps to prevent overfitting. If you want to see which variables would be included or excluded, you can turn the regularization up or down (it is pretty easy to spit out an automated report).
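To make the "turn the regularization up or down" idea concrete, here is a rough sketch in plain Python. It is a toy coordinate-descent lasso, not any particular library's implementation, and the synthetic data, the variable names `x0`, `x1`, `x2`, the `alpha` grid, and the helpers `lasso_cd` and `selected` are all assumptions made up for illustration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything inside [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(col):
    """Center a column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def lasso_cd(cols, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + alpha*||b||_1.

    cols: standardized feature columns; y: centered response.
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Covariance of feature j with the residual that ignores j's own term
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha)  # unit-variance column, so no rescaling
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new_bj
    return beta

# Hypothetical data: x0 matters a lot, x1 a little, x2 not at all.
random.seed(0)
n = 200
names = ["x0", "x1", "x2"]
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in names]
y_raw = [3.0 * raw[0][i] + 0.5 * raw[1][i] + random.gauss(0, 1) for i in range(n)]
cols = [standardize(c) for c in raw]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

def selected(alpha):
    """Names of the variables whose lasso coefficient is nonzero at this alpha."""
    beta = lasso_cd(cols, y, alpha)
    return [name for name, b in zip(names, beta) if b != 0.0]

# The "automated report": which variables survive at each penalty strength.
report = {alpha: selected(alpha) for alpha in (0.01, 0.1, 1.0, 5.0)}
for alpha, kept in report.items():
    print(f"alpha={alpha}: {kept}")
```

Small penalties keep more variables and large penalties zero them out, so the report shows a single tuning knob rather than a sequence of separate hypothesis tests. In practice you would reach for an existing lasso implementation and pick the penalty by cross-validation.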

Stepwise selection comes in different flavors: forward, backward, or bidirectional, with a selection criterion such as R-squared, adjusted R-squared, AIC, BIC, etc. These often lead to different models, so the choices must be justified, and I rarely see any defense for them.
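As a small illustration of how the criterion alone can change the answer, the sketch below scores two candidate models with AIC and BIC. It assumes a Gaussian linear model, where both criteria can be written in terms of the residual sum of squares up to a shared constant. The sample size, parameter counts, and RSS values are invented numbers, not results from any real fit.

```python
import math

def aic(n, k, rss):
    """AIC for a Gaussian linear model, dropping constants shared by all models."""
    return n * math.log(rss / n) + 2 * k

def bic(n, k, rss):
    """BIC for the same model; the penalty per parameter is log(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical stepwise decision: does adding one more variable pay off?
n = 100
small = {"k": 2, "rss": 100.0}  # current model
large = {"k": 3, "rss": 97.0}   # current model plus one candidate variable

aic_prefers_large = aic(n, **large) < aic(n, **small)
bic_prefers_large = bic(n, **large) < bic(n, **small)

print("AIC adds the variable:", aic_prefers_large)
print("BIC adds the variable:", bic_prefers_large)
```

With these made-up numbers AIC accepts the extra variable and BIC rejects it, because BIC charges log(100), about 4.6, per parameter where AIC charges 2. A stepwise run would therefore branch differently from this step onward depending on the criterion.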

Of course, if the point is prediction rather than coefficient estimation and interpretability, then neither of these is a great choice.