by b_tterc_p 2710 days ago
PCA has specific use cases. It’s not a catch-all dimensionality reduction technique. You can’t use it effectively, for example, if your features are not linearly correlated. There are of course many tools for addressing many problems, but as the title states, this is often a grind. For any practical problem, excluding huge black-box neural nets where you don’t need to understand the model, you are probably better off starting with a smaller set of reasonable-sounding features and then slowly growing your model to incorporate others.
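To make the linear-correlation point concrete, here is a minimal sketch in plain Python. The helper `top_component_share` and both toy datasets are made up for illustration, and it only handles 2-D data. It compares points on a straight line with points on a circle. Both are really one-dimensional, but only the line is linearly correlated.

```python
import math

def top_component_share(points):
    """Share of total variance captured by the first principal component (2-D only)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, d]].
    a = sum((p[0] - mean_x) ** 2 for p in points) / n
    d = sum((p[1] - mean_y) ** 2 for p in points) / n
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    top = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return top / (a + d)

n = 360
# Toy data (assumed for illustration): y = 2x, and evenly spaced points on the unit circle.
line = [(k / n, 2 * k / n) for k in range(n)]
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

line_share = top_component_share(line)
circle_share = top_component_share(circle)
print(f"line: {line_share:.3f}, circle: {circle_share:.3f}")
# prints "line: 1.000, circle: 0.500"
```

On the line, one component carries all of the variance, so dropping the second loses nothing. On the circle, the two components split the variance evenly, so PCA has nothing to drop even though a single angle describes every point.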

Also, if you meant random forests by “forests”... those aren’t especially reproducible. Understanding what’s going on is not always easy, and most people seem to misinterpret the idea of “variable importance” when you have a mix of categorical and numeric features. Decision trees and linear regressions are nice and reproducible.
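A toy sketch of where the run-to-run variation comes from, assuming the usual bagging setup where each tree is trained on a bootstrap resample. This is not a real random forest: `fit_stump` and `fit_bootstrap_stump` are hypothetical helpers that fit a single one-split tree on one feature, and the data is synthetic.

```python
import random

def fit_stump(xs, ys):
    """Best single threshold on one feature, by squared error. No randomness involved."""
    best_threshold, best_error = None, float("inf")
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        error = 0.0
        for side in (left, right):
            mean = sum(side) / len(side)
            error += sum((y - mean) ** 2 for y in side)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold

def fit_bootstrap_stump(xs, ys, seed):
    """Same stump, but trained on a bootstrap resample drawn with the given seed."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(xs)) for _ in xs]
    return fit_stump([xs[i] for i in idx], [ys[i] for i in idx])

# Synthetic noisy data (an assumption for illustration only).
data_rng = random.Random(42)
xs = [i / 50 for i in range(50)]
ys = [x + data_rng.gauss(0, 0.3) for x in xs]

# Fitting on the full data gives the same split every time.
print("full data:", fit_stump(xs, ys), fit_stump(xs, ys))
# Each seed draws a different resample, so the chosen split can move around.
print("bootstrap:", [fit_bootstrap_stump(xs, ys, seed) for seed in range(5)])
```

Fixing the seed makes the resampled fit repeatable too, so the practical issue is mostly that people forget to pin it, or compare runs that used different seeds.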