by gfodor 4649 days ago
Nice post. A question for you: feature selection is certainly the most important part of ML, yet most ML texts focus on the algorithm zoo and gloss over it. Are there any good references, with examples, on the variety of feature selection techniques and best practices?

That's a good question. Feature selection is a large field of research, too broad for me to summarize briefly. I would look into "model selection", specifically scores that weigh a model's complexity (the number of variables) against its goodness of fit. A good score to look into first is the Bayesian information criterion (BIC), which is used, for instance, in model selection in neuroscience: http://en.wikipedia.org/wiki/Bayesian_information_criterion
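To make the BIC suggestion concrete, here is a rough sketch in plain Python. It assumes a linear model with Gaussian noise, where BIC = k ln(n) - 2 ln(L) reduces, up to an additive constant, to n ln(RSS/n) + k ln(n). The synthetic data, the helper names (`solve_linear_system`, `fit_rss`, `bic`) and the exhaustive search over subsets are all my own choices for illustration, not something from the post; with more than a handful of features you would need a smarter search.

```python
import itertools
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rss(rows, y, feature_idx):
    """Least-squares fit with an intercept; returns the residual sum of squares."""
    design = [[1.0] + [row[j] for j in feature_idx] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y)
    )


def bic(rss, n, k):
    """BIC of a Gaussian linear model, up to a constant shared by all models."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical data: the target depends on features 0 and 1; feature 2 is noise.
rng = random.Random(0)
n = 200
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [3.0 * r[0] - 2.0 * r[1] + rng.gauss(0, 1) for r in rows]

# Score every subset of features; k counts the coefficients plus the intercept.
scores = {}
for size in range(4):
    for subset in itertools.combinations(range(3), size):
        k = len(subset) + 1
        scores[subset] = bic(fit_rss(rows, y, subset), n, k)

best_subset = min(scores, key=scores.get)  # lower BIC is better
for subset, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(subset, round(score, 1))
```

Adding a feature never increases the RSS, so goodness of fit alone always favors the largest model. The k ln(n) term is what lets BIC prefer a smaller one when the extra variable does not pay for itself.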

One thing you might want to try is cross-validation (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...). Cross-validation should help you determine whether your model is overfitting: an overfit model will perform significantly better on its training set than on the left-out data.
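Here is a small sketch of that train-versus-held-out comparison, again in plain Python. I picked a 1-nearest-neighbour regressor because it memorizes its training set, which makes the gap easy to see. The data, the 5-fold split and the function names are assumptions of mine for illustration only.

```python
import math
import random


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn_predict(train_x, train_y, x):
    """Predict the target of the closest training point (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(train_x, train_y, eval_x, eval_y):
    """Mean squared error of the 1-NN model on the evaluation points."""
    errs = [
        (one_nn_predict(train_x, train_y, x) - t) ** 2
        for x, t in zip(eval_x, eval_y)
    ]
    return sum(errs) / len(errs)


# Hypothetical data: a noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.5) for x in xs]

train_scores, held_out_scores = [], []
for fold in k_fold_indices(len(xs), 5, rng):
    held = set(fold)
    tr_x = [x for i, x in enumerate(xs) if i not in held]
    tr_y = [t for i, t in enumerate(ys) if i not in held]
    te_x = [xs[i] for i in fold]
    te_y = [ys[i] for i in fold]
    train_scores.append(mse(tr_x, tr_y, tr_x, tr_y))     # error on seen data
    held_out_scores.append(mse(tr_x, tr_y, te_x, te_y))  # error on left-out data

train_mse = sum(train_scores) / len(train_scores)
cv_mse = sum(held_out_scores) / len(held_out_scores)
print("training MSE:", train_mse)
print("held-out MSE:", cv_mse)
```

The model reproduces its training targets exactly, so the training error says nothing about how well it generalizes. Only the held-out error exposes the noise it has memorized.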