by redditmigrant 3034 days ago
Thanks for the response!

> There are various feature engineering and feature extraction techniques. Filter methods, wrapper methods, and embedded methods. Principal component analysis, autoencoding, variance analysis, linear discriminant analysis, Gini index, genetic algorithms, etc -- the feature selection process will depend on the dataset, the problem domain, the analysis algorithm you ultimately use, etc.

Obviously that's a big toolbox, and I'm sure it takes time to develop an intuitive understanding of all these techniques. What I'm hoping for is some sort of guidebook on what to look for when I run into problems. So let's say you try out an algorithm and your accuracy (or whatever evaluation criterion you might have) is low. How do you figure out whether that's due to the algorithm or due to feature selection (or the lack of it)?

An analogy that might be useful: when I see that my database queries are slow, I can use EXPLAIN to guide which knobs to tune. Obviously it requires understanding what indexes are, what a full table scan is, etc., but the EXPLAIN plan provides a guidebook of sorts.
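
To make the analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite's variant of EXPLAIN is EXPLAIN QUERY PLAN). The users table, its columns, and the index name idx_users_email are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the analogy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (email, age) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 90) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


query = "SELECT id FROM users WHERE email = 'user42@example.com'"

# Without an index, the plan reports a scan of the whole table.
before = plan(query)

# After adding an index on the filtered column, the plan reports an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

The point is that the plan names the bottleneck (a full scan) and the fix (an index) follows from it. That is the kind of feedback loop I would like to have for model accuracy.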

1 comment

Every problem is different, so the only advice I can give is: research, research, research! Do the hard work up front: figure out how to describe your problem in a mathematical sense, and identify the right tools for the shape of your input, output, and problem dimensions. What's the distribution of each dimension? Are the relationships linear, nonlinear, clustered, dispersed, logarithmic, etc.? Once you know those things, you're able to narrow in on the right tools and analyses to use.
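
As a rough sketch of what checking "the distribution of each dimension" and "linear vs. nonlinear" might look like in practice, the standard-library Python below profiles one feature against a target. The data is synthetic and the helper names (skewness, pearson, spearman, profile) are my own, not from any particular library. The heuristic it assumes: a Spearman (rank) correlation well above the Pearson correlation hints at a monotonic but nonlinear relationship, and a large skew hints that a transform such as a log may help.

```python
import math
import random
import statistics


def skewness(xs):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation: measures the strength of a linear relationship."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(xs):
    """0-based rank of each value; ties are broken by position (fine for continuous data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0] * len(xs)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson on ranks, so it captures any monotonic relationship."""
    return pearson(ranks(xs), ranks(ys))


def profile(feature, target):
    """Summarize the feature's shape and how it relates to the target."""
    return {
        "skew": skewness(feature),
        "pearson": pearson(feature, target),
        "spearman": spearman(feature, target),
    }


# Synthetic data, for illustration only.
rng = random.Random(0)
x = [rng.uniform(0.0, 5.0) for _ in range(500)]
y_linear = [2.0 * v + rng.gauss(0.0, 0.5) for v in x]  # linear plus noise
y_exp = [math.exp(2.0 * v) for v in x]                 # monotonic but nonlinear

print("x vs linear target:     ", profile(x, y_linear))
print("x vs exponential target:", profile(x, y_exp))
print("skew of exponential target:", skewness(y_exp))
```

None of this replaces understanding the problem, but a few summary numbers like these per dimension are a cheap first pass before committing to a tool.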