Hacker News new | ask | show | jobs
by benhamner 4215 days ago
This is consistent with our experience running hundreds of Kaggle competitions: for most classification problems, some variation on ensembled decision trees (random forests, gradient boosted machines, etc.) performs the best. This is typically in conjunction with clever data processing, feature selection, and internal validation.

One key exception is where the data is richly and hierarchically structured. Text, speech, and visual data falls under this category. In many cases here, variations of neural networks (deep neural nets/CNN's/RNN's/etc.) provide very dramatic improvements.

This study does have a couple limitations. The datasets used are very small & form a very biased selection of real-world applications of machine learning. It doesn't consider ensembles of different model types (which I'd expect to provide a consistent but marginal improvement over the results here).