Hacker News
by cweill 1390 days ago
Yeah, tree-based models are great for tabular datasets that are primarily numeric, with only a few categorical variables. But as soon as your categorical variables have 1000+ potential values that need 1-hot encoding, or you have any natural language text associated with your rows, deep learning almost always outperforms them, especially if you have over 50K instances, in my experience.
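
To make the 1-hot point concrete, here is a rough sketch in plain Python. The column, its `cat_*` labels, and the `one_hot` helper are all made up for illustration; real pipelines would normally use a library encoder instead.

```python
def one_hot(values):
    """One-hot encode a list of category labels into rows of 0/1."""
    vocab = sorted(set(values))
    index = {label: i for i, label in enumerate(vocab)}
    rows = []
    for label in values:
        row = [0] * len(vocab)   # one column per distinct category
        row[index[label]] = 1    # a single 1 marks this row's category
        rows.append(row)
    return vocab, rows


# Hypothetical categorical column with 1,000 distinct values over 2,000 rows.
column = [f"cat_{i % 1000}" for i in range(2000)]
vocab, rows = one_hot(column)

width = len(rows[0])            # number of columns this one feature expands into
nonzero_per_row = sum(rows[0])  # only one of those columns is ever set per row
print(f"{width} columns, {nonzero_per_row} non-zero per row")
```

One feature turns into a thousand very sparse columns, which is the kind of input I'm talking about above.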

The major downside of DL is the slow training, and therefore a slow iteration feedback loop. Couple that with an exponentially growing number of hparams to tune, and you get something very powerful but costly in terms of time to use.
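
A quick sketch of why the hparam count hurts, assuming a plain grid search. The hyperparameter names and candidate values below are arbitrary examples, not a recommended search space.

```python
from itertools import product

# Hypothetical search space: 3 candidate values for each of 4 hyperparameters.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 128, 512],
    "hidden_units": [64, 256, 1024],
    "dropout": [0.0, 0.2, 0.5],
}


def grid_size(space):
    """Number of configurations in a full grid: the product of the list lengths."""
    total = 1
    for candidates in space.values():
        total *= len(candidates)
    return total


configs = list(product(*grid.values()))  # every combination, one tuple each
size_4 = grid_size(grid)

# Adding a fifth hyperparameter with 3 values triples the number of runs.
bigger = dict(grid, weight_decay=[0.0, 1e-5, 1e-4])
size_5 = grid_size(bigger)
print(size_4, size_5)
```

With k values per hparam and n hparams, a full grid is k^n training runs, and each run is slow to begin with. Random search or a smarter tuner cuts the number of runs you actually do, but the space itself still grows the same way.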

But if you want the best possible accuracy, and data collection isn't expensive, DL is the way to go. Just expect to spend 10x the amount of time tuning it vs trees to get a 10% to 20% reduction in error.