Hacker News
by DebtDeflation 2672 days ago
It's another example of the FAANG and Bay Area startup world versus the other 99% of corporate America. In the latter world, most of the "machine learning" in production is traditional stuff like Random Forest, SVM, and more recently Gradient Boosting. Hell, marketing departments across the country are still running old-school decision tree (CART and CHAID) models and logistic regression models written in SAS 20+ years ago. DL/NN is a minuscule proportion of production ML in the enterprise space.

I think there is a good reason that "old" machine learning models are more popular than DNNs in the enterprise space. Most of the data is in tabular format. What's more, "old" and simple models like decision trees or linear models are easy to understand, easy to deploy, and fast. And even a simple decision tree built into the system is a clear improvement over making decisions at random.
The main reason though is that these other methods outperform neural nets in tons of different situations. Even just from an accuracy / business success metric point of view, many problems are just better solved with other classes of models, domain-specific feature engineering, etc. It will probably remain so for many decades at least.
DNNs make good features though, especially if you have time series data or lots of text.

I agree that the final model should be a random forest/XGBoost/LightGBM for typical tabular data.

I meant that extracting an intermediate layer as a feature embedding and then sticking a classical model on top of it performs worse than curating features through domain-specific expert tuning, for a ton of diverse application domains.
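
For anyone unfamiliar with the pipeline being debated, here is a minimal sketch of "intermediate layer as embedding, classical model on top", assuming scikit-learn and NumPy are available. The synthetic dataset, the layer size, and the helper name `hidden_embedding` are illustrative assumptions, not anything from the comments above. The raw-feature random forest is only a stand-in baseline, since expert-curated features cannot be synthesized, so the printed numbers say nothing about which approach wins on real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic tabular data, purely for illustration (an assumption, not real data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: train a small neural net end to end on the task.
mlp = MLPClassifier(
    hidden_layer_sizes=(16,), activation="relu", max_iter=500, random_state=0
)
mlp.fit(X_train, y_train)


def hidden_embedding(model, features):
    """Return first-hidden-layer ReLU activations (hypothetical helper)."""
    pre_activation = features @ model.coefs_[0] + model.intercepts_[0]
    return np.maximum(0.0, pre_activation)


# Step 2: use the hidden-layer activations as a learned feature embedding.
emb_train = hidden_embedding(mlp, X_train)
emb_test = hidden_embedding(mlp, X_test)

# Step 3: fit a classical model on the embedding, and one on the raw
# features as a stand-in baseline for comparison.
rf_on_embedding = RandomForestClassifier(n_estimators=100, random_state=0)
rf_on_embedding.fit(emb_train, y_train)
rf_on_raw = RandomForestClassifier(n_estimators=100, random_state=0)
rf_on_raw.fit(X_train, y_train)

acc_embedding = rf_on_embedding.score(emb_test, y_test)
acc_raw = rf_on_raw.score(X_test, y_test)
print(f"random forest on NN embedding: {acc_embedding:.3f}")
print(f"random forest on raw features: {acc_raw:.3f}")
```

In a real comparison, the `rf_on_raw` branch would be replaced by a model trained on domain-specific engineered features, which is the alternative the comment above argues for.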