| I am increasingly worried about people applying ML to everything without any rigour. Statistical inference generally only works well under very specific conditions: 1 - You know the distribution of the phenomenon under study (or make an explicit assumption and accept the risk of being wrong) 2 - Using (1), you calculate how much data you need to get an estimation error below x%. Even though most ML models are essentially statistics and have all the same limitations (issues with convergence, fat-tailed distributions, etc.), it seems the industry standard is to pretend none of that exists and hope for the best. IMO the best moneymaking opportunities of the decade will involve exploiting unsecured IoT devices and naive ML models; we will have plenty of both. |
In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data. Thus, fat-tailed distributions aren't an issue in the same way, since we never assume normality in the first place. Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.
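For what it's worth, that one "distributed like" assumption can be checked, at least roughly. Here is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic on a single feature, standard library only. The function name `ks_statistic` and the sample data are made up for illustration, and this is just one of many possible drift checks, not a standard deep learning step.

```python
import random
from bisect import bisect_right


def ks_statistic(train, test):
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(train), sorted(test)
    gap = 0.0
    # The supremum of |F_a(x) - F_b(x)| is attained at one of the observed points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


if __name__ == "__main__":
    rng = random.Random(0)
    train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution as train
    shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # hypothetical drifted test set
    print("same distribution:", ks_statistic(train, same))
    print("shifted test set: ", ks_statistic(train, shifted))
```

A small value only says this one feature looks similar in both sets; it says nothing about the joint distribution, so treat it as a smoke test rather than a guarantee.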
I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether the axiom is that the errors around a regression line are normally distributed or that stock prices follow a Wiener process. By contrast, ML is less rationalist: it simply reflects the data.
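To make the "axiom-based" side concrete, here is a rough sketch of the kind of sample-size calculation the parent comment describes. It assumes the quantity being estimated is a mean, that the normal approximation holds, and that the standard deviation `sigma` is known. All three are assumptions of this example, and `samples_needed` is a made-up name.

```python
from math import ceil
from statistics import NormalDist


def samples_needed(sigma, margin, confidence=0.95):
    """Smallest n such that the confidence interval half-width z * sigma / sqrt(n) is <= margin."""
    # Two-sided critical value of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Estimate a mean to within +/- 0.05 when sigma = 1, at 95% confidence.
    print(samples_needed(sigma=1.0, margin=0.05))  # 1537
```

If the normality axiom is wrong (say the data is Cauchy-like, where the variance is undefined), this number is meaningless, which is exactly the risk the parent comment says you take on explicitly.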