|
As currymj commented, this isn't accurate for ML, only for classical statistics. In ML (or, more specifically, deep learning), we make no distributional assumptions other than the fundamental one that our training data is "distributed like" our test data. Thus, fat-tailed distributions pose no problem, since we make no normality assumptions. Indeed, with autoencoders, we don't assume a single distribution but rather a stochastic process. I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether the axiom is a normality assumption about predictions around a regression line or that stock prices follow a Wiener process. By contrast, ML is less rationalist: it simply reflects the data. |
The two fat-tail questions one has to engage with are:
- is it possible that a catastrophic input is lurking in the wild that would not be present in a typical training set? Even with a 1M-instance training set, a one-in-a-million situation will appear (and affect your objective function) only once on average, and could very well not appear at all (roughly a 37% chance, assuming independent samples).
- can I bound how badly I will suffer if my system is allowed to operate in the wild on such an input?
DL gives no additional tools to engage these questions.
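
To put a number on the first question, here is a rough sketch of the arithmetic. It assumes training instances are drawn independently with a fixed per-instance probability `p` of the rare situation, which is my simplification, and the function names are made up for illustration.

```python
import math


def expected_count(p: float, n: int) -> float:
    """Expected number of times an event with per-sample probability p shows up in n samples."""
    return n * p


def prob_never_seen(p: float, n: int) -> float:
    """Probability that the event is absent from all n samples, assuming independence.

    Equals (1 - p) ** n; log1p keeps precision when p is tiny.
    """
    return math.exp(n * math.log1p(-p))


p = 1e-6          # a one-in-a-million situation
n = 1_000_000     # a 1M-instance training set

print(f"expected appearances: {expected_count(p, n):.3f}")
print(f"chance it never appears: {prob_never_seen(p, n):.3f}")
```

Under those assumptions the chance of never seeing the situation comes out close to 1/e, i.e. roughly 0.37. Note this only speaks to the first question; it says nothing about bounding the damage when such an input does arrive.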