by kvathupo 1907 days ago

As currymj commented, this isn't accurate for ML, only for classical statistics.

In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data. Thus, there aren't issues with fat-tailed distributions since we make no such normality assumptions. Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.

I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether that is a normality assumption of predictions about a regression line or stock prices following a Wiener process. By contrast, ML is less rationalist by simply reflecting data.

6 comments

_dps 1907 days ago

It is absolutely untrue that DL is immune to fat-tail problems, and it is important that no one operate mission-critical systems under this assumption.

The two fat tail questions one has to engage are:

- is it possible that a catastrophic input might be lurking in the wild that would not be present in a typical training set? Even with a 1M instance training set, a one-in-a-million situation will only appear (and affect your objective function) on average one time, and could very well not appear at all.

- can I bound how badly I will suffer if my system is allowed to operate in the wild on such an input?

DL gives no additional tools to engage these questions.
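
The arithmetic behind the first question can be sketched in a few lines. This is only an illustration and assumes training instances are drawn independently; the function name `prob_absent` is made up for this example.

```python
import math

def prob_absent(p: float, n: int) -> float:
    """Probability that an event with per-instance probability p
    never shows up in n independent draws: (1 - p) ** n."""
    # log1p keeps the computation accurate when p is tiny.
    return math.exp(n * math.log1p(-p))

p = 1e-6        # a one-in-a-million situation
n = 1_000_000   # a 1M-instance training set

expected_count = p * n          # about one appearance on average
missing = prob_absent(p, n)     # close to exp(-1), i.e. roughly 37%
```

Under that independence assumption, the rare situation is missing from the training set more than a third of the time, even though it shows up once on average.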

godelski 1907 days ago

> It is absolutely untrue that DL is immune to fat-tail problems

In fact, working on fat tail problems is currently a hot topic in ML.

kvathupo 1907 days ago

I don't quite follow: isn't what you described, the occurrence of a gross outlier, a flaw fundamental to all forecasting? I should clarify that DL doesn't suffer from the problem that a normality assumption has with fat tails: a failure to capture the skew of the distribution.

_dps 1907 days ago

It's not characteristic of all forecasting, only purely empirical forecasting.

Definitionally, the only way to reason about risk that doesn't appear in training data is non-empirical (e.g. a priori assumptions about distributions, or worst cases, or out-of-paradigm tools like refusing to provide predictions for highly non-central inputs).

DL is not any better (or worse) than any other purely empirical method at answering questions about fat-tail risk, and the only way to do better is to use non-empirical/a-priori tools. Obviously the tradeoff here is that your a priori assumptions can be wrong, and that too needs to be included in your risk model (see e.g. Robust Optimization / Robust Control).

sjburt 1907 days ago

I think it's wrong to assume that non-empirical methods can be reliably trusted to give better results. Humans are terrible at avoiding bias or evaluating risks, especially for uncommon events.

carlosf 1907 days ago

Food for thought: if every method for predicting event x is terrible, then you might as well not try to predict x and build your life in such a way that you never expose yourself to the risk of x happening.

kragen 1907 days ago

From a Bayesian point of view, that amounts to a "prediction" that the probability of event x is so significant that you should build your life around it. But I guess if you knew enough for that sentence to make sense you wouldn't have posted your comment. So, suffice it to say that Bayesian decision theory cuts the knot you're talking about.
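
A toy sketch of that decision-theoretic framing, with entirely hypothetical losses and action names: each action gets an expected loss under a believed probability of x, and you pick the smallest. Choosing to avoid x altogether then corresponds to believing the probability sits above some break-even value.

```python
def expected_loss(p_event, loss_if_event, loss_otherwise):
    """Expected loss of one action given a believed probability of event x."""
    return p_event * loss_if_event + (1.0 - p_event) * loss_otherwise

def best_action(p_event, actions):
    """actions maps a name to (loss if x happens, loss if x does not)."""
    return min(actions, key=lambda name: expected_loss(p_event, *actions[name]))

# Hypothetical numbers, chosen only for illustration.
actions = {
    "expose": (1_000_000.0, 0.0),  # ruinous if x happens, free otherwise
    "avoid": (100.0, 100.0),       # fixed cost of arranging life around x
}

choice_if_likely = best_action(1e-3, actions)  # expected loss 1000 vs 100
choice_if_rare = best_action(1e-6, actions)    # expected loss 1 vs 100
```

With these made-up numbers the break-even probability is 100 / 1,000,000 = 1e-4, so arranging your life around x is only the better choice if you implicitly believe x is more likely than that.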

mochomocha 1907 days ago

I agree that ML tends to put weaker assumptions on the data than classical statistics and that it's a good thing.

However, most ML certainly makes distributional assumptions; they are just weaker. When you're training a huge deep net with an L2 loss on a regression task, you have a parametric conditional Gaussian distribution under the hood. Overparametrization doesn't make the distributional assumption go away. Vanilla autoencoders also work under a multivariate Gaussian setup. Most classifiers are trained under a multinomial distribution assumption, etc.
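
To make the L2 point concrete, here is a small sketch assuming a Gaussian with fixed variance; the function names are illustrative. The Gaussian negative log-likelihood is the sum of squared errors scaled by 1 / (2 * sigma^2), plus a term that does not depend on the predictions, so minimizing one minimizes the other.

```python
import math

def squared_error(y, y_hat):
    """Sum of squared errors, i.e. the (unnormalized) L2 loss."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def gaussian_nll(y, y_hat, sigma=1.0):
    """Negative log-likelihood of y under independent N(y_hat_i, sigma^2)."""
    n = len(y)
    constant = 0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
    return constant + squared_error(y, y_hat) / (2.0 * sigma ** 2)

y = [1.0, 2.0, 3.0]
close_fit = [1.1, 1.9, 3.2]
poor_fit = [0.0, 0.0, 0.0]
sigma = 2.0

# The two differences agree: the constant cancels, leaving only scaled L2 loss.
nll_gap = gaussian_nll(y, close_fit, sigma) - gaussian_nll(y, poor_fit, sigma)
sse_gap = (squared_error(y, close_fit) - squared_error(y, poor_fit)) / (2.0 * sigma ** 2)
```

Any pair of predictions is ranked the same way by both criteria, which is the sense in which an L2-trained regressor carries a conditional Gaussian assumption.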

And fat-tailed distributions are definitely a thing. It's just less of a concern for the mainstream CV problems on which people apply DL.

fractionalhare 1907 days ago

> In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data.

Okay, so that's about the same as classical statistics. You're just waiving the requirement to know what the distribution is. You are still assuming there exists a distribution and that it holds in the future when you apply the model. Sure you may not be trying to estimate parameters of a distribution, but it is still there and all standard statistical caveats still apply.

> Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.

Classical statistics frequently makes use of multiple distributions and stochastic processes.

potatoman22 1907 days ago

Of course there's a distribution behind the data. The parent commenter was saying not all machine learning techniques need to know that distribution, as a rebuttal to their parent comment.

fractionalhare 1907 days ago

I know what they're saying; I even reiterate it in my second sentence. My point is that this doesn't protect you from the distribution changing, which is a problem that applies to both machine learning and classical statistics.

This is in support of the GP comment: while you can loosen your assumptions about what the underlying distribution is and don't literally need to know it, you can't get away from the fundamental limitations of statistics. Which is the original topic we're talking about.
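
A deliberately tiny illustration of the distribution changing, with everything in it assumed for the example: a straight line fitted by least squares to y = x^2 on inputs from [0, 1], then scored on inputs from [2, 3]. The model never "knew" the input distribution, yet it still depends on that distribution staying put.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    """Hypothetical underlying process; unknown to the model."""
    return x * x

train_x = [i / 10 for i in range(11)]        # inputs seen in training: [0, 1]
shifted_x = [2 + i / 10 for i in range(11)]  # inputs after the shift: [2, 3]

slope, intercept = fit_line(train_x, [truth(x) for x in train_x])
train_mse = mse(train_x, [truth(x) for x in train_x], slope, intercept)
shifted_mse = mse(shifted_x, [truth(x) for x in shifted_x], slope, intercept)
```

The fit looks fine on the training range and is badly wrong on the shifted one, and nothing in the training error hints at that.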

peytn 1907 days ago

I dunno, there are definitely distribution-based assumptions—good luck working with skewed data. Most old-school techniques are kinda additive, so nobody's really been assuming a single distribution for practical applications.

Current ML techniques just work well for the kinds of problems people are applying them to, which is kind of a tautology. We should definitely seek to understand the theory behind stuff like dropout and not consider our lack of understanding a strength.

dumb1224 1907 days ago

> I suppose you could say statistics is less "empirical" than ML in the sense that it is axiom-based, whether that is a normality assumption of predictions about a regression line or stock prices following a Wiener process. By contrast, ML is less rationalist by simply reflecting data.

I don't think that's true (or maybe I misunderstood?). I guess by "simply reflecting data" you mean fitting data with a very flexible function (curve)? There are very flexible distributions that fit almost any kind of data, e.g. https://en.wikipedia.org/wiki/Gamma_distribution, or compositions of them, but as a practitioner you still need to interpret the model and check whether it represents the underlying process well. Both statistical inference and ML are getting there using different methods.

clircle 1907 days ago

The only reason this may not be accurate for ML is that machine learners generally make no attempt to quantify the uncertainty in their predictions with, e.g., confidence intervals or prediction intervals.

And there is a whole field of non-parametric statistics that doesn't make distribution assumptions.
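
As one example of a distribution-free prediction interval, here is a minimal sketch of split conformal prediction. That technique is not named in the thread; it is swapped in here purely as an illustration, and the helper name is made up. It assumes the held-out calibration points and the new point are exchangeable, and nothing about the shape of their distribution.

```python
import math

def conformal_halfwidth(calibration_residuals, alpha=0.1):
    """Half-width q so that [prediction - q, prediction + q] covers a new
    exchangeable point with probability at least 1 - alpha."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the score to use
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

# Hypothetical residuals (observed minus predicted) on a held-out set.
residuals = [-3.0, 1.0, -2.0, 4.0, 0.5, -1.5, 2.5, -0.5, 3.5]
half_width = conformal_halfwidth(residuals, alpha=0.5)
```

The guarantee is only as good as the exchangeability assumption, so a rare input that never resembles the calibration data is still not covered by it.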
