Hacker News
by fractionalhare 1902 days ago
> In ML (or more specifically deep learning), we make no distribution-based assumptions, other than the fundamental assumption that our training data is "distributed like" our test data.

Okay, so that's about the same as classical statistics. You're just waiving the requirement to know what the distribution is. You are still assuming that a distribution exists and that it will still hold in the future when you apply the model. Sure, you may not be trying to estimate the parameters of a distribution, but the distribution is still there and all the standard statistical caveats still apply.

> Indeed, with the use of autoencoders, we don't assume a single distribution, but rather a stochastic process.

Classical statistics frequently makes use of multiple distributions and stochastic processes.

1 comments

Of course there's a distribution behind the data. The parent commenter was saying that not all machine learning techniques need to know that distribution, as a rebuttal to their parent comment.
I know what they're saying; I even reiterate it in my second sentence. My point is that this doesn't protect you from the distribution changing, which is a problem for both machine learning and classical statistics.

This is in support of the GP comment: while you can loosen your assumptions about what the underlying distribution is and don't literally need to know it, you can't get away from the fundamental limitations of statistics, which is the original topic we're talking about.
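
To make the distribution-shift point concrete, here is a minimal sketch. It is not from the thread: `true_fn`, the input ranges, the noise level, and the sample sizes are all made-up assumptions. It fits a 1-nearest-neighbour regressor, which assumes no parametric form for the data, and then scores it on test data drawn like the training data and on test data drawn from a shifted range.

```python
import random

def true_fn(x):
    # Hypothetical ground-truth relationship, chosen only for illustration.
    return x * x

def sample(n, lo, hi, rng):
    # Draw n (x, y) pairs with x ~ Uniform(lo, hi) and small Gaussian noise on y.
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0.0, 0.01)) for x in xs]

def predict_1nn(train, x):
    # 1-nearest-neighbour regression: return the y of the closest training x.
    # The model never estimates or assumes a distribution for the data.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(train, test):
    # Mean squared error of the 1-NN predictions on a test set.
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in test) / len(test)

rng = random.Random(0)                   # fixed seed so the run is repeatable
train = sample(500, 0.0, 1.0, rng)       # training inputs in [0, 1]
same_dist = sample(200, 0.0, 1.0, rng)   # test inputs "distributed like" training
shifted = sample(200, 2.0, 3.0, rng)     # test inputs after the distribution moved

err_same = mse(train, same_dist)
err_shifted = mse(train, shifted)
print(f"MSE, test drawn like training data: {err_same:.4f}")
print(f"MSE, test drawn from shifted range: {err_shifted:.4f}")
```

Under these assumptions the second error should come out far larger than the first, even though the model was never told what the distribution was. Not knowing the distribution did nothing to protect it from the distribution changing.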