by erichahn 1913 days ago
Isn't the point of ML exactly that you don't know the underlying distribution? How is this ever assumed in any way? ML is not parametric statistics.
2 comments

(Some) ML is non-parametric, but there are always some questions you need to be able to answer about your data. At a bare minimum: is the generating process ergodic, what is the error of your measurement procedure, and how representative of the true underlying distribution is the sample your sampling procedure produces? All use of data should start with some exploratory analysis before you ever get to the modeling stage.

Once you have a model, at minimum understand how to tune for the tradeoffs between different types of error, and don't naively optimize for pure accuracy. At the obvious extremes: if you're trying to prevent a nuclear attack, false negatives are much more costly than false positives; if you're trying to decide whether to execute someone for murder, false positives are much more costly than false negatives. Understand the relative costs of different types of error for whatever you're trying to predict and proceed accordingly.
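
To make the tradeoff concrete, here is a minimal sketch of picking a decision threshold by total misclassification cost instead of accuracy. The labels, scores, cost values, and the helper names `expected_cost` and `best_threshold` are all made up for illustration; real costs have to come from the problem domain.

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost when every score >= threshold is predicted positive."""
    cost = 0.0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 0:
            cost += cost_fp  # false positive
        elif not predicted_positive and y == 1:
            cost += cost_fn  # false negative
    return cost


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Try each observed score (plus 'never predict positive') as a threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Toy data (hypothetical): 1 = event happened, scores are model outputs.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.6, 0.8, 0.9]

# Missing an event is 100x worse than a false alarm: threshold moves down.
t_fn_costly = best_threshold(y_true, scores, cost_fp=1, cost_fn=100)

# A false alarm is 100x worse than a miss: threshold moves up.
t_fp_costly = best_threshold(y_true, scores, cost_fp=100, cost_fn=1)

print(t_fn_costly, t_fp_costly)
```

Same model, same scores, but the chosen cutoff changes once the two error types are priced differently.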

Well, all optimization problems are equivalent to a maximum likelihood estimate for a corresponding probability distribution, so you may be making more implicit assumptions than you think.
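
One familiar special case of this correspondence, as a rough sketch: minimizing squared error picks the same parameter as maximizing a Gaussian likelihood with fixed variance. The data, the grid, and the function names below are invented for illustration, and this only demonstrates the least-squares case, not the general claim.

```python
import math

data = [2.0, 3.5, 1.0, 4.5, 4.0]  # toy observations (hypothetical)


def squared_error(mu):
    """The 'plain optimization' objective: sum of squared residuals."""
    return sum((x - mu) ** 2 for x in data)


def gaussian_log_likelihood(mu, sigma=1.0):
    """Log-likelihood of the data under Normal(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )


# Search the same grid of candidate values with both objectives.
grid = [i / 100 for i in range(0, 601)]
mu_least_squares = min(grid, key=squared_error)
mu_max_likelihood = max(grid, key=gaussian_log_likelihood)

print(mu_least_squares, mu_max_likelihood)
```

Both searches land on the sample mean, so choosing squared error as the loss quietly amounts to assuming Gaussian noise.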

Typical ML methods just have a huge distribution space that can fit almost anything, from which they pick just one option. This has two downsides:

Since your distribution space is several times too large by design, you lose the ability to say anything useful about the accuracy of your estimate, other than that it is far from the only option.

Since you must pick one option from your parameter space, you may miss slightly less likely explanations that may still have huge consequences, which means your models tend to end up overconfident.
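
A toy illustration of the overconfidence point, assuming a coin-flip model with a uniform prior (my choice of example, not something from the thread): after 3 heads in 3 flips, the single most likely parameter says tails is impossible, while averaging over every candidate parameter still leaves room for it.

```python
def point_estimate(heads, flips):
    """Pick the single most likely value of p (maximum likelihood)."""
    return heads / flips


def averaged_prediction(heads, flips, grid_size=10001):
    """Average over all candidate p in [0, 1], weighted by likelihood.

    Assumes a uniform prior; the integral is approximated on a grid.
    """
    ps = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [p ** heads * (1 - p) ** (flips - heads) for p in ps]
    total = sum(weights)
    return sum(p * w for p, w in zip(ps, weights)) / total


heads, flips = 3, 3
p_point = point_estimate(heads, flips)          # probability of heads, one option
p_averaged = averaged_prediction(heads, flips)  # probability of heads, all options

print(p_point, round(p_averaged, 3))
```

The point estimate assigns zero probability to tails; the averaged prediction comes out near 0.8 for heads, which matches the closed form (heads + 1) / (flips + 2) for a uniform prior.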

I mean yes, there is parametric ML (maximum likelihood, MAP, GMMs, ...) and there is non-parametric ML (everything neural network, SVM, GBM, random forests, ...).

I'd argue that the latter has had more success in the past, since the prior on the data distribution is usually wrong in real life. Think about a prior for image data distributions, or the same in NLP. Forget about it.