Hacker News
by contravariant 1906 days ago
Well, every optimization problem is equivalent to a maximum likelihood estimate under a corresponding probability distribution, so you may be making more implicit assumptions than you think.
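
To make that concrete, here is a rough sketch of one familiar instance: minimizing squared error picks the same parameter as maximizing a Gaussian likelihood with fixed variance. The toy data, the slope-only model y = w*x, the grid of candidates and sigma = 1 are all my own assumptions for illustration, not anything specific from the thread.

```python
import math

# Hypothetical toy data, roughly y = 2x plus a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

def squared_loss(w):
    """Sum of squared residuals for the slope-only model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

def gaussian_nll(w, sigma=1.0):
    """Negative log-likelihood if residuals are i.i.d. N(0, sigma^2).

    This is squared_loss scaled by 1 / (2 sigma^2) plus a constant,
    so it has the same minimizer in w.
    """
    n = len(xs)
    const = 0.5 * n * math.log(2 * math.pi * sigma ** 2)
    return const + squared_loss(w) / (2 * sigma ** 2)

# Crude grid search over candidate slopes 0.00, 0.01, ..., 4.00.
candidates = [i / 100 for i in range(401)]
w_loss = min(candidates, key=squared_loss)
w_mle = min(candidates, key=gaussian_nll)

print(w_loss, w_mle)
```

Both searches land on the same slope, so choosing squared error quietly commits you to Gaussian noise. Other losses correspond to other distributions in the same way, provided exp(-loss) can be normalized.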

Typical ML methods just have a huge space of distributions that can fit almost anything, and from it they pick just one option. This has two downsides:

Since your distribution space is several times too large by design, you lose the ability to say anything useful about the accuracy of your estimate, other than that it is far from the only option.

Since you must pick one option from your parameter space, you may miss slightly less likely explanations that could still have huge consequences, which means your models tend to end up overconfident.
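
A small sketch of the overconfidence point, using a coin that came up heads 3 times out of 3. The data and the uniform Beta(1, 1) prior are assumptions I picked for illustration. The point estimate commits to a single parameter value, while averaging over the posterior keeps the less likely explanations in play.

```python
from fractions import Fraction

def mle_heads(heads, tails):
    """Maximum likelihood estimate of P(heads): a single point estimate."""
    return Fraction(heads, heads + tails)

def posterior_predictive_heads(heads, tails, a=1, b=1):
    """P(next flip is heads), averaged over a Beta posterior.

    Assumes a Beta(a, b) prior; a = b = 1 is the uniform prior.
    The posterior is Beta(a + heads, b + tails), whose mean is returned.
    """
    return Fraction(heads + a, heads + tails + a + b)

heads, tails = 3, 0  # hypothetical observations
p_point = mle_heads(heads, tails)
p_bayes = posterior_predictive_heads(heads, tails)

print("point estimate:", p_point)        # 1
print("posterior predictive:", p_bayes)  # 4/5
```

The point estimate says tails can never happen, while the averaged prediction still gives it a 1 in 5 chance. That gap is the kind of "slightly less likely explanation" that gets dropped when only one option is kept.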

1 comment

I mean yes, there is parametric ML (maximum likelihood, MAP, GMMs, ...) and there is non-parametric ML (anything neural-network based, SVMs, GBMs, random forests, ...).

I'd argue that the latter has had more success in the past, since the prior on the data distribution is usually wrong in real life. Think about a prior for image data distributions, or the same in NLP. Forget about it.