by contravariant
1906 days ago
Well, any optimization problem is equivalent to a maximum likelihood estimate for a corresponding probability distribution (minimizing squared error, for example, is maximum likelihood under Gaussian noise), so you may be making more implicit assumptions than you think. Typical ML methods just have a huge distribution space that can fit almost anything, from which they pick a single option. This has two downsides. First, since your distribution space is several times too large by design, you lose the ability to say anything useful about the accuracy of your estimate, other than that it is far from the only option. Second, since you must pick one option from your parameter space, you may miss slightly less likely explanations that could still have huge consequences, which means your models tend to end up overconfident.
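To make the equivalence concrete, here is a minimal sketch under stated assumptions: a one-parameter linear model, synthetic data, and Gaussian noise with a known standard deviation `sigma` (all of these are illustrative choices, not something from the thread). It checks that the slope minimizing squared error is also the slope maximizing the Gaussian log-likelihood, and then looks at how much spread a single point estimate throws away.

```python
import numpy as np

# Synthetic data (assumption for illustration): y = 2x + Gaussian noise.
rng = np.random.default_rng(0)
sigma = 0.5  # assumed known noise standard deviation
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + rng.normal(0.0, sigma, size=200)

# Candidate slopes: a deliberately tiny "parameter space".
slopes = np.linspace(0.0, 4.0, 4001)

# Sum of squared residuals for every candidate slope.
residuals = y[None, :] - slopes[:, None] * x[None, :]
sq_loss = (residuals ** 2).sum(axis=1)

# Gaussian log-likelihood of the same data for every candidate slope.
# It is a decreasing affine function of sq_loss, so the optimum coincides.
log_lik = -0.5 * sq_loss / sigma**2 - x.size * np.log(sigma * np.sqrt(2.0 * np.pi))

best_by_loss = slopes[np.argmin(sq_loss)]
best_by_lik = slopes[np.argmax(log_lik)]

# Closed-form least-squares slope for a line through the origin.
slope_ls = (x @ y) / (x @ x)

# Keeping the whole likelihood instead of one point (flat prior over the grid):
weights = np.exp(log_lik - log_lik.max())
weights /= weights.sum()
post_mean = (weights * slopes).sum()
post_std = np.sqrt((weights * (slopes - post_mean) ** 2).sum())

print("argmin loss:", best_by_loss)
print("argmax likelihood:", best_by_lik)
print("closed-form least squares:", slope_ls)
print("spread ignored by the point estimate (std):", post_std)
```

The point estimate reports one slope; the normalized likelihood also gives you a width. With a huge, flexible model that width is what becomes hard to say anything meaningful about.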
|
I'd argue that the latter has had more success in the past, since the prior on the data distribution is usually wrong in real life. Think about writing down a prior for the distribution of image data, or the same in NLP. Forget about it.