by radford-neal
1125 days ago
As the author admits at the end, this is rather misleading. In normal usage, "overfit" is by definition a bad thing (it wouldn't be "over" if it were good). And the argument given does nothing to show that Bayesian inference is doing anything bad. To take a trivial example, suppose you have a uniform(0,1) prior for the probability of a coin landing heads. Integrating over this gives a probability for heads of 1/2. You flip the coin once, and it lands heads. If you integrate over the posterior given this observation, you'll find that the probability of the observed value, heads, is now 2/3, greater than it was under the prior. And that's OVERFITTING, according to the definition in the blog post. Not according to any sensible definition, however.
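For anyone who wants to check the 1/2 and 2/3 figures, here is a minimal sketch. It assumes the standard conjugate result that a uniform(0,1) prior is Beta(1,1) and that the predictive probability of heads is (a + heads) / (a + b + flips). The function name `predictive_heads` is just an illustrative label, not something from the blog post.

```python
from fractions import Fraction

def predictive_heads(heads, tails, a=1, b=1):
    """P(next flip is heads) under a Beta(a, b) prior, after the given counts.

    a=1, b=1 is the uniform(0,1) prior from the comment (an assumption
    spelled out here: uniform(0,1) is the same as Beta(1,1)).
    """
    return Fraction(a + heads, a + b + heads + tails)

# Before any data: integrate over the prior.
prior_pred = predictive_heads(heads=0, tails=0)

# After one flip that landed heads: integrate over the posterior.
post_pred = predictive_heads(heads=1, tails=0)

print(prior_pred, post_pred)  # prints: 1/2 2/3
```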
Comparing the posterior with the prior is the extreme case of a leave-one-out procedure: with a single data point, leaving it out leaves nothing but the prior.
The divergence between the data and the model goes down when we include information about the data in the model. That doesn't seem a controversial opinion. (That's how the blog post is introduced here: https://twitter.com/YulingYao/status/1662284440603619328)
---
If the data consists of two flips, they are either equal or different (the former becomes more likely as the true probability moves away from 0.5).
a) If the two flips are the same, the posterior probability of that result is 3/4. The in-sample log score is 2 log(3/4) = -0.6.
When we check the out-of-sample log score for each flip based on the 2/3 posterior obtained from the other, we get in each case a log score log(2/3) = -0.4, or -0.8 in total.
b) If the two flips are different, the posterior probability of each result is still 1/2. The in-sample log score is 2 log(1/2) = 2 * (-0.7) = -1.4.
When we check the out-of-sample log score for each flip based on the 1/3 posterior for getting that result obtained from the other, we get in each case a log score log(1/3) = -1.1, or -2.2 in total.
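The two-flip numbers can be reproduced with a short sketch along the same lines. It assumes the uniform Beta(1,1) prior again, codes heads as 1 and tails as 0, and uses natural logs. The helper names (`predictive`, `in_sample_log_score`, `loo_log_score`) are made up for illustration.

```python
import math

def predictive(outcome, data, a=1.0, b=1.0):
    """Posterior predictive probability of `outcome` (1 = heads, 0 = tails)
    given the flips in `data`, under a Beta(a, b) prior (uniform by default)."""
    p_heads = (a + sum(data)) / (a + b + len(data))
    return p_heads if outcome == 1 else 1.0 - p_heads

def in_sample_log_score(data):
    """Sum of log predictive probabilities, conditioning on all the data."""
    return sum(math.log(predictive(y, data)) for y in data)

def loo_log_score(data):
    """Sum of log predictive probabilities, each flip predicted from the rest."""
    total = 0.0
    for i, y in enumerate(data):
        rest = data[:i] + data[i + 1:]
        total += math.log(predictive(y, rest))
    return total

# Case a) two equal flips, case b) two different flips.
for data in ([1, 1], [1, 0]):
    print(data,
          round(in_sample_log_score(data), 2),
          round(loo_log_score(data), 2))
```

Summed over both flips, the in-sample score comes out higher than the leave-one-out score in both cases (about -0.58 vs -0.81 for equal flips, about -1.39 vs -2.20 for different flips), which is the gap the blog post's definition is picking up on.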