| I was writing another comment based on that same example and his leave-one-out calculations (at least as I understood them). Posterior vs. prior would be the extreme case of a leave-one-out procedure: if we leave out the only data point, there is nothing left. The divergence between the data and the model goes down when we include information about the data in the model. That doesn't seem like a controversial opinion. (That's how the blog post is introduced here: https://twitter.com/YulingYao/status/1662284440603619328) --- If the data consist of two flips, they are either equal or different (the former becomes more likely as the true probability moves away from 0.5). a) If the two flips are the same, the posterior probability of that result is 3/4. The log score is 2 log(3/4) ≈ -0.6. When we check the out-of-sample log score for each flip based on the 2/3 posterior obtained from the other, we get in each case a log score of log(2/3) ≈ -0.4. b) If the two flips are different, the posterior probability is still 1/2. The log score is 2 log(1/2) ≈ 2 × (-0.7) = -1.4. When we check the out-of-sample log score for each flip based on the 1/3 posterior for getting that result obtained from the other, we get in each case a log score of log(1/3) ≈ -1.1. |
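For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a uniform Beta(1, 1) prior on the coin's probability, which is my reading of where the 3/4, 2/3 and 1/3 figures come from (the comment does not state the prior explicitly). The function names `predictive` and `scores` are made up for this illustration.

```python
import math


def predictive(heads, flips, a=1.0, b=1.0):
    """Posterior predictive P(next flip is heads) under an assumed Beta(a, b) prior."""
    return (heads + a) / (flips + a + b)


def scores(data):
    """Return (in-sample log score, leave-one-out log score) for a list of 0/1 flips."""
    # In-sample: score every flip with the posterior fitted to all flips.
    p_all = predictive(sum(data), len(data))
    in_sample = sum(math.log(p_all if y else 1.0 - p_all) for y in data)

    # Leave-one-out: score each flip with the posterior fitted to the others.
    loo = 0.0
    for i, y in enumerate(data):
        rest = data[:i] + data[i + 1:]
        p_rest = predictive(sum(rest), len(rest))
        loo += math.log(p_rest if y else 1.0 - p_rest)
    return in_sample, loo


for label, data in [("same", [1, 1]), ("different", [1, 0])]:
    in_sample, loo = scores(data)
    print(f"{label}: in-sample {in_sample:.2f}, LOO total {loo:.2f}, LOO per flip {loo / 2:.2f}")
```

Under that assumption, the per-flip leave-one-out scores come out at log(2/3) and log(1/3), and in both cases the leave-one-out total is lower than the in-sample score.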