| You flip a possibly-biased coin 20 times and get half heads, half tails, e.g. "THHHTTTTTHTHTTHHHTHH". Under the model where the bias is 0.5 (a fair coin), the probability of that sequence is (0.5)^20, or about one in a million. In fact, under that model the probability of any sequence you could observe is one in a million. Under the model where the bias is 0.4, the probability is (0.4)^10 × (0.6)^10, or about one in 1.6 million. That is, the sequence we observed supplies about 1.5 times as much evidence in favor of bias = 0.5 as compared with bias = 0.4. This is likelihood. Likelihood ratios are all that matter. Morals: - The more complex the event you're predicting (the rarer the typical observed result), the smaller the associated likelihoods will tend to be - It's possible that every observed result has a tiny probability under every model you're considering - Nonetheless it makes sense to use the ratios of these numbers to compare the models - This has nothing to do with probability densities or logarithms, though the fact that we often work with densities also makes absolute likelihood values relative to the choice of units Added in edits: - You could summarize the sequence with the number of heads or tails, and then the likelihood values would be larger but the ratios would remain the same (it's a sufficient statistic). Similarly, in the CrossValidated question one could summarize the data with the mean and sum of squares. But this doesn't work in general, e.g. if we have i.i.d. draws from a Cauchy distribution. |
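For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python. The helper names `sequence_likelihood` and `count_likelihood` are made up for illustration; the second one uses the binomial probability of the head count as the summarized (sufficient-statistic) version.

```python
from math import comb

seq = "THHHTTTTTHTHTTHHHTHH"
n = len(seq)          # number of flips
k = seq.count("H")    # number of heads

def sequence_likelihood(p, k, n):
    """Probability of one particular ordered sequence with k heads in n flips."""
    return p**k * (1 - p)**(n - k)

def count_likelihood(p, k, n):
    """Probability of seeing k heads in n flips in any order (binomial)."""
    return comb(n, k) * sequence_likelihood(p, k, n)

seq_fair = sequence_likelihood(0.5, k, n)
seq_biased = sequence_likelihood(0.4, k, n)
seq_ratio = seq_fair / seq_biased

cnt_fair = count_likelihood(0.5, k, n)
cnt_biased = count_likelihood(0.4, k, n)
cnt_ratio = cnt_fair / cnt_biased

print(f"full sequence: {seq_fair:.3g} vs {seq_biased:.3g}, ratio {seq_ratio:.3f}")
print(f"head count:    {cnt_fair:.3g} vs {cnt_biased:.3g}, ratio {cnt_ratio:.3f}")
```

Both lines should report the same ratio, close to 1.5, even though the head-count likelihoods are larger than the full-sequence ones by the factor `comb(20, 10)`.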
The only other thing worth adding to what you have written is that the likelihood is a product of N factors.
As such, it will essentially always diverge toward infinity (if the factors are typically greater than 1, which can happen with densities; more precisely, if their logs average above zero) or collapse quickly toward zero (if the logs average below zero, as in your example and in the OP's question).
So this very structure (arising from the IID observations) implies that no “stable” density will pop out. It’ll always blow up or down!
One way to stabilize things is to take (1/N) times the log of the likelihood. By the law of large numbers, this converges to something familiar: E log p(x), the negative of the entropy (assuming the data really are drawn from p).
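A quick simulation makes this concrete. This is only a sketch, assuming a coin with true bias 0.4 that is evaluated under its own model; the function names and the seed are arbitrary choices for illustration. The total log-likelihood keeps falling as N grows, while the per-observation average settles near E log p(x), i.e. minus the entropy.

```python
import math
import random

def bernoulli_entropy(p):
    """Entropy (in nats) of a coin that lands heads with probability p."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def mean_log_likelihood(flips, p):
    """(1/N) * log-likelihood of a list of boolean flips under bias p."""
    n = len(flips)
    heads = sum(flips)
    return (heads * math.log(p) + (n - heads) * math.log(1 - p)) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
p_true = 0.4             # assumed true bias for this illustration

results = {}
for n in (10, 1_000, 100_000):
    flips = [rng.random() < p_true for _ in range(n)]
    avg = mean_log_likelihood(flips, p_true)
    results[n] = avg
    print(f"N={n:>6}  log-likelihood={n * avg:12.1f}  per-observation={avg:.4f}")

print(f"negative entropy: {-bernoulli_entropy(p_true):.4f}")
```

The middle column runs off toward minus infinity roughly linearly in N, which is the "blow down" described above; the last column is the stabilized quantity.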