Hacker News
by l_e_o_n 849 days ago
You flip a possibly-biased coin 20 times and get half heads, half tails, e.g. "THHHTTTTTHTHTTHHHTHH".

Under the model where the bias is 0.5—a fair coin—the probability of that sequence is (0.5)^20 or about one in a million. In fact, the probability of any sequence you could observe is one in a million.

Under the model where the bias is 0.4 the probability is (0.4)^10 × (0.6)^10 or about one in 1.6 million.

That is, the sequence we observed supplies about 1.5 times as much evidence in favor of bias = 0.5 as compared with bias = 0.4; this is likelihood.

Likelihood ratios are all that matter.
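
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The helper name `sequence_likelihood` is made up for illustration; the sequence and the two bias values are the ones from the example above.

```python
# The exact sequence quoted in the post: 10 heads and 10 tails.
seq = "THHHTTTTTHTHTTHHHTHH"

def sequence_likelihood(seq: str, p_heads: float) -> float:
    """Probability of this exact ordered sequence when P(heads) = p_heads."""
    prob = 1.0
    for flip in seq:
        prob *= p_heads if flip == "H" else 1.0 - p_heads
    return prob

fair = sequence_likelihood(seq, 0.5)    # equals 0.5**20
biased = sequence_likelihood(seq, 0.4)  # equals 0.4**10 * 0.6**10
ratio = fair / biased

print(f"bias 0.5: {fair:.3e}")
print(f"bias 0.4: {biased:.3e}")
print(f"likelihood ratio (0.5 vs 0.4): {ratio:.2f}")
```

Both likelihoods are tiny, but only the ratio is used to compare the two models.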

Morals:

- The more complex the event you're predicting (the rarer the typical observed result), the smaller the associated likelihoods will tend to be

- It's possible that every observed result has a tiny probability under every model you're considering

- Nonetheless it makes sense to use the ratios of these numbers to compare the models

- This has nothing to do with probability densities or logarithms, although working with densities additionally makes absolute likelihood values depend on the choice of units

Added in edits:

- You could summarize the sequence with the number of heads or tails and then the likelihood values would be larger but the ratios would remain the same (it's a sufficient statistic). Similarly in the CrossValidated question one could summarize the data with the mean and sum of squares. But this doesn't work in general, e.g. if we have i.i.d. draws from a Cauchy distribution.
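
The sufficient-statistic point for the coin can be sketched as follows. The function names are hypothetical; `math.comb` supplies the binomial coefficient, and the numbers (20 flips, 10 heads, biases 0.5 and 0.4) come from the example above.

```python
from math import comb

n, heads = 20, 10  # 20 flips, 10 heads, as in the example

def sequence_likelihood(p: float) -> float:
    """Likelihood of one specific ordered sequence with `heads` heads."""
    return p**heads * (1 - p) ** (n - heads)

def count_likelihood(p: float) -> float:
    """Likelihood of the summary 'heads out of n' (binomial pmf)."""
    return comb(n, heads) * sequence_likelihood(p)

seq_ratio = sequence_likelihood(0.5) / sequence_likelihood(0.4)
count_ratio = count_likelihood(0.5) / count_likelihood(0.4)

print(f"sequence likelihood at 0.5: {sequence_likelihood(0.5):.3e}")
print(f"count likelihood at 0.5:    {count_likelihood(0.5):.3f}")
print(f"ratio from sequences: {seq_ratio:.4f}")
print(f"ratio from counts:    {count_ratio:.4f}")
```

The binomial coefficient does not depend on the bias, so it cancels in the ratio: the likelihood values differ by a factor of `comb(20, 10)`, but the two ratios agree.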

3 comments

You have given a nice clean answer that does not make any errors (such as talking about the likelihood as a density in parameters, which of course it is not). Thanks for writing it down.

The only other thing worth adding to what you have written is that the likelihood is a product of N factors.

As such, it will essentially always diverge toward infinity (if the density factors are on average greater than 1) or collapse fast towards zero (if the factors are on average less than 1, as in your example and in OP).

So this very structure (arising from the IID observations) implies that no “stable” density will pop out. It’ll always blow up or down!

One way to stabilize things is to take (1/N) times the log of the likelihood. Then you do converge to something familiar: E log p(x), the negative of the entropy.
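
A rough simulation of that stabilization, under assumptions that are mine rather than the thread's: the flips are drawn from a Bernoulli(0.4) coin, the likelihood is evaluated under that same model, and the seed and sample sizes are arbitrary. The function names are hypothetical.

```python
import math
import random

def mean_log_likelihood(p: float, n: int, seed: int = 0) -> float:
    """(1/n) * log-likelihood of n simulated flips, evaluated at the true bias p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        is_heads = rng.random() < p
        total += math.log(p if is_heads else 1.0 - p)
    return total / n

def bernoulli_entropy(p: float) -> float:
    """Entropy of a single flip in nats: -E log p(x)."""
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

p = 0.4  # assumed bias, used for both the simulation and the model
for n in (10, 1_000, 100_000):
    print(n, mean_log_likelihood(p, n), -bernoulli_entropy(p))
```

The raw likelihood shrinks geometrically with N, while the per-observation log-likelihood settles near minus the entropy as N grows.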

Not quite; the probability of n/2 successes in n trials is given by Binomial(n, p), not p^n. p^n is correct for a single sequence, but many possible sequences result in half heads, half tails, so you need a factor of "N choose X", the so-called binomial coefficient.

> (0.4)^20 × (0.6)^20

and I think you mean (0.4)^10 × (0.6)^10, or more generally p^x * (1-p)^(n-x).

I'm talking about the whole sequence; you're talking about the number of heads (or tails) in the sequence.

The number of heads is a sufficient statistic, so we'll get the same likelihood ratios out, but the likelihood values themselves will be larger.

You could make a similar point about the original CrossValidated Normal(0, 1)^N example by summarizing the data with the mean and sum of squares.

This doesn't work if the data were Cauchy(0, 1)^N instead.

> half heads, half tails.

> Under the model where the bias is 0.5—a fair coin—the probability of that outcome is (0.5)^20 or about one in a million.

No.

Edit: someone downvoted, ha. It's closer to 1 in 6.

1 in a million is the probability of correctly predicting a unique sequence of 20 coin flips, in the exact order. (E.g. first 10 flips heads, 2nd 10 flips tails, in that order - 1 in a million)

I'm surprised people are conflating the Binomial distribution with OP's statement. He is talking about one specific outcome of half heads/half tails (where order matters). There is exactly one way to get that outcome.

You only knew that because you know how to do the math yourself, friend.

If you read what he said (I quoted it), he was not talking about one specific outcome. He didn't...well, specify that. He said half and half.

I also know how to do math, as I think I proved in my own comment. So I don't accept your insult.

Edited for clarity.