by baxtr 3134 days ago
I regularly forget how Bayes works. Every time that happens, I go back to this page: https://www.bayestheorem.net/

I love the way it’s explained there.

3 comments

You can also think about Bayes' theorem as follows. Suppose we have a logical robot trying to learn about the world. The robot has a collection of hypotheses in its brain. Every time it observes a new fact, it deletes all hypotheses that are incompatible with that fact.

For example, suppose it is thinking about the hair colour and eye colour of Joe. It starts with these hypotheses about Joe's (eye colour, hair colour):

    (eye colour, hair colour)
    =========================
    (blue, blond)
    (blue, black)
    (brown, blond)
    (brown, black)
Suppose it learns that blue-eyed people have blond hair. It deletes the hypothesis (blue, black), which is incompatible with that fact, and keeps only the hypotheses compatible with it:

    (blue, blond)
    (brown, blond)
    (brown, black)
Suppose it now learns that Joe has blue eyes. It keeps only the one hypothesis compatible with that:

    (blue, blond)
So it has now learned the hair colour.
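
A minimal Python sketch of the deletion step above, in case it helps. The `observe` helper and the lambda predicates are illustrative names of my own, not anything standard:

```python
from itertools import product

# All (eye colour, hair colour) combinations the robot starts with.
hypotheses = set(product(["blue", "brown"], ["blond", "black"]))

def observe(hypotheses, is_compatible):
    """Keep only the hypotheses that are compatible with a new fact."""
    return {h for h in hypotheses if is_compatible(h)}

# Fact 1: blue-eyed people have blond hair, so (blue, black) is ruled out.
hypotheses = observe(hypotheses, lambda h: not (h[0] == "blue" and h[1] != "blond"))
after_rule = set(hypotheses)  # snapshot: three hypotheses remain

# Fact 2: Joe has blue eyes.
hypotheses = observe(hypotheses, lambda h: h[0] == "blue")
print(hypotheses)  # {('blue', 'blond')}
```

Each observation is just a filter over the set, so the order in which the two facts arrive does not change the final answer.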

In reality it is not true that all blue-eyed people have blond hair. So we change the robot's brain and give each hypothesis a weight indicating how likely it is. Equivalently, we could insert multiple copies of each hypothesis, so that the probability of a hypothesis is proportional to its number of copies.

    (blue, blond):  10
    (blue, black):  2
    (brown, blond): 9
    (brown, black): 8
Those are the robot's weighted hypotheses about Joe's attributes; note that blue-eyed people are more likely to be blond. Suppose the robot now learns that Joe has blue eyes. It keeps only the hypotheses compatible with that:

    (blue, blond):  10
    (blue, black):  2
So, given blue eyes, P(blond hair) = 10/12 and P(black hair) = 2/12. This is all Bayes' theorem is: you have a set of weighted hypotheses, and you delete the hypotheses incompatible with the observed evidence. The division by P(evidence) in Bayes' theorem is only there to re-normalise the remaining weights so that they sum to 1.
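
The weighted version can be sketched the same way: filter, then re-normalise. This assumes the four weights listed above; `update` is a hypothetical helper name, and `Fraction` is used only to keep the arithmetic exact:

```python
from fractions import Fraction

# Weights from the example: (eye colour, hair colour) -> number of copies.
weights = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}

def update(weights, is_compatible):
    """Delete incompatible hypotheses, then rescale the rest to sum to 1."""
    kept = {h: w for h, w in weights.items() if is_compatible(h)}
    total = sum(kept.values())
    return {h: Fraction(w, total) for h, w in kept.items()}

# Evidence: Joe has blue eyes.
posterior = update(weights, lambda h: h[0] == "blue")
for (eyes, hair), p in posterior.items():
    print(hair, p)
```

`Fraction` reduces the results, so 10/12 is printed as 5/6 and 2/12 as 1/6.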
How Bayes kinda works, or how I see it.

Conditional probability (with some caveats that someone in the comments can fill in):

    P(a,b) = P(b,a)
    P(a|b) * P(b) = P(b|a) * P(a)
    P(a|b) = P(b|a) * P(a) / P(b)
a can be the model and b the data, so it becomes

    P(model | data) =
    P(data | model) * P(model) / P(data)
We have, or can estimate, the things on the right-hand side. Ultimately we want the thing on the left-hand side.
To tidy up your first set of equations so that every term is a conditional probability, note that Bayes' theorem is just a restatement of the product rule:
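
To make that concrete, here is a small sketch that plugs numbers into the formula. It borrows the hair/eye-colour weights from the comment further up, treating "hair is blond" as the model and "eyes are blue" as the data; that mapping and the `prob` helper are my own assumptions for illustration:

```python
from fractions import Fraction

# Weights borrowed from the hair/eye-colour example earlier in the thread.
counts = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}
total = sum(counts.values())

def prob(pred):
    """Probability that a (eyes, hair) pair drawn from the table satisfies pred."""
    return Fraction(sum(n for h, n in counts.items() if pred(h)), total)

p_model = prob(lambda h: h[1] == "blond")   # P(model): hair is blond
p_data = prob(lambda h: h[0] == "blue")     # P(data): eyes are blue
# P(data | model): fraction of blond people who have blue eyes.
p_data_given_model = prob(lambda h: h == ("blue", "blond")) / p_model

# Bayes: P(model | data) = P(data | model) * P(model) / P(data)
p_model_given_data = p_data_given_model * p_model / p_data
print(p_model_given_data)  # 5/6
```

The result, 5/6 = 10/12, matches what you get by deleting the brown-eyed rows and re-normalising.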

    p(a and b | context c) = p(a|b,c) * p(b|c)
                           = p(b|a,c) * p(a|c)
    or = p(a|c)*p(b|c) = p(b|c)*p(a|c) if a and b are independent given c

    so Bayes only matters when there is dependence:
    p(a|b,c) = p(a|c) * p(b|a,c) / p(b|c)

    otherwise it's just p(a|c) = p(a|c)
I like to put things in that order because p(a|c) is the "prior belief", and with some handwaving you can say things like "updated belief = prior belief and new evidence about belief".
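
The independence remark can be checked numerically. A rough sketch, with the context c held fixed and two made-up joint tables over (a, b); the tables and the `posterior_and_prior` helper are purely illustrative:

```python
from fractions import Fraction

def posterior_and_prior(joint, a, b):
    """Return (p(a|b,c), p(a|c)) from a joint table over (a, b) for a fixed c."""
    p_a = sum(p for (x, _), p in joint.items() if x == a)   # p(a|c)
    p_b = sum(p for (_, y), p in joint.items() if y == b)   # p(b|c)
    p_b_given_a = joint[(a, b)] / p_a                        # p(b|a,c)
    return p_a * p_b_given_a / p_b, p_a

# Independent case: the joint is the product of the marginals.
pa = {"a0": Fraction(1, 4), "a1": Fraction(3, 4)}
pb = {"b0": Fraction(2, 5), "b1": Fraction(3, 5)}
independent = {(x, y): pa[x] * pb[y] for x in pa for y in pb}

# Dependent case: a0 and b0 tend to occur together.
dependent = {
    ("a0", "b0"): Fraction(3, 10), ("a0", "b1"): Fraction(1, 10),
    ("a1", "b0"): Fraction(1, 10), ("a1", "b1"): Fraction(5, 10),
}

ind_post, ind_prior = posterior_and_prior(independent, "a0", "b0")
dep_post, dep_prior = posterior_and_prior(dependent, "a0", "b0")
print(ind_post == ind_prior)  # True: observing b changes nothing
print(dep_post, dep_prior)    # 3/4 2/5: observing b0 raises the belief in a0
```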
Mathematically trivial, but great notation to explain the point of Bayes. Brilliant!
Excellent! Thanks for sharing.