You can also think about Bayes' theorem as follows. Suppose we have a logical robot trying to learn about the world. The robot has a collection of hypotheses in its brain. Every time it observes a new fact, it deletes all hypotheses that are incompatible with that fact. For example, suppose it is thinking about the eye colour and hair colour of Joe. It starts with these hypotheses about Joe's (eye colour, hair colour):

(eye colour, hair colour)
=========================
(blue, blond)
(blue, black)
(brown, blond)
(brown, black)
Suppose it learns that all blue-eyed people have blond hair. It deletes the hypothesis (blue, black), which is incompatible with this fact, and keeps only the hypotheses compatible with it:

(blue, blond)
(brown, blond)
(brown, black)
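The robot's update rule, deleting every hypothesis incompatible with a new fact, can be sketched in Python as a filter over a set of (eye colour, hair colour) pairs. This is a minimal illustration; the names `observe` and `blue_eyes_imply_blond` are hypothetical, and each fact is assumed to be expressible as a yes/no test on a single hypothesis.

```python
# Minimal sketch of the logical robot. The helper names are hypothetical.

# Each hypothesis is an (eye colour, hair colour) pair.
hypotheses = {
    ("blue", "blond"),
    ("blue", "black"),
    ("brown", "blond"),
    ("brown", "black"),
}


def observe(hypotheses, is_compatible):
    """Keep only the hypotheses that are compatible with the observed fact."""
    return {h for h in hypotheses if is_compatible(h)}


def blue_eyes_imply_blond(hypothesis):
    """The fact 'blue-eyed people have blond hair' written as a test."""
    eyes, hair = hypothesis
    return eyes != "blue" or hair == "blond"


after_rule = observe(hypotheses, blue_eyes_imply_blond)
print(sorted(after_rule))
# [('blue', 'blond'), ('brown', 'black'), ('brown', 'blond')]
```

The next observation, that Joe has blue eyes, is just another call to the same function: `observe(after_rule, lambda h: h[0] == "blue")` returns `{("blue", "blond")}`.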
Suppose it now learns that Joe has blue eyes. It keeps only the hypothesis compatible with this fact:

(blue, blond)
So it has now learned Joe's hair colour.

In reality, it is not true that all blue-eyed people have blond hair. So we change the robot's brain and give each hypothesis a weight indicating how likely it is. Equivalently, we could store multiple copies of each hypothesis, with the weight of a hypothesis equal to its number of copies:

(blue, blond): 10
(blue, black): 2
(brown, blond): 9
(brown, black): 8
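To see that weights and copies are interchangeable, here is a small Python sketch (variable names are illustrative) that expands each weight into that many copies, counts them back, and divides by the total number of copies to get each hypothesis's probability before any observation.

```python
from collections import Counter
from fractions import Fraction

# Weights from the table above: (eye colour, hair colour) -> weight.
weights = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}

# The "multiple copies" view: store each hypothesis as many times as its weight.
copies = [h for h, w in weights.items() for _ in range(w)]

# Counting the copies gives back the weights, so both views hold the same information.
assert dict(Counter(copies)) == weights

# Dividing by the total number of copies turns weights into probabilities.
total = len(copies)
prior = {h: Fraction(w, total) for h, w in weights.items()}
print(total, prior[("blue", "blond")])  # 29 10/29
```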
Blue-eyed people are more likely to be blond. These are the robot's weighted hypotheses about Joe's attributes. Suppose the robot now learns that Joe has blue eyes. It keeps only the hypotheses compatible with this fact:

(blue, blond): 10
(blue, black): 2
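The weighted robot can be sketched the same way in Python, assuming the weights above; `observe_weighted` is a hypothetical helper name. It deletes the incompatible hypotheses and then divides each surviving weight by the surviving total, which is the re-normalisation that Bayes' theorem adds.

```python
from fractions import Fraction

# Weighted hypotheses about Joe: (eye colour, hair colour) -> weight.
weights = {
    ("blue", "blond"): 10,
    ("blue", "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}


def observe_weighted(weights, is_compatible):
    """Delete incompatible hypotheses, then rescale the survivors to sum to 1."""
    kept = {h: w for h, w in weights.items() if is_compatible(h)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the observation is incompatible with every hypothesis")
    return {h: Fraction(w, total) for h, w in kept.items()}


# Observation: Joe has blue eyes.
posterior = observe_weighted(weights, lambda h: h[0] == "blue")
print(posterior[("blue", "blond")], posterior[("blue", "black")])  # 5/6 1/6
```

Exact `Fraction`s are used so that 10/12 and 2/12 come out as 5/6 and 1/6 rather than rounded floats.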
So P(blond hair | blue eyes) = 10/12 and P(black hair | blue eyes) = 2/12. This is all Bayes' theorem is: you have a set of weighted hypotheses, and you delete the hypotheses that are incompatible with the observed evidence. The extra factor in Bayes' theorem, the division by P(evidence), is only there to re-normalise the remaining weights so that they sum to 1.
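In standard notation, with H a hypothesis and E the observed evidence, Bayes' theorem is written as below. Deleting a hypothesis corresponds to P(E | H) = 0 and keeping it to P(E | H) = 1, while P(E) is the total prior weight of the hypotheses that survive (12 of the 29 copies in the example above):

```latex
\begin{aligned}
P(H \mid E) &= \frac{P(E \mid H)\,P(H)}{P(E)} \\[4pt]
P\big((\text{blue}, \text{blond}) \mid \text{blue eyes}\big)
  &= \frac{1 \cdot \tfrac{10}{29}}{\tfrac{12}{29}} = \frac{10}{12}
\end{aligned}
```

When evidence only makes a hypothesis less likely rather than ruling it out, P(E | H) lies strictly between 0 and 1, and the same formula scales that hypothesis's weight instead of deleting it.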