Hacker News
by movpasd 1038 days ago
#2 is not actually equivalent to "at least one child is a boy". It is rather equivalent to "the first child is a boy". The difference may seem trivial, but one implies the other without the converse being true. This changes the probabilities — it's not an issue with underspecification.
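A minimal enumeration of that distinction, assuming each child is independently a boy or a girl with probability 1/2. The helper names are hypothetical, chosen only for this sketch:

```python
from fractions import Fraction
from itertools import product

# All four equally likely (first, second) families, assuming P(boy) = 1/2
# independently for each child.
FAMILIES = list(product("BG", repeat=2))

def p_both_boys(condition):
    """Exact P(both boys | condition), counting equally likely families."""
    matching = [f for f in FAMILIES if condition(f)]
    both = [f for f in matching if f == ("B", "B")]
    return Fraction(len(both), len(matching))

def first_is_boy(family):
    return family[0] == "B"

def at_least_one_boy(family):
    return "B" in family

print(p_both_boys(first_is_boy))      # 1/2
print(p_both_boys(at_least_one_boy))  # 1/3

# "First is a boy" implies "at least one is a boy", but not conversely.
assert all(at_least_one_boy(f) for f in FAMILIES if first_is_boy(f))
assert not all(first_is_boy(f) for f in FAMILIES if at_least_one_boy(f))
```

The weaker condition keeps the mixed family (G, B) in play, which is exactly where the extra 1/6 goes.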

I think your example #1 makes it much clearer why the 1/3 arises, at least in a frequentist analysis.

I would like to offer a similar interpretation, but through a Bayesian lens. The 1/3 arises from the artificiality of the knowledge condition. Given real-world constraints, we expect any information we collect to cleave neatly between the two children in our imagined information-gathering scenario. So we implicitly translate "at least one child is a boy" into "we've checked one child, and it's a boy".

Consider the following related problem: I have two faucets next to each other, each of which has a 50% chance of dripping overnight. I leave one shared bucket under both of them. The next day, the bucket is wet. What are the odds that _both_ faucets dripped?

This setup makes the correlative nature of the information much clearer, and I think most people would be less likely to jump to 1/2 as an answer.
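A quick Monte Carlo sketch of the faucet setup, assuming the two faucets drip independently with probability 0.5 each (`simulate_buckets` is a hypothetical helper name). The wet bucket only says that at least one faucet dripped, so the estimate should land near 1/3 rather than 1/2:

```python
import random

def simulate_buckets(trials, seed=0):
    """Estimate P(both dripped | bucket wet) for two independent 50% faucets."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wet = both = 0
    for _ in range(trials):
        a = rng.random() < 0.5  # faucet A dripped overnight
        b = rng.random() < 0.5  # faucet B dripped overnight
        if a or b:  # the shared bucket is wet if either faucet dripped
            wet += 1
            if a and b:
                both += 1
    return both / wet

estimate = simulate_buckets(200_000)
print(f"P(both | wet) is about {estimate:.3f}")  # close to 1/3
```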


The bucket formulation is very elegant.

I still feel the problem arises from English, not probability. It's clear that "we've checked one child, it's a boy" implies "at least one child is a boy." Furthermore, if someone tells me "at least one of the two kids is a boy," I do not know how they arrived at that information. It could have been through either the bucket method or the knock-at-door method.

From a Bayesian perspective, we should consider both as possible with priors P and 1-P (i.e. the answer is somewhere between 1/3 and 1/2). On the other hand, from the perspective of someone taking a math test, I'd rather like the professor to tell me their own prior -- which, given they felt confident enough to put this on a test, they must believe it's basically 0 or basically 1.
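One way to make "priors P and 1-P" concrete, under two assumed reporting mechanisms: with probability `p` the statement comes from a "bucket" mechanism (it is made whenever any child is a boy), otherwise from a "knock-at-door" mechanism (one child, chosen uniformly, is seen, and the statement is made only if that child is a boy). This sketch gives P(both boys | statement) = 1/(2 + p), so the answer does fall between 1/3 and 1/2, though it is not a straight linear blend of the two:

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("BG", repeat=2))  # equally likely, assuming P(boy) = 1/2

def p_report_bucket(family):
    # Assumed "bucket" mechanism: the statement is made whenever a boy exists.
    return Fraction(1) if "B" in family else Fraction(0)

def p_report_door(family):
    # Assumed "knock-at-door" mechanism: a uniformly chosen child is seen,
    # and the statement is made only if that child is a boy.
    return Fraction(family.count("B"), 2)

def p_both_boys_given_report(p):
    """P(BB | statement) when the bucket mechanism has prior p, door has 1 - p."""
    joint = {f: Fraction(1, 4) * (p * p_report_bucket(f) + (1 - p) * p_report_door(f))
             for f in FAMILIES}
    return joint[("B", "B")] / sum(joint.values())

for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(p, p_both_boys_given_report(p))  # 0 -> 1/2, 1/2 -> 2/5, 1 -> 1/3
```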

Ultimately, both scenarios are describable by the same English phrase, and it feels prescriptivist to consider just one of them, even if it happens to have the least entropy in this case. The follow-up question should always be asked: "_how_ did you know this?" And if the answer is "because someone told me," we either need to ask how that person learned it or else bust out some priors.

Thanks for the compliment about the bucket, I was quite pleased with it :)

I do appreciate what you mean about the language issue: it's a misleading phrase that, given the context of the question, encourages the listener to jump to "1/2". But it's quite a common expression in probability, and in that context it is unambiguous, if difficult to parse (like many things in mathematics, I suppose).

That makes sense.

I agree that it must be a standard understanding among statisticians that one of these interpretations is implied (although, given what happened with the Monty Hall problem, maybe it's not so standard?). It's genuinely interesting that these two interpretations give different answers, but I think it is rather confusing to tell an outsider to the field that 1/3 is "the" answer and that their intuitions are wrong, when it's actually just one conventional interpretation.

The Monty Hall problem is often underspecified. For example, the "intuitive" answer of 1/2 (i.e. that switching doesn't matter) is restored if we assume the host himself didn't know where the car was and just happened to reveal a goat by chance. The assumption that the host knows where the car is often goes unstated. It is now simply a convention that a similar understanding applies in other such scenarios.
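An exact enumeration of both host models, assuming the car and the contestant's pick are uniform and independent, and that the host opens uniformly among the doors he is allowed to open. In the ignorant-host case we condition on a goat actually being revealed (`switch_win_probability` is a hypothetical name):

```python
from fractions import Fraction

DOORS = range(3)

def switch_win_probability(host_knows):
    """Exact P(switching wins | host opened a goat door)."""
    goat_shown = Fraction(0)
    switch_wins = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            others = [d for d in DOORS if d != pick]
            if host_knows:
                options = [d for d in others if d != car]  # host avoids the car
            else:
                options = others  # host opens an unpicked door at random
            for opened in options:
                weight = Fraction(1, 9) / len(options)
                if opened == car:
                    continue  # car revealed: excluded by the conditioning
                goat_shown += weight
                remaining = next(d for d in DOORS if d not in (pick, opened))
                if remaining == car:
                    switch_wins += weight
    return switch_wins / goat_shown

print(switch_win_probability(host_knows=True))   # 2/3
print(switch_win_probability(host_knows=False))  # 1/2
```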

The way I like to think about the Monty Hall problem is by thinking of switching not as being "switch to another unspecified second door" but rather "switch to the winner among the other two doors, if any of them are winners".
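A small check of that reframing, assuming the standard knowing host: over all nine equally likely (car, pick) pairs, switching wins exactly when the car is behind one of the two unpicked doors, i.e. in 6 of 9 cases. Both function names are hypothetical:

```python
from itertools import product

def switch_outcome(car, pick):
    # Knowing host opens a goat door the contestant didn't pick (the lowest-
    # numbered one if there are two; this choice doesn't change the result).
    opened = min(d for d in range(3) if d not in (pick, car))
    remaining = next(d for d in range(3) if d not in (pick, opened))
    return remaining == car

def best_of_other_two(car, pick):
    # The reframing: "switch" means winning if either unpicked door has the car.
    return any(d == car for d in range(3) if d != pick)

cases = list(product(range(3), repeat=2))  # (car, pick), all 9 equally likely
assert all(switch_outcome(c, p) == best_of_other_two(c, p) for c, p in cases)
print(sum(best_of_other_two(c, p) for c, p in cases), "/", len(cases))  # 6 / 9
```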

The problem is ambiguous due to underspecification. That means that neither #1 nor #2 is "actually equivalent" to "at least one child is a boy," and more information is needed to construct a probability space.

#1 is "When both genders are known, and boys are preferred in the description, at least one is a boy." The preference is what makes the answer 1/3, and assuming it adds information to the problem.

#2 can be "When only one gender is known, and how we know it is uncorrelated with either possibility, at least one is a boy." #3 is "When both genders are known, and the description reflects the probability of that gender being chosen at random from the two, at least one is a boy." In both cases, the answer is 1/2.

But being under-specified does not mean the question can't be answered, it just requires applying a reasonable assumption instead of an unreasonable one. #1 is very unreasonable since it adds information, #2 is close, but #3 is best.

And the proof is Bertrand's Box Paradox. That name does not properly refer to a probability problem; it refers to how to make this reasonable assumption.

"Mr. Jones has exactly two children. I have written the gender of at least one of them inside this sealed envelope. What is the probability that both children have that gender?"

If you were to open the envelope, and see the word "boy," the problem becomes the same as the one under discussion. If it can be answered, that answer is correct here as well. But it is an equivalent problem if you see the word "girl," and again the answer must be the same. If 1/3 is an acceptable answer, it means that 1/3 of all two-child families have two of the same gender, and 2/3 have mixed genders.

But that is a contradiction. We know that the split is 1/2:1/2. So the assumption, that 1/3 is a reasonable answer, is disproven. Now, that does not mean that the information came to us via #2 or #3, it just means we can't assume that it was #1.
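The envelope argument can be checked by enumeration under two hypothetical writing rules: `prefer_boys` (reading #1: write "boy" whenever either child is a boy) and `random_child` (reading #3: write the gender of a uniformly chosen child). Under #1 the answer depends on the word, 1/3 for "boy" but 1 for "girl", so it cannot be 1/3 for both; under #3 both words give 1/2. Either way the weighted average is the unconditional 1/2 (for #1, 3/4 · 1/3 + 1/4 · 1 = 1/2):

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("BG", repeat=2))  # equally likely, assuming P(boy) = 1/2

def prefer_boys(family):
    """Hypothetical rule #1: write "boy" if any child is a boy."""
    return {"B": Fraction(1)} if "B" in family else {"G": Fraction(1)}

def random_child(family):
    """Hypothetical rule #3: write the gender of a uniformly chosen child."""
    probs = {}
    for child in family:
        probs[child] = probs.get(child, Fraction(0)) + Fraction(1, 2)
    return probs

def p_same_given_word(rule):
    """Map each written word to P(both children have that gender | word)."""
    joint_word, joint_same = {}, {}
    for family in FAMILIES:
        for word, p in rule(family).items():
            w = Fraction(1, 4) * p
            joint_word[word] = joint_word.get(word, Fraction(0)) + w
            if family[0] == family[1]:
                joint_same[word] = joint_same.get(word, Fraction(0)) + w
    return {word: joint_same.get(word, Fraction(0)) / joint_word[word]
            for word in joint_word}

for rule in (prefer_boys, random_child):
    for word, prob in p_same_given_word(rule).items():
        print(rule.__name__, word, prob)
```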

Most often, the same logic is used for the Monty Hall problem; it is just applied backwards.