by nstbayless 1038 days ago
Probability does not bite; describing partial information in English bites.

It's not actually true that the probability is 1/3, nor that the probability is 1/2. (The same goes for 13/27 vs. 1/2.) The problem is underspecified. Here are two more fully specified versions for which the answer is clear:

1. Sample from all two-child families with at least one boy. What portion of these families have two boys? (answer, rot13: n guveq)

2. Choose a random two-child family, then knock on their door. A boy answers. What are the odds the other child is a boy? (rot13: bar unys)

These are both consistent with the description "at least one child is a boy"!
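The difference between versions 1 and 2 can be checked with a quick Monte Carlo sketch. It assumes each child is independently a boy or a girl with probability 1/2, and it models "knocking on the door" as a uniformly random child answering; `simulate` is a hypothetical helper, not anything from the original comment.

```python
import random


def simulate(n_families: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Estimate P(two boys) under versions 1 and 2 (hypothetical helper)."""
    rng = random.Random(seed)
    filter_total = filter_bb = 0  # version 1: keep families with at least one boy
    door_total = door_bb = 0      # version 2: a random child answers the door
    for _ in range(n_families):
        kids = (rng.choice("BG"), rng.choice("BG"))  # assumes 50/50, independent sexes
        both_boys = kids == ("B", "B")
        if "B" in kids:
            filter_total += 1
            filter_bb += both_boys
        if rng.choice(kids) == "B":  # the child who opens the door is a boy
            door_total += 1
            door_bb += both_boys
    return filter_bb / filter_total, door_bb / door_total


p1, p2 = simulate()
print(f"version 1 (filter): {p1:.3f}")  # close to 1/3
print(f"version 2 (door):   {p2:.3f}")  # close to 1/2
```

Both procedures end with the same sentence, "at least one child is a boy", yet the estimates differ because the families are selected differently.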

The day-of-week versions:

3. Sample from all two-child families with at least one boy born on a Tuesday. The odds both are boys? (nyzbfg unys)

4. Knock on the door of a random two-child family. A boy born on Tuesday answers. Odds both are boys? (n unys)
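The day-of-week versions can be enumerated exactly over the 14 × 14 equally likely (sex, weekday) combinations. This sketch assumes uniform, independent sexes and weekdays; weekday index 1 is an arbitrary stand-in for Tuesday.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary weekday index standing in for "Tuesday"
# Each child is a (sex, weekday) pair; all 14 types assumed equally likely and independent.
CHILD_TYPES = list(product("BG", range(7)))


def version3() -> Fraction:
    """Filter: keep families with at least one boy born on a Tuesday."""
    kept = both = 0
    for a, b in product(CHILD_TYPES, repeat=2):
        if ("B", TUESDAY) in (a, b):
            kept += 1
            if a[0] == b[0] == "B":
                both += 1
    return Fraction(both, kept)


def version4() -> Fraction:
    """Door: a uniformly random child answers; condition on a Tuesday boy answering."""
    answered = both = Fraction(0)
    for a, b in product(CHILD_TYPES, repeat=2):
        for opener in (a, b):  # each child opens the door with probability 1/2
            if opener == ("B", TUESDAY):
                answered += Fraction(1, 2)
                if a[0] == b[0] == "B":
                    both += Fraction(1, 2)
    return both / answered


print(version3())  # 13/27
print(version4())  # 1/2
```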


#2 is not actually equivalent to "at least one child is a boy". It is rather equivalent to "the first child is a boy". The difference may seem trivial, but one implies the other without the converse being true. This changes the probabilities — it's not an issue with underspecification.

I think your example #1 makes it much clearer why the 1/3 arises, at least in a frequentist analysis.

I would like to offer a similar interpretation, but through a Bayesian lens. The 1/3 arises from the artificiality of the knowledge condition. Given real-world constraints, we expect any information collected to cleave neatly between the two children in our imagined information-gathering scenario. So we implicitly translate "at least one child is a boy" to "we've checked one child, it's a boy".

Consider the following related problem: I have two faucets next to each other, each with a 50% chance of dripping overnight. I leave one shared bucket under both of them. The next day, the bucket is wet. What are the odds that _both_ faucets dripped?

This setup makes the correlative nature of the information much clearer, and I think most people would be less likely to jump to 1/2 as an answer.
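The bucket version can be checked by enumerating the four equally likely faucet outcomes, assuming (as stated) independent 50% drip chances:

```python
from fractions import Fraction
from itertools import product


def p_both_given_wet() -> Fraction:
    """Two independent faucets, each dripping with probability 1/2.
    The shared bucket only reveals that at least one of them dripped."""
    wet = both = 0
    for left, right in product((False, True), repeat=2):  # four equally likely outcomes
        if left or right:  # bucket is wet
            wet += 1
            if left and right:
                both += 1
    return Fraction(both, wet)


print(p_both_given_wet())  # 1/3
```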

The bucket formulation is very elegant.

I still feel the problem arises from English, not probability. It's clear that "we've checked one child, it's a boy" implies "at least one child is a boy." But furthermore, if someone tells me "at least one of the two kids is a boy," I do not know how they arrived at that information. It could have come either through the bucket method or through the knock-at-the-door method.

From a Bayesian perspective, we should consider both as possible with priors P and 1-P (i.e. the answer is somewhere between 1/3 and 1/2). On the other hand, from the perspective of someone taking a math test, I'd rather like the professor to tell me their own prior -- which, given they felt confident enough to put this on a test, they must believe it's basically 0 or basically 1.

Ultimately, both scenarios are describable by the same English phrase, and it feels prescriptivist to consider just one of them, even if it happens to have the least entropy in this case. The follow-up question should always be asked: "_how_ did you know this?" And if it's kicked back to "because someone told me," either we need to ask how that person learned it or else bust out some priors.
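One way to make "priors P and 1-P" concrete is a small Bayesian sketch. It assumes, hypothetically, that with probability q the speaker follows a "filter" protocol (they say "at least one is a boy" whenever it is true) and otherwise a "door" protocol (they report the sex of a randomly chosen child). Working it through gives 1/(2+q), which does lie between 1/3 and 1/2.

```python
from fractions import Fraction


def p_two_boys_mixture(q: Fraction) -> Fraction:
    """P(two boys | "at least one is a boy"), where q is a hypothetical prior that the
    speaker used the 'filter' protocol rather than the 'door' protocol."""
    prior_bb, prior_mixed = Fraction(1, 4), Fraction(1, 2)  # two girls can't produce the statement
    like_bb = Fraction(1)                         # both protocols say "boy" for two boys
    like_mixed = q + (1 - q) * Fraction(1, 2)     # door protocol says "boy" half the time
    return prior_bb * like_bb / (prior_bb * like_bb + prior_mixed * like_mixed)


for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(q, p_two_boys_mixture(q))
# prints "0 1/2", then "1/2 2/5", then "1 1/3"
```

Note that the answer is not a straight average of 1/3 and 1/2: at q = 1/2 it is 2/5, not 5/12, because the filter protocol is more likely to produce the statement at all, so hearing it shifts weight toward that protocol.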

Thanks for the compliment about the bucket, I was quite pleased with it :)

I do appreciate what you mean about the language issue — it's a misleading phrase that due to the context of the question encourages the listener to jump to "1/2". But it's quite a common expression in probability, and in that context the expression is unambiguous, if difficult to parse (like many things in mathematics, I suppose).

That makes sense.

I agree that it must be a standard understanding among statisticians that one of these interpretations is implied (although, given what happened with the Monty Hall problem, maybe it's not really so standard?). It's genuinely interesting that these two interpretations give different answers, but I find it rather confusing to tell an outsider to the field that 1/3 is "the" answer and that their intuitions are wrong, when actually it's just one conventional interpretation.

The Monty Hall problem is often stated incompletely. For example, the "intuitive" answer of 1/2 (i.e. that switching doesn't matter) is restored if we assume the host himself didn't know where the car was and just happened to reveal a goat by chance. The assumption that the host knows where the car is often goes unmentioned. It's just a convention that a similar understanding should hold in other such scenarios.
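The knowing-host vs. ignorant-host distinction can be checked by exact enumeration. This sketch assumes the contestant always picks door 0 and the car is placed uniformly at random; for the ignorant host, runs where he happens to reveal the car are discarded, since we are conditioning on having seen a goat.

```python
from fractions import Fraction

DOORS = (0, 1, 2)
PICK = 0  # the contestant's initial choice (assumed fixed, by symmetry)


def switch_win_probability(host_knows: bool) -> Fraction:
    """P(switching wins | the host opened a door showing a goat)."""
    goat_shown = switch_wins = Fraction(0)
    others = [d for d in DOORS if d != PICK]
    for car in DOORS:  # car is behind each door with probability 1/3
        # A knowing host only opens goat doors; an ignorant host picks either other door.
        candidates = [d for d in others if d != car] if host_knows else others
        for opened in candidates:
            weight = Fraction(1, 3) / len(candidates)
            if opened == car:
                continue  # ignorant host revealed the car: not the situation we observed
            goat_shown += weight
            remaining = next(d for d in DOORS if d not in (PICK, opened))
            if remaining == car:
                switch_wins += weight
    return switch_wins / goat_shown


print(switch_win_probability(host_knows=True))   # 2/3
print(switch_win_probability(host_knows=False))  # 1/2
```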

The way I like to think about the Monty Hall problem is to think of switching not as "switch to another unspecified second door" but rather as "switch to the winner among the other two doors, if either of them is a winner".

The problem is ambiguous, due to underspecification. That means that neither #1 nor #2 is "actually equivalent" to "at least one child is a boy," and more information is needed to construct a probability space.

#1 is "When both genders are known, and boys are preferred in the description, at least one is a boy." The preference is what makes the answer 1/3, and assuming it adds information to the problem.

#2 can be "When only one gender is known, and how we know it is uncorrelated with either possibility, at least one is a boy." But it can also be "When both genders are known, and the description reflects the probability of that gender being chosen at random from the two, at least one is a boy." In both cases, the answer is 1/2.

But being under-specified does not mean the question can't be answered, it just requires applying a reasonable assumption instead of an unreasonable one. #1 is very unreasonable since it adds information, #2 is close, but #3 is best.

And the proof is Bertrand's Box Paradox. That name does not properly refer to a probability problem; it applies to how to make this reasonable assumption.

"Mr. Jones has exactly two children. I have written the gender, of at least one, inside this sealed envelope. What is the probability that both children have that gender?"

If you were to open the envelope, and see the word "boy," the problem becomes the same as the one under discussion. If it can be answered, that answer is correct here as well. But it is an equivalent problem if you see the word "girl," and again the answer must be the same. If 1/3 is an acceptable answer, it means that 1/3 of all two-child families have two of the same gender, and 2/3 have mixed genders.

But that is a contradiction. We know that the split is 1/2:1/2. So the assumption, that 1/3 is a reasonable answer, is disproven. Now, that does not mean that the information came to us via #2 or #3, it just means we can't assume that it was #1.
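The envelope argument can be made concrete by enumerating two hypothetical rules for choosing which word to write: "prefer_boy" (write "boy" whenever there is a boy) and "random_child" (write the sex of a uniformly chosen child). Under the first rule the answer is 1/3 for "boy" but 1 for "girl"; only the second rule gives the same answer, 1/2, for both words. Under either rule the answers average back to the unconditional 1/2, which is the consistency check the argument relies on.

```python
from fractions import Fraction
from itertools import product


def envelope(rule: str) -> dict[str, Fraction]:
    """P(both children have the written sex | word in the envelope) under a
    hypothetical rule for choosing which sex to write."""
    seen = {"boy": Fraction(0), "girl": Fraction(0)}
    same = {"boy": Fraction(0), "girl": Fraction(0)}
    for kids in product(("boy", "girl"), repeat=2):  # four equally likely families
        if rule == "prefer_boy":      # write "boy" whenever there is a boy
            words = {("boy" if "boy" in kids else "girl"): Fraction(1)}
        elif rule == "random_child":  # write the sex of a uniformly chosen child
            words = {}
            for kid in kids:
                words[kid] = words.get(kid, Fraction(0)) + Fraction(1, 2)
        else:
            raise ValueError(f"unknown rule: {rule}")
        for word, p_word in words.items():
            weight = Fraction(1, 4) * p_word
            seen[word] += weight
            if kids[0] == kids[1]:
                same[word] += weight
    return {word: same[word] / seen[word] for word in seen}


print(envelope("prefer_boy"))    # {'boy': Fraction(1, 3), 'girl': Fraction(1, 1)}
print(envelope("random_child"))  # {'boy': Fraction(1, 2), 'girl': Fraction(1, 2)}
```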

Often the same logic is used for the Monty Hall problem; it is just applied backwards.

> It's not actually true that the probability is 1/3, nor that the probability is 1/2.

You’re right. Those who are satisfied with the 1/3 answer may want to consider the following.

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that the pair is single-sex.

1/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that the pair is single-sex.

Also 1/3

> I tell you I have two children, and ask you what you think is the probability that the pair is single-sex.

1/2

So if I tell you that I have two children you think that the probability that they are of the same sex is 1/2. And when I tell you the gender of one of them, whatever it is, you will think that the probability goes down to 1/3?

The statement "at least one of them is a boy" (<=> "I don't have two daughters") is a little more subtle than "I tell you the gender of one of them". The former excludes one of the four possibilities (FF), which lets us update our belief on the single-sex question to a third. The latter fixes the gender of a specific one of the children (without saying which one); in either case the probability of the other being M is still a half, so it gives us no information on the single-sex question.

So if you tell me the gender of a specific one of them, say the youngest, then I haven't learned anything that makes my subjective probability go down that the other is the same gender.

I think in real life you will come across the second kind of statement (e.g. "my oldest is a girl") more often than the first kind (e.g. "I do not have two boys").

But it does not feel too weird to me that "at least one of them is a girl" reduces the probability of the pair being single-sex to a third. In fact, if you further tell me both "at least one of them is a girl" and "at least one of them is a boy", the probability of the pair being single-sex goes to zero, and this seems perfectly reasonable.

Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that the pair is single-sex.

1/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that the pair is single-sex.

1/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that the pair is single-sex.

Will your answer change after you have a chance to check your messages?

In this scenario, you are subtly changing the meaning of "single-sex".

In the first two cases, "single-sex" means "the same specific sex as the child you know the sex of" whereas in the last case it means "the same sex as a child that can still have two possible sexes".

If you would say,

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that the pair are both girls?

and then follow up with another question,

> what you think is the probability that the pair are both boys?

and then add the two probabilities up equally weighted, you might see why 1/2 is the reasonable answer in that case.

(And why opening up the email in question would reduce the probability of one of the questions to 0, and the other to 1/3.)

As you find “both girls or both boys” problematic for some reason, maybe we can discuss the following questions instead, where hopefully there is no subtle change of meaning.

————-

Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have one boy and one girl.

2/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that I have one boy and one girl.

2/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

I’ll come back to your reply later, but I would appreciate it if you could give a precise answer to the questions I asked.

It may help to find a common understanding on top of which we can build a clear discussion of the subtleties involved.

There are four cases to consider, MM (both kids are Male), FF (both kids are Female), MF and FM. So there's a 50% chance of same gender kids and a 25% chance for both kids to be female. So if you know the gender, say female, of one kid but not if they are the older or younger, you have these possibilities FF, FM or MF. And FF is 1/3 of that.

If I understand correctly what you said:

If I tell you that one kid is male, you think that the probability that there is one male and one female is 2/3.

If I tell you that one kid is female, you think that the probability that there is one male and one female is 2/3. (Right?)

If I don't tell you anything - beyond the fact that I have two kids - what's the probability that there is one male and one female?

There are four equally likely combinations (under the [both false!] assumptions of equal and independent sexes for children in the same family): MM, FM, MF, and FF; if you know that there is at least one male (or at least one female) you eliminate one of those possibilities, leaving the relative probabilities of the other three still equal.

So, knowing no additional information, the chance of one male and one female is two-fourths, or one-half.

Knowing that there is at least one male (eliminating FF), or at least one female (eliminating MM), the probability of one male and one female is 2/3.

If you know the sex and birth order of one, you eliminate two possibilities, retaining the relative probabilities of the remaining ones as equal, so if you know the first is male, eliminating FM and FF, then the probability of one male and one female is 1/2 (and similarly, mutatis mutandis, with other sex and birth order combinations, which produce the same result eliminating different pairs of possibilities.)
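The counting argument above can be written out directly. This sketch assumes, as the comment notes only approximately holds, that the four birth-order combinations are equally likely; `p_mixed` is a hypothetical helper that conditions on an arbitrary event. The last line conditions on "at least one male or at least one female", which is always true and so eliminates nothing.

```python
from fractions import Fraction
from itertools import product
from typing import Callable

# Sexes in birth order (first, second); all four combinations assumed equally likely.
FAMILIES = ["".join(p) for p in product("MF", repeat=2)]  # ['MM', 'MF', 'FM', 'FF']


def p_mixed(given: Callable[[str], bool]) -> Fraction:
    """P(one male and one female | given), counting equally likely families."""
    kept = [f for f in FAMILIES if given(f)]
    mixed = sum(1 for f in kept if set(f) == {"M", "F"})
    return Fraction(mixed, len(kept))


print(p_mixed(lambda f: True))                  # no information: 1/2
print(p_mixed(lambda f: "M" in f))              # at least one male: 2/3
print(p_mixed(lambda f: "F" in f))              # at least one female: 2/3
print(p_mixed(lambda f: f[0] == "M"))           # first-born is male: 1/2
print(p_mixed(lambda f: "M" in f or "F" in f))  # male or female (always true): 1/2
```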

> Knowing that there is at least one male (eliminating FF), or at least one female (eliminating MM), the probability of one male and one female is 2/3.

Don't you always know that there is at least one male or one female?

I mean, if A="there is at least one male" and B="there is at least one female" you're telling me that if you know that A holds the probability is 2/3 and if you know that B holds the probability is 2/3.

But, knowing no additional information, you KNOW that A and/or B holds!

What’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

> Don't you always know that there is at least one male or one female?

Knowing that there is at least one male or at least one female eliminates zero possibilities.

Knowing that there is at least one male or knowing that there is at least one female eliminates one possibility (a different one for each case, but the difference is immaterial to the probability of a mixed pair).

> What’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

1/2

And if you know you will be told the sex of one child, with equal probability as to which child, the probability remains 1/2 once you are told, even though knowing "at least one is male" without that constraint on how you learned it makes it 2/3 (equivalently, 1/3 for a same-sex pair).

Because then the possibilities, each equally likely, are (assume you are told “male”):

MM, told birth order 1

MM, told birth order 2

MF, told birth order 1

FM, told birth order 2
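The four cases listed above can be enumerated with their weights, assuming the reported child is chosen uniformly at random (the constraint described in the comment). Each case has weight 1/4 × 1/2 = 1/8, and two of the four are mixed:

```python
from fractions import Fraction
from itertools import product


def p_mixed_given_told_male() -> Fraction:
    """Told the sex of one child chosen uniformly at random; the answer is "male".
    Enumerate (family, which child was reported) with their probabilities."""
    told_male = mixed = Fraction(0)
    for family in product("MF", repeat=2):  # four families, prior 1/4 each
        for order in (0, 1):                 # reported child: birth order 1 or 2, prob 1/2 each
            if family[order] == "M":
                weight = Fraction(1, 8)
                told_male += weight
                if family[0] != family[1]:
                    mixed += weight
    return mixed / told_male


print(p_mixed_given_told_male())  # 1/2
```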

Your math is accurate. Once you are told the gender of one child with no other information, the odds of being all the same gender go down. Probability is tricky.

> I have two children…

Oh, you have two children? The probability that they are of the same sex is 1/2.

> and the sex of at least one of them is…

Say no more! If at least one of them is of some sex the odds that they are both of the same sex go down to 1/3.

I said 1/2 before but that was before knowing that at least one of them is either a boy or a girl. That changes everything! (Probability is tricky.)

Nice!

What's really fun about this problem is that you can have very convincing arguments for 1/2 being the correct answer, and very convincing arguments for 1/3 being the correct answer. And for either you can make subtle reformulations that supposedly illustrate how ridiculous this answer is.

And there is no way to know. There is no gold standard for designing an experiment that would show whether 1/2 or 1/3 is correct. You could set up something that generates millions of pairs of (virtual) kids and then count the pairs that fit. But each of these experiments will have built in the assumption on which the answer is ultimately predicated.

The only thing really convincing would be if everybody, all "sides", could agree on an experiment with an outcome that they would feel bound to. Then one could settle this once and for all, whether it's 1/2 or 1/3 or 13/27 or 729/1459 or whatnot. But people will never agree on such an experimental setup.

Which tells me that this is not a mathematical problem. This problem is either underspecified or it's contradictory. If it were uniquely specified, we could just use probability theory with its axioms and inference rules to derive the correct answer. But we obviously can't, since nobody can agree on how to write it down formally.

> If it were uniquely specified, we could just use probability theory with its axioms and inference rules to derive the correct answer.

You’re right.

In another comment I reduced the solution to these two unspecified elements:

P(you tell me that you have two children including at least one boy | you have two boys)

P(you tell me that you have two children including at least one boy | you have one boy and one girl)

If one assumes that they are equal (why?) the answer is 1/3.

If one assumes that the latter is half as probable the answer is 1/2.

Whatever assumption one finds more natural, the point is that an assumption is needed.
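The dependence on those two unspecified likelihoods is just Bayes' rule with priors 1/4, 1/2, 1/4 for two boys, one of each, and two girls, assuming a parent of two girls never makes the statement. With a and b the two likelihoods above, it simplifies to a / (a + 2b):

```python
from fractions import Fraction


def posterior_two_boys(a: Fraction, b: Fraction) -> Fraction:
    """P(two boys | statement), with a = P(statement | two boys) and
    b = P(statement | one boy and one girl); priors 1/4, 1/2, 1/4.
    Assumes a parent of two girls never makes the statement."""
    return Fraction(1, 4) * a / (Fraction(1, 4) * a + Fraction(1, 2) * b)


print(posterior_two_boys(Fraction(1), Fraction(1)))     # equal likelihoods: 1/3
print(posterior_two_boys(Fraction(1), Fraction(1, 2)))  # mixed half as likely: 1/2
```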

Any arguments for 1/2 are just wrong. This isn't an unknowable or undefined situation. It's counterintuitive, but that's different.

Do you agree with the following?

> I tell you I have two children and that (at least) one of them is a boy, and ask you what you think is the probability that I have one boy and one girl.

2/3

> I tell you I have two children and that (at least) one of them is a girl, and ask you what you think is the probability that I have one boy and one girl.

2/3

If you don’t, why not?

If you do, what’s your answer to the following question?

> I tell you I have two children and that I’ve just sent you an email with the sex of (at least) one of them, and ask you what you think is the probability that I have one boy and one girl.

It is not sufficient to know one of them is of some sex. For the probability to be 1/3, you need to be asked what the probability is that one of them is a specific sex, not just any sex.

I think the trickiest part is that the other party willingly shared some information and their motives affect probabilities way more than any math.

I find it easier to think about this problem stated like this: let's say you go around asking people, "Do you have exactly 2 children, and is at least one of them a boy?" What are the odds that they have 2 boys if they answer yes?

All probability questions suffer from the same bias. The Monty Hall problem doesn't work if the person offering the choice has some agency and motives.

Another classic example of "sampling method matters" is that the average time between trains is longer as experienced by passengers than as measured by the train operator. (Because a randomly selected passenger is more likely to be one of the many waiting for a delayed train than one who happened to get on an earlier train.)
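The train example is the inspection paradox, and a short simulation shows it. The timetable here is hypothetical (gaps alternating between 2 and 10 minutes), and passengers are assumed to arrive uniformly at random in time, so a long gap catches proportionally more of them. The operator's average gap is 6 minutes, while the gap experienced by passengers should average about (2² + 10²)/12 ≈ 8.67 minutes.

```python
import bisect
import itertools
import random


def interval_averages(gaps: list[float], n_passengers: int = 100_000,
                      seed: int = 0) -> tuple[float, float]:
    """Return (mean gap per train, mean gap per passenger) for a repeating timetable.
    Passengers are assumed to arrive uniformly at random over the timetable."""
    rng = random.Random(seed)
    ends = list(itertools.accumulate(gaps))  # time at which each gap ends (a train arrives)
    seen = []
    for _ in range(n_passengers):
        t = rng.uniform(0, ends[-1])              # a passenger shows up at a random time
        i = bisect.bisect_right(ends, t)          # index of the gap containing t
        seen.append(gaps[min(i, len(gaps) - 1)])  # guard for t == ends[-1]
    return sum(gaps) / len(gaps), sum(seen) / len(seen)


operator_avg, passenger_avg = interval_averages([2, 10] * 50)  # hypothetical timetable
print(operator_avg)   # 6.0
print(passenger_avg)  # close to 26/3, about 8.67
```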