Hacker News
by Houshalter 3300 days ago
By far the most unintuitive paradox for me personally is the one presented here: https://youtu.be/go3xtDdsNQM?t=3m27s

"Mr. Jones has 2 children. What is the probability he has a girl if he has a boy born on Tuesday?" Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre.

5 comments

The question is ill-posed: it does not give you enough information to tell the probability. You know what Mr. Jones has told you, but you don't know under what circumstances he would have told you this.

Suppose that you ask Mr. Jones whether he has a boy and he says yes. Then the probability that he also has a girl is 2/3.

Suppose that you asked Mr. Jones whether he had a boy born on a Tuesday, and he says yes. Then the probability that he has a girl is less than 2/3, because having two boys gives (about) double the chance for one of them to have been born on a Tuesday.

However, suppose that you asked Mr. Jones whether he has a boy, and if so what day his eldest boy was born on, and he says "yes, and on Tuesday". Then the probability that he also has a girl is again exactly 2/3.
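The three cases above can be checked by brute-force counting. This is only a sketch under the usual idealizations (sexes and weekdays independent and uniform); the names `p_girl`, `has_boy`, `has_tuesday_boy` and `eldest_boy_born_tuesday` are made up for illustration, and the day index used for Tuesday is arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2                                # which index means Tuesday is arbitrary
KIDS = list(product("BG", range(7)))       # 14 (sex, day) options per child
FAMILIES = list(product(KIDS, KIDS))       # (elder, younger): 196 equally likely families

def p_girl(condition):
    """P(at least one girl | condition), by counting equally likely families."""
    kept = [f for f in FAMILIES if condition(f)]
    with_girl = [f for f in kept if any(sex == "G" for sex, _ in f)]
    return Fraction(len(with_girl), len(kept))

def has_boy(family):
    return any(sex == "B" for sex, _ in family)

def has_tuesday_boy(family):
    return ("B", TUESDAY) in family

def eldest_boy_born_tuesday(family):
    # The elder child is listed first, so the first boy found is the eldest boy.
    boy_days = [day for sex, day in family if sex == "B"]
    return bool(boy_days) and boy_days[0] == TUESDAY

print(p_girl(has_boy))                   # 2/3
print(p_girl(has_tuesday_boy))           # 14/27
print(p_girl(eldest_boy_born_tuesday))   # 2/3
```

The only thing that changes between the three calls is the filter, which is the point: the answer depends on which families could have produced the "yes".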

Wikipedia has a detailed explanation: https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

Everything you say after your first paragraph is correct (presuming people always answer questions with "Yes" or "No" honestly), but…

No one said anything about "Mr. Jones has told you…", here. There was nothing about asking Mr. Jones a question and him providing an answer according to some process.

Rather, the question was simply "Mr. Jones has two children. What is the probability he has a girl if he has a boy born on Tuesday?".

There are implicit conventions involved in reading this, but not particularly problematic ones. This implicitly means "Out of all families with two children, at least one of which is a boy born on Tuesday, what proportion have a girl? [Presuming that out of those families, birth gender and day of the week for the two children are all independently uniformly distributed]". And this is a straightforward counting problem.

So the wording seems fine and the problem well-posed to me.

The problem really is almost one of metagaming.

People who are reading this are likely to have seen, for example, questions which read as if they're asking for a conditional probability ("John is male, 33 years old, and has a degree in English literature; what is the probability he works as a barista?") but are designed to let the questioner turn around and say "Ah-HA! I got you! It was really a question about the base rate (in this case, of baristas)!".

As posed and with knowledge of that issue, this question reads like an attempt to do the opposite: to pose a question which seems like it's asking about the base rate of boys vs. girls, but then the questioner turns around with "Ah-HA! I got you! It was really a question about the conditional probability!"

Once it's phrased in a way that makes explicit that it really is a question about conditional probability, and not an attempt to lure someone into a base-rate trap, there's no paradox.

Complicating things is that analyses usually focus on the day of the week as the crucial factor, when it's easier to get to an intuitive understanding of the probability by dealing with the day of the week first and then focusing on the small but crucial change that comes from knowing the gender of one of the children. After accounting for day of week you are left with 28 equally probable situations (14 where the first child is the Tuesday boy, 14 where the second is), with at least one girl in 14 of them, for the expected 1/2. Then the fact that you end up at a probability just over 1/2 is due to the elimination of the one double-counted case in which both children are boys born on Tuesday, which leaves 14 of 27 cases with a girl and pushes the final result to 13/27 in favor of the second child being a boy.

There are implicit conventions involved in reading this

Explicitly the question adds no such limits. So, abstractly someone could be asking the question without those limits.

It's like the difference between infinity and how whatever subset of math you work in defines infinity. And yes, there is more than one commonly used definition.

Sure, and if the quibble was along the lines of "You never explicitly said boys and girls are 50-50 distributed! You never explicitly said elder and younger children's birth genders are independent! You never explicitly said birth-days-of-the-week are uniformly…", then that would be fair, if pedantic.

But this "You know what Mr. Jones has told you, but you don't know under what circumstances he would have told you this" objection is objecting to some other problem than the one posed; the problem posed had nothing to do with Mr. Jones saying anything.

I understand the reason for worrying about this, because many probability riddles ARE poorly worded or presented in such a way that this becomes an issue, but that wasn't the case here. (Note: I haven't watched the rest of the video and have no comment on it; I'm just considering the wording of this individual question within it.)

There was never any claim that Mr. Jones said anything, and no one was called to infer anything from any actions taken by Mr. Jones. He could be a lifelong mute. Rather, the fact that Mr. Jones has two children was presented, by an omniscient narrator, and then a counting question was asked.

(Indeed, Mr. Jones himself is completely irrelevant to the problem asked, except as a way of framing the counting question to be about two-children families. The question asked might as well have been "What proportion of two-children families with a boy born on Tuesday have girls?". It was very slightly differently worded, but not in such a way as makes "We don't know what Mr. Jones was asked!" a relevant objection.)

I also fell into the same ambiguity trap, and I think that the objection about explicit wording is a fair one to make.

"What proportion of two-children families with a boy born on Tuesday have girls?" seems completely clear to me. I would have answered that question relatively quickly.

But the original question had me very confused. I felt a strong desire to ask more about the situation. A great deal of my intuition wanted to say that "well there is nothing special about Tuesday... Any boy that he has is going to be born on some day of the week, and if whatever day of the week that son is born on is included as this line item in the question, then that line item is irrelevant."

I wouldn't have fallen into that same trap in the case of the "What proportion of two-children families..." version because the "Any boy that he has is going to be born on some day of the week" logic doesn't apply.

Tuesday seemed like it might have been arbitrary in the original question, where it seems explicit in your rephrased version.

I mean, it's just as arbitrary in my rephrased version. I could just as well ask "What proportion of two-children families with a boy born on Monday have girls?". But, very well, the different wording prompted differing intuitions for you; so it goes.
> Rather, the fact that he had two children was presented, by an omniscient narrator.

That the narrator is omniscient doesn't change anything. The question still remains: under what circumstances would the narrator have told you, e.g., that "he has a boy born on Tuesday" vs. "he has a girl born on Tuesday". Perhaps this omniscient narrator really likes girls, in which case they would tell you about a girl if Mr. Jones had any girls. Then since they told you "Mr. Jones has a boy born on Tuesday", you know definitely that Mr. Jones has no girls.

Ignoring the source of your knowledge doesn't make that source any less important. And the standard convention you're talking about corresponds to a source of knowledge where you ask a yes/no question and get a yes, which is frequently unrealistic. This is why it disagrees with people's intuition, and this problem is called a paradox.
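To make the dependence on the source concrete, here is a rough sketch that weights each family by how likely a given narrator is to say "he has a boy born on Tuesday". The three policies are modelling assumptions, not anything stated in the problem: `answers_yes_no` is the standard convention, `prefers_girls` is the girl-loving narrator described above, and `describes_random_child` is an extra hypothetical narrator who picks a child at random.

```python
from fractions import Fraction
from itertools import product

TUESDAY_BOY = ("B", 2)                      # day index 2 stands in for Tuesday
KIDS = list(product("BG", range(7)))
FAMILIES = list(product(KIDS, KIDS))        # 196 equally likely two-child families

def has_girl(family):
    return any(sex == "G" for sex, _ in family)

# Each policy returns P(narrator says "he has a boy born on Tuesday" | family).
def answers_yes_no(family):
    # Standard convention: the statement is made whenever it is true.
    return Fraction(int(TUESDAY_BOY in family))

def describes_random_child(family):
    # Assumed policy: pick one child at random and state that child's sex and day.
    return Fraction(family.count(TUESDAY_BOY), 2)

def prefers_girls(family):
    # Assumed policy: always describe a girl if there is one, else a random child.
    return Fraction(0) if has_girl(family) else describes_random_child(family)

def p_girl_given_statement(policy):
    total = sum(policy(f) for f in FAMILIES)
    with_girl = sum(policy(f) for f in FAMILIES if has_girl(f))
    return with_girl / total

for policy in (answers_yes_no, describes_random_child, prefers_girls):
    print(policy.__name__, p_girl_given_statement(policy))
```

Under these assumptions the same sentence yields 14/27, 1/2 and 0 respectively, so the number really does hinge on the reporting process.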

As a probability problem with the standard assumptions, it's a well defined question. If you saw this in Bertsekas or Sheldon Ross, the sampling would be clear.

And I also think you're incorrect about why it's a paradox. People are just bad at understanding and estimating things in conditional probabilities. Further, the answer changes based on the sampling regime, which (as mentioned) was not explicitly stated but is clear to almost any student that's taken a discrete probability class.

Following classical probability arguments, we consider a large urn containing two children.

:-O

Your problem is that you are thinking there's a "the boy". But there's not a "the boy". Mr. Jones could have two boys. He could have two boys both born on Tuesday, even. The term "the boy" does not denote any particular boy, in that case, and causes you to think about the situation erroneously.

If the question were "There's Kid 1 and Kid 2, each independently selected with random gender and birth-day-of-the-week. Out of those cases where Kid 1 is a boy born on Tuesday, what proportion are cases where Kid 2 is a girl?", then the answer would indeed be a straightforward 50%; the status of Kid 1 is entirely independent of the status of Kid 2.

But that's not the question. The question is "There's Kid 1 and Kid 2, each independently selected with random gender and birth-day-of-the-week. Out of those cases where at least one (either one, and possibly both) of Kid 1 and Kid 2 is a boy born on Tuesday, what proportion are cases where at least one of Kid 1 and Kid 2 is a girl?".

This is very different, and of course just drawing out the possibilities (all 2 * 7 * 2 * 7 equiprobable-by-stipulation choices of gender and birth-day-of-the-week for Kid 1 and Kid 2) and circling which pairs of subsets are the relevant ones for the two questions reveals the difference, the probabilities for either question elementarily calculable in this way by basic counting.
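For anyone who would rather not draw the grid by hand, a small sketch of the same counting follows. The variable names (`q1`, `q2`, and so on) are just labels for the two questions above, and the equiprobable cases are the stipulated idealization.

```python
from itertools import product

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
TUESDAY_BOY = ("boy", "Tue")
KIDS = list(product(["boy", "girl"], DAYS))
CASES = list(product(KIDS, KIDS))   # (Kid 1, Kid 2): 2 * 7 * 2 * 7 = 196 cases

# Question 1: Kid 1 is a Tuesday boy; how often is Kid 2 a girl?
q1 = [(k1, k2) for k1, k2 in CASES if k1 == TUESDAY_BOY]
q1_hits = [(k1, k2) for k1, k2 in q1 if k2[0] == "girl"]

# Question 2: at least one kid is a Tuesday boy; how often is at least one a girl?
q2 = [case for case in CASES if TUESDAY_BOY in case]
q2_hits = [case for case in q2 if any(sex == "girl" for sex, _ in case)]

print(len(q1_hits), "of", len(q1))   # 7 of 14
print(len(q2_hits), "of", len(q2))   # 14 of 27
```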

The 14/27 answer in the video is correct, incidentally.

Also, I notice you said "Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre."

Remember, though, there's no "the boy". The question "On which day of the week was the boy born? Tell me, I need to know!" does not always have a well-defined answer.

Indeed, you'd get the same 14/27 answer even if "Tuesday" in the question "What proportion of two-children families with at least one Tuesday boy have a girl?" was replaced by any other day. And if this seems paradoxically in conflict with the fact that simply asking "What proportion of two-children families with at least one boy have a girl?" has instead the answer 2/3, reflect again upon the fact that some families have two boys born on different days, so that there's no single answer to "On what day was 'the boy' born?". And then just draw out the cases and count.

(Specifically, out of the 2 * 7 * 2 * 7 equiprobable cases overall for Kid 1 and Kid 2's genders and days, there are 27 cases where there's at least one Tuesday boy, and 14 cases where there's at least one Tuesday boy and also a girl. There are 3 * 7^2 cases where there's at least one boy, and 2 * 7^2 cases where there's at least one boy and also a girl.)
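A quick way to convince yourself that Tuesday is not special is to repeat the count for every day. This is a sketch under the same equiprobable-cases assumption; `girl_share`, `by_day` and `any_boy` are illustrative names.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
KIDS = list(product("BG", DAYS))
CASES = list(product(KIDS, KIDS))   # 196 equiprobable cases for Kid 1 and Kid 2

def girl_share(cases):
    """Fraction of the given cases that include at least one girl."""
    with_girl = [c for c in cases if any(sex == "G" for sex, _ in c)]
    return Fraction(len(with_girl), len(cases))

# Condition on "at least one boy born on <day>" for each day in turn.
by_day = {day: girl_share([c for c in CASES if ("B", day) in c]) for day in DAYS}
# Condition on "at least one boy" with no day mentioned.
any_boy = girl_share([c for c in CASES if any(sex == "B" for sex, _ in c)])

for day, share in by_day.items():
    print(day, share)        # 14/27 for every day
print("any boy", any_boy)    # 2/3
```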

Many of these questions, I think, become clearer if thought of as counting questions instead of as "probability" questions. It's all the same: the math called "probability" is just the math of various kinds of counting (from simple counting as in this case to complexly weighted continuous measurements, but still ultimately a generalized form of counting). However, despite that equivalence, the concept "probability" has developed all these other distracting connotations, such that psychologically there can be a useful difference in perspective in switching to explicitly thinking "counting" instead. No one would long dispute that there are 27 cases with at least one Tuesday boy, etc.

Agreed. He removed the second B2B2 probability annotation as though it were a repeat of the first and inapplicable to the probability set, but that's not the case, and it shouldn't be removed. Apply lower-case to the younger boy in the probability sets and it's clear why. B2b2 is not the same occurrence as b2B2. Even though the day both were born on was "a Tuesday" doesn't mean both probability instances are referring to the exact same event. Except in the case of twins, which is outside the scope of the exercise.
The comments to this video actually say (with proof) that this was an error in the video.
There seems to be quite a bit of debate about it in the comments and I'm not sure who to believe. At one point someone coded a simulation to test it and the results were as predicted by the video. Even if the video is incorrect, the fact it's so confusing still makes it an interesting paradox.
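I don't have the simulation from the video comments, but a minimal Monte Carlo sketch of the same idea could look like this. It assumes the "at least one Tuesday boy" reading and independent, uniform sexes and weekdays; the function name, trial count and seed are arbitrary choices.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(has a girl | at least one boy born on Tuesday) by sampling families."""
    rng = random.Random(seed)
    tuesday_boy_families = 0
    with_girl = 0
    for _ in range(trials):
        # Each child: sex and weekday drawn independently and uniformly (an idealization).
        kids = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
        if ("B", 2) in kids:               # day index 2 stands in for Tuesday
            tuesday_boy_families += 1
            if any(sex == "G" for sex, _ in kids):
                with_girl += 1
    return with_girl / tuesday_boy_families

estimate = simulate()
print(round(estimate, 3), "vs exact", round(14 / 27, 3))
```

Families are kept only if the Tuesday-boy statement is true of them; if the filter were instead "the first child is a Tuesday boy", the estimate would drift toward 1/2.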
I'm an idiot, but I'm going to throw my hat in the ring here:

The video is wrong. The problem reads: Jones has 2 kids. What is P(he has a girl) given that he has a boy born on a Tuesday? Consider, for a moment, what information we're getting from "boy born on a Tuesday." This is no different than "boy with red hair," or "boy with 5 freckles." The fact that the BOY was born on a Tuesday does not change P(day of the week girl was born). Imagine the "boy with 5 freckles" case - let 5 freckles be denoted by F5, six freckles by F6 and so on... would the appropriate calculation include enumerating P(boy F5, boy Fn) for all n? No.

The "born on Tuesday" is irrelevant. Thus you have the following scenarios:

- one kid is TuesdayBoy and the other is also a boy, born at any time
- one kid is TuesdayBoy and the other is a girl, born at any time

Out of these options P(Jones has a girl) is a flat out 50%. There is no need to bring in concepts of "which was born first" or enumerate all possible days of the week each child could have been born.

Ok... now all the real smartypants here can correct me :)

There are 2 * 7 * 2 * 7 ways to assign gender and birth-day-of-week to two children. By convention, all are considered equiprobable (this is the same as assuming kids' genders and birth day-of-weeks are independent of each other and of all facts about other kids, and that both genders are equally likely and all 7 days are equally likely for any given kid.)

Of these possibilities, 27 are situations where one kid is a Tuesday boy. [Do you dispute this count?]

Of those, 14 are situations where one kid is a girl. [Do you dispute this count?]

The answer to "What proportion of cases where there is at least one Tuesday boy also have a girl?" is thus 14/27.

You have stated by fiat that certain things are irrelevant to certain other things, that certain things have probability 50%, etc, but in doing so, you have not considered the count correctly. You are likely misled by phrasing such as "the boy", when there are families with two boys in which there is no proper referent of "the boy" and no particular answer to question like "Which day was 'the boy' born?".

ok ok... let me try to get this straight. Just as kind of a mental process for trying to understand whether or not something passes the smell test, I typically try to take the basic premise and turn it up to 11 and see if that still makes sense.

In this problem, as you've described it, we're enumerating "ways to assign gender and birth-day-of-week." We can do this because there are a countable number of "days of the week" (so we can map to the integers: 1-7) AND there is also a surjective function of [child] -> [day of the week they were born]. Am I right so far?

Now let's replace the set [1-7] with another countable set that also maintains the surjective function. We could say "day in the lunar cycle" (so ~27 options), or better "day of the year" (366 options), for example. Do we now need to consider the 2 * 366 * 2 * 366 ways to assign gender and birth-day-of-the-year? Take it further with whatever you want: "birth weight in milligrams" or "number of freckles" (as I previously suggested). All countable things that meet the surjective requirement.

This is starting to smell funny, right? So let's take a look at the math.

You say there are 2 * 7 * 2 * 7 ways to configure day+gender, assuming independence for kid 1 (k1) and kid 2 (k2). This represents: (k1 gender options * k1 day of week options) * (k2 gender options * k2 day of week options). Right? I'm with you so far. Then you say "Of these possibilities, 27 are situations where one kid is a Tuesday boy." Hold up.

We are given two pieces of information: that one of the kids is a boy, and that particular boy was born on a Tuesday. Let's say the boy is k1 (this is an assignment of enumeration, not of "who came first;" just like Sunday = 1 does not mean that any kid born on a Sunday was born before every kid born on Monday = 2). So now the k1 options are [1 * 1] (boy, Tuesday), and the total number of options are: [1 * 1] * [2 * 7] = 14. Of those 14, 7 are girl options. And we're back to a straight 50%.

So yes, I dispute the 27 number. It seems like it is arrived at by 2 * 1 * 2 * 7, minus one for an apparent duplicate. But the 2 * 1 * 2 * 7 represents maintaining gender non-specificity for Tuesday boy, which should be incorrect, no?

> You have stated by fiat that certain things are irrelevant to certain other things...

Yes, but that's what "independent" means, right? You also stated that you're assuming these two things are independent, hence equiprobability. But independence is defined by P(A) = P(A|B). The probability of A is completely unaffected by B. Yet the outcome you arrive at is that P(A) IS affected by B, so the math presented is internally inconsistent.

What am I missing here? I'm fascinated by the uncertainty around this little problem.

Let's see if I can help you understand this a bit better. First, let's clarify the problem being asked. There are 2 different problems with different solutions and it helps to explicitly separate them.

problem 1) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy born on Tuesday. They say yes. What is the probability that they have a girl?

problem 2) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy. They say yes. You then ask them which day of the week a boy they have was born on. They say Tuesday. What is the probability that they have a girl?

The original problem that was posed is equivalent to problem 1, but not equivalent to problem 2. This could be what is confusing you, because in problem 2 the extra information plays no role in the selection process, while it does play a role in problem 1. In problem 2, the answer is the standard 2/3. Why are the probabilities different between problem 1 and 2? Here's why:

Think about the set of people who could answer yes to the question in problem 2. The ratio of these groups is important. A parent with BB (two boys) is equally likely to answer yes to problem 2 (100% likely, to be exact) as a parent with BG or GB (also 100% likely to answer yes), which leads to the correct solution of 2/3. However, in problem 1 a parent with BB is NOT EQUALLY LIKELY to answer yes as a parent with BG. This is because we added an extra qualifier (must be born on Tuesday). The parent with BB has two chances to meet this qualifier because they have two boys, so the parent with BB is actually more likely to answer yes to the question than the parent with BG. As the qualifier becomes more and more rare (day of lunar cycle), the probability of the BB parent answering yes, P(yes|BB), approaches twice the value of P(yes|BG). So now you're left with some subset of parents with BB, BG, and GB, but in this scenario you've sampled from BB approximately twice as much as you've sampled from each of the BG and GB groups, leaving you with approximately the same number of people from group BB as the combined amount from groups BG and GB. This is why the probability approaches 50%.
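Here is a small sketch of that limiting behaviour, assuming the qualifier is a trait each child independently has with chance 1/n (n = 7 for a weekday, roughly 27 for a lunar day, 366 for a day of the year). The function name `p_girl` and the choice of n values are illustrative only.

```python
from fractions import Fraction

def p_girl(n):
    """P(girl | yes to 'at least one boy with the trait'), trait chance 1/n per child."""
    p_yes_bg = Fraction(1, n)                    # one boy, one chance at the trait
    p_yes_bb = 1 - (1 - Fraction(1, n)) ** 2     # two boys, two chances
    # Priors: BG and GB together 1/2, BB 1/4 (GG can never answer yes).
    girl = Fraction(1, 2) * p_yes_bg
    no_girl = Fraction(1, 4) * p_yes_bb
    return girl / (girl + no_girl)

for n in (1, 7, 27, 366):
    print(n, p_girl(n), round(float(p_girl(n)), 4))
```

With n = 1 the qualifier carries no information and the answer is the familiar 2/3; n = 7 gives 14/27; the general value works out to 2n/(4n - 1), which creeps toward 1/2 as the trait gets rarer.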

I spent a while writing this, so hopefully it helps!

:) Thanks for taking the time. I've realized a few things, and found it helped to get a bit more formal.

Jones has 2 kids. Let A be "he has a girl" and B be "he has a boy born on Tuesday." First thing I realized is A and B are NOT independent - this is key. P(A) includes the option of Jones having two girls. But if B is true, then the two girls option isn't on the table anymore, which affects P(A). Realizing this helped me start to better understand what kind of problem we're dealing with.

Second was realizing that P(A&B) is not at all the same thing as P(A|B) - the probability of A given B - when A and B aren't independent. The problem is asking for P(A|B), and by the rule of conditional probability: P(A|B) = P(A&B)/P(B)

P(B) can be solved for without too much fuss: solve 1-P(!B). For each kid you have 2 genders and 7 days of the week, or 2 * 7 = 14 options. 13 of those are not "Boy & Tuesday." So you have P(!B) is (13/14) * (13/14) = 169/196. P(B) = 1 - 169/196 = 27/196.

This leaves us trying to figure out P(A&B). I can't think of any other way to do it other than enumerating all options. We can take a shortcut and just look at all 27 possible scenarios where B is true. This seems to be the method of choice ;) As others have shown, we see that 14 of those satisfy A. So P(A&B) = 14/196.

Now, we can solve: P(A|B) = P(A&B)/P(B) = (14/196)/(27/196) = 14/27
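For what it's worth, P(A&B) can also be reached without listing the 27 cases, by subtracting the "B but no girl" part from P(B). The sketch below does that arithmetic with exact fractions; the variable names are mine, and the independence and uniformity assumptions are the same as above.

```python
from fractions import Fraction

# P(B): at least one of the two kids is a Tuesday boy.
p_not_tuesday_boy = Fraction(13, 14)
p_B = 1 - p_not_tuesday_boy ** 2                 # 27/196

# B without A means two boys, at least one of them born on Tuesday.
p_two_boys = Fraction(1, 4)
p_some_tuesday = 1 - Fraction(6, 7) ** 2         # 13/49
p_B_not_A = p_two_boys * p_some_tuesday          # 13/196

p_A_and_B = p_B - p_B_not_A                      # 14/196, which reduces to 1/14
print(p_A_and_B / p_B)                           # 14/27
```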

So I'm now part of the "math checks out" club. Thanks for all the help people!

You can't assume "the boy is k1". The original 2 * 7 * 2 * 7 cases were indeed all equiprobable different cases. And we're not given that K1 is a boy. We're given that at least one child is a boy.

If K1 is a girl and K2 is a boy born on Tuesday, this still counts as the family (Mr. Jones, if you like) having a boy born on Tuesday. There are 27 cases that count as the family having a boy born on Tuesday, all equiprobable. And out of those, 14 also count as the family having a girl.

As for your noting that we can split these cases even more finely, so that there's no distinguished end-all, be-all partitioning of cases, sure, you can do that. What I'm really saying is this:

1/2 of two-child families have their elder child being a boy. 1/2 of two-child families have their elder child being a girl. [On conventional idealizations for these problems. You surely do not dispute this, yes? You may not care about this number, but you don't dispute it, right?]

In each of those subgroups, 1/7 of families have their elder child born on Sunday, 1/7 have their elder child born on Monday, etc. [Do you dispute this?]

In each of THOSE subgroups, 1/2 of families have their younger child a boy, and 1/2 have their younger child a girl. [Any dispute?]

And in each of THOSE subgroups, 1/7 of families have their younger child born on Sunday, 1/7 of families have their younger child born on Monday, etc. [Any dispute?]

And some amount of those have low birth weight, some have high birthweight, some have 5 freckles, etc., but we needn't figure out those numbers.

So now I've carved the world up into 2 * 7 * 2 * 7 groups, based on gender and birth-day-of-week for older and younger child. We can carve the world up into groups in different ways also, more finely or more coarsely or just differently. But making the four conventional assumptions we just made, the 2 * 7 * 2 * 7 grouping based on gender and birth-day-of-week for older and younger child is such that each particular such group takes up 1/2 * 1/7 * 1/2 * 1/7 of all families; these are all equifrequent groups.

And that having been done, we find that in 27 of these groups, there is at least one boy born on a Tuesday. In 13, the elder child is a boy born on Tuesday but not the younger child; in 13, the younger child is a boy born on Tuesday but not the elder child; in 1, both children are boys born on Tuesday.

But the question was not intended to be about a specific boy. The question was intended to be "Out of families that have a boy born on Tuesday (meaning at least one boy born on Tuesday), what proportion have a girl?". Any family with at least one boy born on Tuesday counts as having "a boy born on Tuesday", and even families with two boys born on Tuesday count, with no particular of their two boys given any distinguished status.

Perhaps you read the question differently; that, then, is a problem with the phrasing of the question for communicating to you its intent. But when it was asked "What is the probability Mr. Jones has a girl, given that he has a boy born on Tuesday?", what the author indeed intended this to mean, and would be generally taken in the conventional language of probability to mean, was "Out of families that have at least one boy born on Tuesday, what proportion have a girl?".

And we find that, out of the 27 equally sized groups of families that have at least one boy born on Tuesday, 14 of them have a girl, so that the answer to this question becomes 14/27.

Frankly, though, I'd prefer no one ever used the "conventional language of probability", because it leads to precisely these miscommunications.

If the question had been phrased "Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl? [on natural assumptions about lack of biases or correlations concerning the distribution of children's genders and days]", would you agree that the answer was 14/27?

That was the question the author intended to ask. The dispute may simply be as to whether the question which the author did ask is equivalent to the above; if that is indeed our only disagreement, we can still investigate that dispute further, if you like. But let's first see if the dispute is linguistic or mathematical.

> Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl?

OH YES.

I finally figured it out (see the other comment). Thanks for all the explaining, but this statement right here was the best.

First step back and consider the possibilities given no knowledge whatsoever:

For each child the problem constrains to one of two possible sexes and one of seven possible days of birth.

2 * 7 = 14 possible sex/day combinations for a single child.

(2 * 7) * (2 * 7) = 196 possible sex/day combinations for a pairing of two children. To see why, you could write a program to enumerate all of them, starting with the pairing "Boy/Monday + Boy/Monday", then "Boy/Monday + Boy/Tuesday" and so on until you exhaust all possible options at "Girl/Sunday + Girl/Sunday". You'll see there are 196 options.
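Such a program is only a few lines. A minimal sketch, with the day order chosen to match the Monday-to-Sunday enumeration just described:

```python
from itertools import product

SEXES = ["Boy", "Girl"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

children = [f"{sex}/{day}" for sex, day in product(SEXES, DAYS)]      # 14 options
pairings = [f"{a} + {b}" for a, b in product(children, children)]     # 14 * 14 options

print(len(pairings))   # 196
print(pairings[0])     # Boy/Monday + Boy/Monday
print(pairings[1])     # Boy/Monday + Boy/Tuesday
print(pairings[-1])    # Girl/Sunday + Girl/Sunday
```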

Now start applying the facts given to us: one of the children is born on a Tuesday (eliminate all possibilities which don't have at least one Tuesday child), and that child is a boy (eliminate all possibilities in which there is not a Tuesday child who is also a boy).

This leaves exactly 27 possible cases:

Boy/Sunday + Boy/Tuesday,

Boy/Monday + Boy/Tuesday,

Boy/Tuesday + Boy/Tuesday,

Boy/Wednesday + Boy/Tuesday,

Boy/Thursday + Boy/Tuesday,

Boy/Friday + Boy/Tuesday,

Boy/Saturday + Boy/Tuesday,

Girl/Sunday + Boy/Tuesday,

Girl/Monday + Boy/Tuesday,

Girl/Tuesday + Boy/Tuesday,

Girl/Wednesday + Boy/Tuesday,

Girl/Thursday + Boy/Tuesday,

Girl/Friday + Boy/Tuesday,

Girl/Saturday + Boy/Tuesday,

Boy/Tuesday + Boy/Sunday,

Boy/Tuesday + Boy/Monday,

Boy/Tuesday + Boy/Wednesday,

Boy/Tuesday + Boy/Thursday,

Boy/Tuesday + Boy/Friday,

Boy/Tuesday + Boy/Saturday,

Boy/Tuesday + Girl/Sunday,

Boy/Tuesday + Girl/Monday,

Boy/Tuesday + Girl/Tuesday,

Boy/Tuesday + Girl/Wednesday,

Boy/Tuesday + Girl/Thursday,

Boy/Tuesday + Girl/Friday,

Boy/Tuesday + Girl/Saturday

If you count, you'll see that of those 27, there are 13 with two boys and 14 with a boy and a girl. The probability of two boys, given that one child is a boy born on Tuesday, is thus 13/27.
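If you don't feel like counting by hand, the two elimination steps can be sketched as filters over the 196 pairings. The variable names are illustrative, and the pairings are ordered (first child, second child) as in the list above.

```python
from itertools import product

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
children = list(product(["Boy", "Girl"], DAYS))
pairings = list(product(children, children))     # 196 ordered pairings

# Fact 1: at least one child was born on a Tuesday.
tuesday = [p for p in pairings if any(day == "Tuesday" for _, day in p)]
# Fact 2: some Tuesday child is a boy.
tuesday_boy = [p for p in tuesday if ("Boy", "Tuesday") in p]

two_boys = [p for p in tuesday_boy if all(sex == "Boy" for sex, _ in p)]
boy_and_girl = [p for p in tuesday_boy if any(sex == "Girl" for sex, _ in p)]

print(len(pairings), len(tuesday), len(tuesday_boy))   # 196 52 27
print(len(two_boys), len(boy_and_girl))                # 13 14
```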

I adamantly agreed with you. Then I made a simple spreadsheet that proves us wrong: http://cl.ly/kuQE
Ah, but see... you're counting (B2,B2) as one item because "order doesn't matter", but then counting (G2,B2) and (B2,G2) independently. If (G2,B2) is different than (B2,G2), then (B2,B'2) is distinct from (B'2,B2).
Think of the x-axis as the first child and the y-axis as the second child. There is a one in fourteen chance of choosing a given column and a one in fourteen chance of choosing a given row. I fail to see how there could be any additional outcomes, or how any square could have a greater chance of occurring than another.
It's fairly simple.

If you flip two coins and then announce whatever the first coin was, then the options if you said H are HH or HT, and if you said T they are TH or TT. However, if you flip two coins and then say whether you got at least one head, regardless of what you flipped, then given a yes you have three options, HT, HH, and TH, with equal odds.

So the question is whether the full statement was chosen based on the data, or whether only the truth value of the statement is based on the data.
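The two coin protocols can be sketched in a few lines. `p1` and `p2` are just labels for "probability of two heads" under each protocol, assuming fair, independent coins.

```python
from fractions import Fraction
from itertools import product

FLIPS = list(product("HT", repeat=2))     # HH, HT, TH, TT, equally likely

# Protocol 1: the statement reports the first coin, whatever it was; it came out "H".
said_heads = [f for f in FLIPS if f[0] == "H"]                 # HH, HT
p1 = Fraction(sum(f == ("H", "H") for f in said_heads), len(said_heads))

# Protocol 2: the statement "at least one head" is fixed in advance; only its truth
# value depends on the flips, and it came out true.
at_least_one = [f for f in FLIPS if "H" in f]                  # HH, HT, TH
p2 = Fraction(sum(f == ("H", "H") for f in at_least_one), len(at_least_one))

print(p1, p2)   # 1/2 1/3
```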

PS: Now assuming its truth value is based on data: if you look at all options, there are 14 gender-day combinations per kid and 14 * 14 = 196 gender-day combinations in total. Only 14 of those 196 start with BT, and they split evenly: 7 BTB_, 7 BTG_. However, that leaves 196 - 14 other options to consider. 7 * 14 of them start with G, and 6 * 14 of them start with B not on a Tuesday, but out of those you only keep 1/14, as you need BT on the second roll. Now add them up: 13 B and 14 G out of (13 + 14) = 27, or 13/27 B and 14/27 G.