| HN Mirror


by astrocat 3299 days ago

ok ok... let me try to get this straight. Just as kind of a mental process for trying to understand whether or not something passes the smell test, I typically try to take the basic premise and turn it up to 11 and see if that still makes sense.

In this problem, as you've described it, we're enumerating "ways to assign gender and birth-day-of-week." We can do this because there are a countable number of "days of the week" (so we can map to the integers: 1-7) AND there is also a surjective function of [child] -> [day of the week they were born]. Am I right so far?

Now let's replace the set [1-7] with another countable set that also maintains the surjective function. We could say "day in the lunar cycle" (so ~27 options), or better "day of the year" (366 options), for example. Do we now need to consider the 2 * 366 * 2 * 366 ways to assign gender and birth-day-of-the-year? Take it further with whatever you want: "birth weight in milligrams" or "number of freckles" (as I previously suggested). All countable things that meet the surjective requirement.

This is starting to smell funny, right? So let's take a look at the math.

You say there are 2 * 7 * 2 * 7 ways to configure day+gender, assuming independence for kid 1 (k1) and kid 2 (k2). This represents: (k1 gender options * k1 day of week options) * (k2 gender options * k2 day of week options). Right? I'm with you so far. Then you say "Of these possibilities, 27 are situations where one kid is a Tuesday boy." Hold up.

We are given two pieces of information: that one of the kids is a boy, and that particular boy was born on a Tuesday. Let's say the boy is k1 (this is an assignment of enumeration, not of "who came first"; just like Sunday = 1 does not mean that any kid born on a Sunday was born before every kid born on Monday = 2). So now the k1 options are [1 * 1] (boy, Tuesday), and the total number of options is: [1 * 1] * [2 * 7] = 14. Of those 14, 7 are girl options. And we're back to a straight 50%.

So yes, I dispute the 27 number. It seems like it is arrived at by 2 * 1 * 2 * 7, minus one for an apparent duplicate. But the 2 * 1 * 2 * 7 represents maintaining gender non-specificity for Tuesday boy, which should be incorrect, no?

> You have stated by fiat that certain things are irrelevant to certain other things...

Yes, but that's what "independent" means, right? You also stated that you're assuming these two things are independent, hence equiprobability. But independence is defined by P(A) = P(A|B). The probability of A is completely unaffected by B. Yet the outcome you arrive at is that P(A) IS affected by B, so the math presented is internally inconsistent.

What am I missing here? I'm fascinated by the uncertainty around this little problem.


finind 3299 days ago

Let's see if I can help you understand this a bit better. First, let's clarify the problem being asked. There are 2 different problems with different solutions and it helps to explicitly separate them.

problem 1) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy born on Tuesday. They say yes. What is the probability that they have a girl?

problem 2) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy. They say yes. You then ask them which day of the week a boy they have was born on. They say Tuesday. What is the probability that they have a girl?

The original problem that was posed is equivalent to problem 1, but not equivalent to problem 2. This could be what is confusing you, because in problem 2 the extra information plays no role in the selection process, while it does play a role in problem 1. In problem 2, the answer is the standard 2/3. Why are the probabilities different between problem 1 and 2? Here's why:

Think about the set of people who could answer yes to the question in problem 2. The ratio of these groups is important. A parent with BB (two boys) is equally likely to answer yes to problem 2 (100% likely, to be exact) as a parent with BG or GB (also 100% likely to answer yes), which leads to the correct solution of 2/3. However, in problem 1 a parent with BB is NOT EQUALLY LIKELY to answer yes as a parent with BG. This is because we added an extra qualifier (must be born on Tuesday). The parent with BB has two chances to meet this qualifier because they have two boys, so the parent with BB is actually more likely to answer yes to the question than the parent with BG. As the qualifier becomes more and more rare (day of lunar cycle), the probability of the BB parent answering yes, P(yes|BB), approaches twice the value of P(yes|BG). So now you're left with some subset of parents with BB, BG, and GB, but in this scenario you've sampled from BB approximately twice as much as you've sampled from each of the BG and GB groups, leaving you with approximately the same number of people from group BB as the combined amount from groups BG and GB. This is why the probability approaches 50%.

I spent a while writing this, so hopefully it helps!
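A minimal sketch of that trend, assuming two children, equally likely genders, and an extra attribute with n equally likely values, all independent (n = 1 means no qualifier, n = 7 is day of week). The function name `p_girl_given_qualified_boy` and the encoding of the qualifying value as 0 are made up for illustration; the code simply enumerates every (gender, attribute) pair for both children.

```python
from fractions import Fraction
from itertools import product


def p_girl_given_qualified_boy(n):
    """P(at least one girl | at least one boy with attribute value 0).

    Assumes two children, equally likely genders, and an attribute with
    n equally likely values, all independent (an idealization).
    """
    kids = list(product("BG", range(n)))      # one child: (gender, attribute)
    families = list(product(kids, repeat=2))  # (first, second), all equiprobable
    target = ("B", 0)                         # hypothetical stand-in for "boy born on Tuesday"
    yes = [f for f in families if target in f]
    with_girl = [f for f in yes if any(g == "G" for g, _ in f)]
    return Fraction(len(with_girl), len(yes))


for n in (1, 7, 27, 366):
    print(n, p_girl_given_qualified_boy(n))
```

Under these assumptions the result is 2/3 for n = 1 and 14/27 for n = 7, and it keeps moving toward 1/2 as n grows.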


astrocat 3299 days ago

:) Thanks for taking the time. I've realized a few things, and found it helped to get a bit more formal.

Jones has 2 kids. Let A be "he has a girl" and B be "he has a boy born on tuesday." First thing I realized is A and B are NOT independent - this is key. P(A) includes the option of Jones having two girls. But if B is true, then the two girls option isn't on the table anymore, which affects P(A). Realizing this helped me start to better understand what kind of problem we're dealing with.

Second was realizing that P(A&B) is not at all the same thing as P(A|B) - the probability of A given B - when A and B aren't independent. The problem is asking for P(A|B), and by the rule of conditional probability: P(A|B) = P(A&B)/P(B)

P(B) can be solved for without too much fuss: solve 1-P(!B). For each kid you have 2 genders and 7 days of the week, or 2 * 7 = 14 options. 13 of those are not "Boy & Tuesday." So you have P(!B) is (13/14) * (13/14) = 169/196. P(B) = 1 - 169/196 = 27/196.

This leaves us trying to figure out P(A&B). I can't think of any other way to do it other than enumerating all options. We can take a shortcut and just look at all 27 possible scenarios where B is true. This seems to be the method of choice ;) As others have shown, we see that 14 of those satisfy A. So P(A&B) = 14/196.

Now, we can solve: P(A|B) = P(A&B)/P(B) = (14/196)/(27/196) = 14/27
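The numbers above can be checked by brute force. This is only a sketch under the same idealized assumptions (equally likely genders and days, independent children); the helper names `prob`, `has_girl`, and `has_tuesday_boy`, and the choice to encode Tuesday as day 2, are illustrative and not from the thread.

```python
from fractions import Fraction
from itertools import product

GENDERS = ("boy", "girl")
DAYS = range(7)
TUESDAY = 2  # assumption: arbitrary encoding of Tuesday

kids = list(product(GENDERS, DAYS))       # 14 options per child
families = list(product(kids, repeat=2))  # 196 equiprobable families


def prob(event):
    """Probability of an event as the fraction of families satisfying it."""
    return Fraction(sum(1 for f in families if event(f)), len(families))


def has_girl(f):          # event A
    return any(g == "girl" for g, _ in f)


def has_tuesday_boy(f):   # event B
    return ("boy", TUESDAY) in f


p_a = prob(has_girl)
p_b = prob(has_tuesday_boy)
p_a_and_b = prob(lambda f: has_girl(f) and has_tuesday_boy(f))
p_a_given_b = p_a_and_b / p_b  # rule of conditional probability

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

`Fraction` reduces automatically, so P(A&B) shows up as 1/14 rather than 14/196. P(A) comes out to 3/4 while P(A|B) comes out to 14/27, which is the non-independence noted earlier.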

So I'm now part of the "math checks out" club. Thanks for all the help people!


Chinjut 3299 days ago

You can't assume "the boy is k1". The original 2 * 7 * 2 * 7 cases were indeed all equiprobable different cases. And we're not given that K1 is a boy. We're given that at least one child is a boy.

If K1 is a girl and K2 is a boy born on Tuesday, this still counts as the family (Mr. Jones, if you like) having a boy born on Tuesday. There are 27 cases that count as the family having a boy born on Tuesday, all equiprobable. And out of those, 14 also count as the family having a girl.

As for your noting that we can split these cases even more finely, so that there's no distinguished end-all, be-all partitioning of cases, sure, you can do that. What I'm really saying is this:

1/2 of two-child families have their elder child being a boy. 1/2 of two-child families have their elder child being a girl. [On conventional idealizations for these problems. You surely do not dispute this, yes? You may not care about this number, but you don't dispute it, right?]

In each of those subgroups, 1/7 of families have their elder child born on Sunday, 1/7 have their elder child born on Monday, etc. [Do you dispute this?]

In each of THOSE subgroups, 1/2 of families have their younger child a boy, and 1/2 have their younger child a girl. [Any dispute?]

And in each of THOSE subgroups, 1/7 of families have their younger child born on Sunday, 1/7 of families have their younger child born on Monday, etc. [Any dispute?]

And some amount of those have low birth weight, some have high birthweight, some have 5 freckles, etc., but we needn't figure out those numbers.

So now I've carved the world up into 2 * 7 * 2 * 7 groups, based on gender and birth-day-of-week for older and younger child. We can carve the world up into groups in different ways also, more finely or more coarsely or just differently. But making the four conventional assumptions we just made, the 2 * 7 * 2 * 7 grouping based on gender and birth-day-of-week for older and younger child is such that each particular such group takes up 1/2 * 1/7 * 1/2 * 1/7 of all families; these are all equifrequent groups.

And that having been done, we find that in 27 of these groups, there is at least one boy born on a Tuesday. In 13, the elder child is a boy born on Tuesday but not the younger child; in 13, the younger child is a boy born on Tuesday but not the elder child; in 1, both children are boys born on Tuesday.
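For anyone who wants to check the 13 + 13 + 1 split directly, here is a small sketch that counts the groups. The variable names and the string labels for days are illustrative choices, not part of the argument.

```python
from itertools import product

DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
TUESDAY_BOY = ("B", "Tue")

kids = list(product("BG", DAYS))        # (gender, birth day) for one child
groups = list(product(kids, repeat=2))  # (elder, younger): 2 * 7 * 2 * 7 = 196 groups

# Count the groups by which child, if any, is a boy born on Tuesday.
elder_only = sum(1 for e, y in groups if e == TUESDAY_BOY and y != TUESDAY_BOY)
younger_only = sum(1 for e, y in groups if e != TUESDAY_BOY and y == TUESDAY_BOY)
both = sum(1 for e, y in groups if e == TUESDAY_BOY and y == TUESDAY_BOY)

print(elder_only, younger_only, both, elder_only + younger_only + both)
```

The three counts are disjoint, so their sum is the number of groups with at least one boy born on Tuesday.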

But the question was not intended to be about a specific boy. The question was intended to be "Out of families that have a boy born on Tuesday (meaning at least one boy born on Tuesday), what proportion have a girl?". Any family with at least one boy born on Tuesday counts as having "a boy born on Tuesday", and even families with two boys born on Tuesday count, with neither of their two boys given any distinguished status.

Perhaps you read the question differently; that, then, is a problem with the phrasing of the question for communicating to you its intent. But when it was asked "What is the probability Mr. Jones has a girl, given that he has a boy born on Tuesday?", what the author indeed intended this to mean, and would be generally taken in the conventional language of probability to mean, was "Out of families that have at least one boy born on Tuesday, what proportion have a girl?".

And we find that, out of the 27 equally sized groups of families that have at least one boy born on Tuesday, 14 of them have a girl, so that the answer to this question becomes 14/27.


Chinjut 3299 days ago

Frankly, though, I'd prefer no one ever used the "conventional language of probability", because it leads to precisely these miscommunications.

If the question had been phrased "Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl? [on natural assumptions about lack of biases or correlations concerning the distribution of children's genders and days]", would you agree that the answer was 14/27?

That was the question the author intended to ask. The dispute may simply be as to whether the question which the author did ask is equivalent to the above; if that is indeed our only disagreement, we can still investigate that dispute further, if you like. But let's first see if the dispute is linguistic or mathematical.


astrocat 3299 days ago

> Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl?

OH YES.

I finally figured it out (see the other comment). Thanks for all the explaining, but this statement right here was the best.
