by btilly 2385 days ago
> a Bayesian looks at this and says that no matter what prior you pick, the knowledge that they planned to have children until they had both a boy and a girl does not affect your posterior conclusion

> A Bayesian would say no such thing...

Actually they would if they understood the formula. Bayes' formula has no place for things that could have been observed had things turned out differently, but which didn't actually happen. Therefore mighta, woulda, coulda but didn't cannot affect your conclusions. Ever.

> However, a Bayesian would also say that the knowledge that they planned to have children until they had both a boy and a girl significantly changes the likelihood ratio (or p-value, if you prefer to use that) associated with the observed data. And one of the advantages of Bayesianism is that it forces you to make that explicit as well.
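Whether the stopping rule changes the p-value is easy to check. A minimal sketch, assuming a null hypothesis of p = 1/2 and a one-sided test: under a fixed design of seven children, "at least as extreme" is taken to mean at least six boys; under the stop-when-both-sexes-appear design, it is taken to mean at least six boys before the first girl. Those tail definitions and the function names are assumptions for illustration, and other choices (for example, a two-sided test) give different figures.

```python
from fractions import Fraction
from math import comb


def p_value_fixed_n(n: int, k: int) -> Fraction:
    """Fixed design: P(at least k boys among n children) when p = 1/2."""
    return sum((Fraction(comb(n, j), 2 ** n) for j in range(k, n + 1)), Fraction(0))


def p_value_until_both(run: int) -> Fraction:
    """Stop-when-both design: P(the first `run` children are all boys) when p = 1/2,
    i.e. the family produces a run of at least `run` boys before the first girl."""
    return Fraction(1, 2 ** run)


fixed = p_value_fixed_n(7, 6)
stopping = p_value_until_both(6)
print(fixed, float(fixed))        # 1/16 0.0625
print(stopping, float(stopping))  # 1/64 0.015625
```

Same data, different p-values: a p-value sums over outcomes that were possible but not observed, and that set of outcomes is exactly where the two designs differ.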

I am not sure how you think that the calculation should be carried out. But it certainly shouldn't be done the way that you describe.

If your prior was that a fraction p of the children would be boys, the odds of the observed outcome would be p^6 * (1-p). It is that regardless of which version of the experiment you run. The posterior probability that the true fraction is around p, given the data, is the prior probability of its being around p, times p^6 * (1-p), divided by the a priori probability of the observed outcome, 6 boys and then a girl. The calculation is the same in both versions of the experiment, and therefore so is the conclusion.
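The calculation described here can be sketched on a grid of candidate values for p. This is an illustration, not code from the thread: the function names and the flat prior are assumptions. Both designs assign the exact sequence BBBBBBG the same probability p^6 * (1-p), and the prior enters both identically, so after normalizing by the probability of the data the two posteriors coincide.

```python
def sequence_prob(seq: str, p: float) -> float:
    """Probability of an exact ordered sequence of births ('B'/'G'), each independent,
    with probability p of a boy."""
    prob = 1.0
    for child in seq:
        prob *= p if child == "B" else 1 - p
    return prob


def prob_fixed_seven(seq: str, p: float) -> float:
    """Design A: have exactly seven children, whatever their sexes."""
    return sequence_prob(seq, p) if len(seq) == 7 else 0.0


def prob_until_both(seq: str, p: float) -> float:
    """Design B: keep having children until there is at least one boy and one girl.
    A family can only end this way if the last child is the first of its sex."""
    valid = len(seq) >= 2 and seq[-1] not in seq[:-1]
    return sequence_prob(seq, p) if valid else 0.0


def grid_posterior(likelihood, seq: str, grid: list[float], prior: list[float]) -> list[float]:
    """Bayes on a grid: posterior is proportional to prior times likelihood,
    normalized by the total probability of the data."""
    unnormalized = [w * likelihood(seq, p) for p, w in zip(grid, prior)]
    evidence = sum(unnormalized)  # a priori probability of the observed outcome
    return [u / evidence for u in unnormalized]


grid = [i / 1000 for i in range(1, 1000)]  # candidate values of p in (0, 1)
prior = [1 / len(grid)] * len(grid)         # flat prior, assumed for illustration

observed = "BBBBBBG"
post_a = grid_posterior(prob_fixed_seven, observed, grid, prior)
post_b = grid_posterior(prob_until_both, observed, grid, prior)

print(max(abs(a - b) for a, b in zip(post_a, post_b)))  # 0.0
```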

> And Bayesianism does something else too: it forces you to recognize that the p-value is not actually the answer to the question you were asking! By the p-value criterion, at least with the typical threshold of 0.05, the null hypothesis (that your aunt and uncle are not biased towards having one gender) is rejected. But a Bayesian recognizes that the prior probability of the gender ratio, based on abundant previous evidence, is strongly peaked around 50-50, much more strongly peaked than data with a bias equivalent to a p-value of 2/127 can overcome. So the Bayesian is quite ready to accept that your aunt and uncle had no actual bias towards having boys, they just happened to be one of the statistical outliers that are to be expected given the huge number of humans who have children.
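How strongly a peaked prior resists six boys and a girl can be sketched with a conjugate Beta prior. This is a minimal illustration, not either commenter's calculation: Beta(1000, 1000) (mean 0.5, standard deviation about 0.011) is a hypothetical stand-in for "strongly peaked around 50-50", and a Beta prior is swapped in for the Gaussian mentioned later in the thread because it makes the update exact.

```python
# Conjugate update: a Beta(a, b) prior on p (the fraction of boys) combined with a
# likelihood p**boys * (1 - p)**girls gives a Beta(a + boys, b + girls) posterior.

def beta_posterior(a: float, b: float, boys: int, girls: int) -> tuple[float, float]:
    """Return the Beta parameters after observing `boys` boys and `girls` girls."""
    return a + boys, b + girls


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: Beta(1000, 1000), mean 0.5, standard deviation about 0.011.
a0, b0 = 1000, 1000
a1, b1 = beta_posterior(a0, b0, boys=6, girls=1)

print(beta_mean(a0, b0))            # 0.5
print(round(beta_mean(a1, b1), 4))  # 0.5012 (that is, 1006/2007)
```

The posterior mean moves by roughly a tenth of a percentage point, well inside the prior's own spread.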

Actually a Bayesian with access to actual population data would be aware, as you aren't, that globally we average 1.07 boys to each girl at birth. Therefore most couples, likely including my aunt and uncle, were probably biased towards having boys.

There is a good deal of coincidence involved in my actually having the setup for a classic criticism of frequentism in a close relative. But if it happened, the odds were in favor of it involving 6 boys and a girl rather than the other way around.


> Bayes' formula has no place for things that could have been observed had things turned out differently, but which didn't actually happen.

Sure it does: you have to calculate the probability of your data given the hypothesis. Doing that requires considering all possible outcomes of the hypothesis and their relative likelihood, not just the one you actually observed.

> If your prior was that a fraction p of the children would be boys, the odds of the observed outcome would be p^6 * (1-p).

The prior would not actually be a single value for p; it would be a distribution for p over the range (0, 1). The distribution I described was a narrowly peaked Gaussian around p = 0.5, though, as you point out, that might not be the correct value for the peak (see below). However, for illustration purposes, it is much easier to talk about the (idealized, unrealistic) case where your prior is in fact a single point value for p.

However, in order to calculate the odds of the observed outcome, as I said above, you don't just need to know the prior for p. You need to know the process by which the outcomes are generated, according to the hypothesis. The odds you give assume that that process is "bear seven children, regardless of their gender". But that is not the correct process for the actual decision procedure you describe your aunt and uncle as using. That process won't necessarily result in seven children, and the odds of the actually observed outcome will change accordingly.
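The process described here can be written out exactly. Under "stop once both sexes appear", every possible family is a run of one sex ended by a single child of the other, so the family size varies. The following sketch (the function name and the 40-child truncation are illustrative assumptions) lists those families with their probabilities and, for comparison with the p^6 * (1-p) figure quoted above, prints the probability this process assigns to the exact observed family.

```python
def until_both_distribution(p: float, max_children: int) -> dict[str, float]:
    """Every family the 'stop once both sexes appear' rule can produce, with its
    probability, truncated at max_children (an assumption; the tail is tiny)."""
    dist = {}
    for n in range(2, max_children + 1):
        # A run of n - 1 children of one sex, ended by the first child of the other sex.
        dist["B" * (n - 1) + "G"] = p ** (n - 1) * (1 - p)
        dist["G" * (n - 1) + "B"] = (1 - p) ** (n - 1) * p
    return dist


dist = until_both_distribution(0.5, max_children=40)

print(sum(dist.values()))  # just under 1; the missing mass is families over 40
print(dist["BBBBBBG"])     # 0.0078125, i.e. 0.5**6 * (1 - 0.5)
seven_or_more = sum(prob for seq, prob in dist.items() if len(seq) >= 7)
print(seven_or_more)       # about 1/32: families of seven or more children
```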

> a Bayesian with access to actual population data would be aware, as you aren't, that globally we average 1.07 boys to each girl at birth

Depends on whose data you look at and over what time period. But I agree that the best prior to use in a given case would be whatever distribution you get from the data you already have, and yes, that might not be peaked exactly at 50-50.

"globally we average 1.07 boys to each girl at birth. Therefore most couples, likely including my aunt and uncle, were probably biased towards having boys."

Specifically, Chinese boys.

> Specifically, Chinese boys.

You have a point.

US statistics show 1.05 boys to each girl at birth. And that figure has been fairly stable for decades.

Which means that my point remains. Most couples are biased towards boys over girls.

> Most couples are biased towards boys over girls.

Yes, but the Bayesian argument shows that you can't infer that from your one sample. You only know that there is a bias towards boys because you have the global data that allows you to adjust the Bayesian prior to be peaked around the actual observed ratio instead of around 0.5. The Bayesian prior is still a much better prediction for any case not yet observed than any estimate you might calculate from your aunt and uncle's data alone.
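The point that a single family cannot establish the bias, while population data can, is easy to put in numbers. A minimal sketch, assuming a Beta prior centered on the US ratio of 1.05 boys per girl with a hypothetical strength equivalent to 2,000 observed births: the posterior mean after 6 boys and 1 girl stays close to the population value, while the single-family estimate of 6/7 is far from it.

```python
def ratio_to_fraction(boys_per_girl: float) -> float:
    """Convert a sex ratio at birth (boys per girl) to the fraction of births that are boys."""
    return boys_per_girl / (1 + boys_per_girl)


p0 = ratio_to_fraction(1.05)   # US figure quoted in the thread, about 0.512
prior_strength = 2000          # hypothetical: a prior worth 2,000 births of evidence
a0, b0 = prior_strength * p0, prior_strength * (1 - p0)

boys, girls = 6, 1
single_family_estimate = boys / (boys + girls)            # 6/7, about 0.857
posterior_mean = (a0 + boys) / (a0 + b0 + boys + girls)   # Beta posterior mean, near p0

print(round(p0, 4), round(single_family_estimate, 4), round(posterior_mean, 4))
```

The prior strength is the key assumption: the more population data behind it, the less one family can move it.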