Hacker News
by jzl 2671 days ago
Cool article. My knowledge of statistics is really rusty, but isn't this another way of approaching the topic of "Bayesian thinking"? If you think about the scenarios in the article from the standpoint of predicting a given outcome in advance, male vs. female and hard department vs. easy department should be treated as "priors". Or, to put it another way, Bayesian thinking means asking the question "What is the chance of X happening, given Y?"
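
In symbols, "the chance of X given Y" is Bayes' theorem, with P(X) playing the role of the prior. Expanding the denominator over X and not-X gives the "true positives plus false positives" form used in the mammogram calculation:

```latex
P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}
            = \frac{P(Y \mid X)\,P(X)}{P(Y \mid X)\,P(X) + P(Y \mid \neg X)\,P(\neg X)}
```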

A nice intro to the topic: https://betterexplained.com/articles/an-intuitive-and-short-...

It explains why a positive mammogram means you have only about an 8% chance of having breast cancer:

>The chance of getting a real, positive result is .008. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (.008 + 0.09504 = .10304).

>So, our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.

>Interesting — a positive mammogram only means you have a 7.8% chance of cancer, rather than 80% (the supposed accuracy of the test). It might seem strange at first but it makes sense: the test gives a false positive 9.6% of the time (quite high), so there will be many false positives in a given population. For a rare disease, most of the positive test results will be wrong.
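
The quoted arithmetic is easy to check in a few lines. This is a sketch: `posterior` is a hypothetical helper name, the 80% and 9.6% rates come from the quote, and the 1% prevalence is an assumption inferred from .008 = 0.01 × 0.80.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = prior * sensitivity                 # P(positive and disease)
    false_pos = (1 - prior) * false_positive_rate  # P(positive and no disease)
    return true_pos / (true_pos + false_pos)       # share of positives that are real

# Assumed 1% prevalence; 80% sensitivity and 9.6% false positive rate from the quote.
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.4f}")  # 0.0776
```

Changing `prior` shows the point about rare diseases directly: the lower the prevalence, the more the false positives dominate.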

2 comments

This is actually a case that shows the limits of Bayesian thinking.

The power of probability is that it can work in two directions. You can use it to make predictions, from causes to effects, from past to future. Or you can use it to reason diagnostically, from effects to causes, like deducing what must have happened in the past to produce the current observation. In probability theory, these two cases are treated the same way: both are just conditioning on evidence, which is really elegant.
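
A minimal sketch of the "both directions are just conditioning" point, using a made-up joint distribution over a cause (rain) and an effect (wet grass). The numbers and the `condition` helper are hypothetical. The same operation, filtering the table and renormalizing, answers both the predictive and the diagnostic question:

```python
# Hypothetical joint distribution P(cause, effect); the numbers are made up.
joint = {
    ("rain", "wet"): 0.27,
    ("rain", "dry"): 0.03,
    ("no rain", "wet"): 0.14,
    ("no rain", "dry"): 0.56,
}

def condition(joint, index, value):
    """Keep the rows where variable `index` equals `value`, then renormalize."""
    kept = {k: p for k, p in joint.items() if k[index] == value}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

# Prediction, cause -> effect: P(wet | rain)
print(f"{condition(joint, 0, 'rain')[('rain', 'wet')]:.4f}")  # 0.9000
# Diagnosis, effect -> cause: P(rain | wet)
print(f"{condition(joint, 1, 'wet')[('rain', 'wet')]:.4f}")   # 0.6585
```

Nothing in `condition` knows which variable is the cause; it only sees a table, which is exactly the limitation described next.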

The problem is that when the two cases really need to be treated differently, probability can't distinguish between them. This happens, for example, when asking about the probability of hypothetical (counterfactual) situations, or when predicting the results of interventions. To answer those, you need to know which variables are causes and which are effects, but this is outside the scope of probability.
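
To illustrate the gap between conditioning and intervening, here is a sketch of a hypothetical causal model in which Z causes both X and Y, and X has no effect on Y at all. The structure and the probabilities are assumptions chosen for illustration. Observing X=1 raises the chance of Y=1 because it is evidence about Z; forcing X=1 (an intervention, written do(X=1), which replaces the Z -> X mechanism) does not. Computing the second number requires the arrow structure, not just the joint distribution:

```python
from itertools import product

# Hypothetical causal model (an assumption for illustration only):
#   Z -> X and Z -> Y, with no arrow from X to Y.
P_Z1 = 0.5                       # P(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}  # P(X=1 | Z=z)
P_Y1_GIVEN_Z = {0: 0.1, 1: 0.9}  # P(Y=1 | Z=z)

def bern(p1, v):
    """Probability that a binary variable with P(=1) = p1 takes value v."""
    return p1 if v == 1 else 1 - p1

def joint(z, x, y, do_x=None):
    """P(Z=z, X=x, Y=y). If do_x is given, X is set by intervention,
    so the Z -> X mechanism is replaced by a point mass at do_x."""
    pz = bern(P_Z1, z)
    if do_x is None:
        px = bern(P_X1_GIVEN_Z[z], x)
    else:
        px = 1.0 if x == do_x else 0.0
    py = bern(P_Y1_GIVEN_Z[z], y)
    return pz * px * py

def p_y1_given_x1(do_x=None):
    """P(Y=1 | X=1), computed from the observational or the intervened model."""
    num = sum(joint(z, 1, 1, do_x) for z in (0, 1))
    den = sum(joint(z, 1, y, do_x) for z, y in product((0, 1), repeat=2))
    return num / den

print(f"P(Y=1 | X=1)     = {p_y1_given_x1():.2f}")        # 0.74
print(f"P(Y=1 | do(X=1)) = {p_y1_given_x1(do_x=1):.2f}")  # 0.50
```

The observational answer (0.74) and the interventional one (0.50) differ even though X does nothing to Y; a model with an X -> Y arrow could produce the same observational table with a different do(X=1) answer.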

Simpson's paradox is something that only shows up when the variables involved have certain cause-effect structures. If you think in terms of these structures, it stops being counterintuitive.

This is more about knowing the right question to ask, which is trickier than expected. In the classic example, the people who brought the lawsuit asked, “What are the odds of getting into Berkeley if you are a woman?” However, if people don’t apply to “Berkeley” but instead to “Berkeley’s College of Engineering”, then the right question is “What are the odds of getting into Berkeley’s College of Engineering if you’re a woman?” The paradox arises because we expect the two answers to be the same.
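
A small sketch of how the two questions can get opposite answers, using made-up admissions counts for two hypothetical departments (these are not the real Berkeley figures). Women have the higher admission rate in each department but the lower rate overall, because most of them applied to the hard department:

```python
# Hypothetical (admitted, applied) counts; NOT the real Berkeley data.
data = {
    "easy dept": {"men": (80, 100), "women": (18, 20)},
    "hard dept": {"men": (4, 20), "women": (30, 100)},
}

def rate(admitted, applied):
    """Admission rate for one (admitted, applied) pair."""
    return admitted / applied

def overall_rate(data, group):
    """Admission rate for `group` after pooling all departments together."""
    admitted = sum(by_group[group][0] for by_group in data.values())
    applied = sum(by_group[group][1] for by_group in data.values())
    return admitted / applied

# Per-department question: "odds of getting into this department if you're a woman?"
for dept, by_group in data.items():
    print(dept, {g: f"{rate(*counts):.0%}" for g, counts in by_group.items()})

# Pooled question: "odds of getting in anywhere if you're a woman?"
for g in ("men", "women"):
    print("overall", g, f"{overall_rate(data, g):.0%}")
```

With these counts, women are admitted at 90% vs. 80% in the easy department and 30% vs. 20% in the hard one, yet at 40% vs. 70% overall.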

And all of this, of course, ignores sampling bias…