by matthewmacleod 2131 days ago
Let's say we have two candidates A and B for some kind of election. We analyse our polling and other data, and estimate there is a 90% chance of candidate A winning and a 10% chance of candidate B winning.

If candidate B then wins, it does not mean that our analysis was "proved to be [in]correct". By itself, it doesn't actually say anything about the quality of our analysis. After all, we explicitly pointed out that this was a possibility, and it would be strange to argue "your analysis said this might happen, and then it did, so your analysis was incorrect". There's just not enough information to draw any conclusions.


It doesn't, statistically. But if I say something has a 1 in 1,000 chance of happening, you say it has a 40% chance, and then it happens, people will rightly say that your analysis was more correct than mine. Now, maybe I was right and just got monumentally unlucky due to unknowable factors. But that's certainly not the way people will think about it.
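To put a rough number on that intuition, here is a minimal sketch of the comparison as a likelihood ratio. The function names and the 50/50 prior over which forecaster to trust are my own assumptions, not anything from the thread.

```python
def likelihood_ratio(p_yours: float, p_mine: float) -> float:
    """How much more likely the observed event was under your forecast than mine."""
    return p_yours / p_mine


def posterior_trust_in_yours(p_yours: float, p_mine: float, prior_yours: float = 0.5) -> float:
    """Posterior probability that your forecast is the better one, after the event happens.

    prior_yours is an assumed prior (default: no initial preference).
    """
    weight_yours = prior_yours * p_yours
    weight_mine = (1.0 - prior_yours) * p_mine
    return weight_yours / (weight_yours + weight_mine)


if __name__ == "__main__":
    # You said 40%, I said 1 in 1,000, and the event happened.
    print(likelihood_ratio(0.4, 0.001))
    print(posterior_trust_in_yours(0.4, 0.001))
```

With those numbers the ratio is about 400 to 1 in favour of the 40% forecast, which is why a single outcome can still shift belief between two forecasters even though it cannot settle the question.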
If one candidate won 1000 consecutive polls, would that also tell you nothing about your estimate? That is obviously absurd: of course it would tell you something.

How about 100 times? 10 times? 2 times?

At what point does evidence cease to 'say anything about the quality of our analysis'? The answer is never. Every datapoint can be used to update your priors according to Bayesian statistics.
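As a sketch of what "every datapoint updates your priors" could look like, here is a Beta-Bernoulli update. It assumes the contests are repeatable and independent, and it encodes the 90% estimate as a Beta(9, 1) prior; both are illustrative assumptions, as are the function names.

```python
def update(alpha: float, beta: float, a_won: bool) -> tuple[float, float]:
    """Conjugate Beta update for one observed contest."""
    return (alpha + 1, beta) if a_won else (alpha, beta + 1)


def posterior_means(alpha: float, beta: float, outcomes: list[bool]) -> list[float]:
    """Estimated P(A wins) before any data and after each successive outcome."""
    means = [alpha / (alpha + beta)]
    for a_won in outcomes:
        alpha, beta = update(alpha, beta, a_won)
        means.append(alpha / (alpha + beta))
    return means


if __name__ == "__main__":
    # Prior mean 0.9 (Beta(9, 1)), followed by ten straight wins for B.
    for n, mean in enumerate(posterior_means(9, 1, [False] * 10)):
        print(f"after {n} upsets: P(A wins) ~ {mean:.3f}")
```

Under these assumptions even one upset moves the estimate (from 9/10 to 9/11), and ten in a row push it below one half. How fast it moves depends entirely on how strong the prior is taken to be.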

As I said in my other comment that you chose to ignore, the probability of winning the national popular vote is not the probability of winning the Presidency.
It's not that I chose to ignore you. I just didn't find your comment interesting from the statistical perspective.
You don't find aggregation of regional deviation to national deviation interesting from a statistical perspective? Odd.

Especially for someone who titles themselves "Chief Scientist".

This is Hacker News: I'm allowed to be more interested in Bayesian statistics than your country's electoral system.
>This is Hacker News: I'm allowed to be more interested in Bayesian statistics than your country's electoral system.

Aren't regional probabilities just conditionals of the national distribution? Building the national Presidential election distribution from the conditional distributions, rather than just using the sampled national distribution, is Bayesian statistics.

>The probability of winning is based on the popular support for a candidate

This was the argument you made. And I don't see how this is Bayesian vs. frequentist, when all I am saying is that the same national popular vote distribution can come with large variance in the regional conditional distributions, which leads to large variance in the election outcome distribution because of the electoral college.

Sounds pretty Bayesian to me.
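A toy simulation of the regional-variance point, as a sketch only: it assumes five equally sized winner-take-all regions with equal weight, Gaussian regional deviations that are re-centred so the national share stays fixed, and a made-up 52% national share for A. None of this models the real electoral college.

```python
import random


def win_probability(national_share: float, regional_sd: float,
                    n_regions: int = 5, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated elections in which A carries a majority of regions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(trials):
        devs = [rng.gauss(0.0, regional_sd) for _ in range(n_regions)]
        mean_dev = sum(devs) / n_regions
        # Re-centre deviations so the (equal-weight) national share is unchanged.
        shares = [national_share + d - mean_dev for d in devs]
        regions_won = sum(share > 0.5 for share in shares)
        if regions_won * 2 > n_regions:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for sd in (0.0, 0.02, 0.10):
        print(f"regional sd {sd:.2f}: P(A wins) ~ {win_probability(0.52, sd):.2f}")
```

With zero regional spread, A wins every simulated election. As the spread grows, A starts losing a noticeable fraction of them even though the national vote share never changes.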