by vitus 2064 days ago
One thing to note: 538 uses a t-distribution (and regularly points out on their podcast that this yields much heavier tails than a normal distribution). Even 40,000 samples are not enough to characterize the tails.
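To make the sampling point concrete, here is a rough sketch that draws 40,000 values from a Student-t and from a standard normal and counts how many land beyond a few thresholds. The degrees of freedom (`DF = 4`), the thresholds, and the seed are all my own assumptions for illustration; I don't know what parameters 538 actually uses.

```python
import math
import random

def sample_student_t(df, rng):
    """Draw one Student-t variate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    # Chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)

def tail_count(samples, threshold):
    """Number of samples whose absolute value exceeds the threshold."""
    return sum(1 for x in samples if abs(x) > threshold)

def normal_two_sided_tail(threshold):
    """Exact P(|Z| > threshold) for a standard normal."""
    return math.erfc(threshold / math.sqrt(2.0))

N = 40_000                       # sample count mentioned in the comment
DF = 4                           # hypothetical; not 538's actual parameter
THRESHOLDS = [5.0, 10.0, 20.0]   # arbitrary cutoffs, in scale units

rng = random.Random(0)
t_draws = [sample_student_t(DF, rng) for _ in range(N)]
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(N)]

t_counts = {c: tail_count(t_draws, c) for c in THRESHOLDS}
normal_counts = {c: tail_count(normal_draws, c) for c in THRESHOLDS}

for c in THRESHOLDS:
    print(f"|x| > {c:>4}: t(df={DF}) = {t_counts[c]:>4}, "
          f"normal = {normal_counts[c]:>4}, "
          f"exact normal tail = {normal_two_sided_tail(c):.2e}")
```

Under these assumptions the far thresholds should catch at most a handful of the t draws, so any estimate of what happens out there (let alone a correlation between two states both being out there) rests on very few points.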

So, it seems to me that the entire article is predicated on a faulty conjecture, namely that 538 uses a mixture of a normal distribution with an independent heavy-tailed one. (It's not explicitly stated what the author thinks the base model is, but I think "normal" is a reasonable guess.)

I'd be interested in seeing a reverse-engineering analysis of 538's choice of distribution parameters, and extrapolation from there to see if these pathologies still arise with (much) larger samples.

...

That said, ultimately, the choice of how fat to make the tails is a modeling decision, and how the models behave outside the regime of interest isn't as important as how they behave within the operating region. Once we have results, there are standard ways to evaluate goodness of fit (e.g. bias, MSE) that tell us just how wrong the model was as a predictor. Chances are pretty good that we won't see, say, Trump winning NJ, so we won't actually be able to validate the tail correlation with the vote in PA. But we will be able to validate the correlation in margin between PA and NJ.
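For reference, bias and MSE on predicted margins are about as simple as evaluation gets. A minimal sketch, where the margins are made-up placeholder numbers (not real forecasts or results) and the function names are my own:

```python
def bias(predicted, actual):
    """Mean signed error: positive means the forecast ran high on average."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

def mse(predicted, actual):
    """Mean squared error of the forecast."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)

# Placeholder margins in percentage points for four hypothetical races.
predicted_margins = [4.0, -2.0, 10.0, 1.0]
actual_margins = [1.0, -3.0, 8.0, -1.0]

print("bias:", bias(predicted_margins, actual_margins))
print("MSE: ", mse(predicted_margins, actual_margins))
```

Both only need the realized margins, which is why they work within the operating region even when the tail events never happen.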

Maybe 538's tails are too fat, and every prediction in the 80-95% range ends up going as predicted. Or maybe they're not fat enough, and some races in the 99% bucket end up going the opposite way. Point is, we won't know for sure which models were the best predictors until we can verify the predictions.
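The bucket check described above is a calibration table: group forecasts by stated probability and compare against how often the favorite actually won. A sketch with invented forecasts and outcomes (purely illustrative; the bucket count and function name are my own choices):

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_buckets=10):
    """Group forecasts into equal-width probability buckets.

    Returns a list of (bucket_low, mean_forecast, observed_rate, count),
    sorted by bucket, skipping empty buckets.
    """
    buckets = defaultdict(list)
    for p, won in zip(forecasts, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the top bucket
        buckets[idx].append((p, won))
    table = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        observed_rate = sum(won for _, won in rows) / len(rows)
        table.append((idx / n_buckets, mean_forecast, observed_rate, len(rows)))
    return table

# Invented data: win probability for the favorite, and 1 if the favorite won.
forecasts = [0.82, 0.87, 0.84, 0.88, 0.97, 0.99, 0.98, 0.96]
outcomes = [1, 1, 1, 1, 1, 0, 1, 1]

table = calibration_table(forecasts, outcomes)
for low, mean_p, rate, n in table:
    print(f"bucket >= {low:.1f}: mean forecast {mean_p:.3f}, "
          f"observed {rate:.2f}, n = {n}")
```

If the observed rate in a bucket sits well above the mean forecast, the tails were probably too fat; if it sits well below in the high-confidence buckets, not fat enough. With only a few races per bucket the comparison is noisy, so it takes a lot of predictions to say much.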

(see: all models are wrong, etc. Newtonian mechanics work great as long as your objects are big and slow, for instance.)