This is a shoddy analysis.

1. There's a statistical gadget specifically for doing this: a "scoring rule" [1], which is a principled way to compare different probabilistic predictions. A bunch of scatterplots of random quantities against each other are... not that. By comparing only binary win/loss predictions instead of probabilities, as in the first chart, you throw away almost all the information contained in the probabilistic estimates: if Democrats win a state, there's no bonus for predicting (say) 95% Dem instead of 55% Dem. It's plausible that 538 would actually win under a proper scoring rule, because betting markets were underconfident (relative to 538) in deep Dem/Rep states (predicting e.g. <95% Dem win in VT, vs 538's >99%). [2]

2. The calibration analysis assumes that win/loss outcomes in different states are independent, but that's really untrue: 538's predictions were specifically not independent, because they assumed polling errors were correlated between states.

3. Many of the other scatterplots look outlier-driven and don't include r^2 or p-values. With so few datapoints, it's unclear if they are meaningful at all.

[1]: https://en.wikipedia.org/wiki/Scoring_rule

[2]: Maybe we should cut prediction markets some slack here because liquidity constraints make them inaccurate for small probabilities. If that's the article's position, though, they should address this instead of just... not using a scoring rule.
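
To make the scoring-rule point concrete, here's a rough Python sketch of two standard proper scoring rules (Brier score and log score) next to plain win/loss accuracy. The probabilities and outcomes below are made up purely for illustration; they are not 538's or any market's actual numbers, and the function names are my own.

```python
import math

def binary_accuracy(probs, outcomes):
    """Fraction of events where the favored side (p > 0.5) matched the outcome."""
    return sum((p > 0.5) == (o == 1) for p, o in zip(probs, outcomes)) / len(probs)

def brier_score(probs, outcomes):
    """Mean squared error between forecast probability and 0/1 outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def log_score(probs, outcomes):
    """Mean negative log-likelihood of what actually happened (lower is better).

    Assumes every probability is strictly between 0 and 1; a forecast of
    exactly 0 or 1 would need clipping before taking the log.
    """
    return -sum(
        math.log(p if o == 1 else 1.0 - p) for p, o in zip(probs, outcomes)
    ) / len(probs)

# Hypothetical data: 1 = Dem win, 0 = Rep win in four imaginary states.
outcomes = [1, 1, 0, 0]
confident = [0.99, 0.95, 0.02, 0.10]  # made-up "confident" forecaster
hedged = [0.90, 0.60, 0.15, 0.40]     # made-up "underconfident" forecaster

for name, probs in [("confident", confident), ("hedged", hedged)]:
    print(
        name,
        "accuracy:", binary_accuracy(probs, outcomes),
        "brier:", round(brier_score(probs, outcomes), 4),
        "log:", round(log_score(probs, outcomes), 4),
    )
```

Both forecasters call every state correctly, so a binary win/loss comparison can't tell them apart, while both scoring rules favor the one that put more probability on what actually happened. Note that averaging per-state scores like this says nothing about correlation between states, so it doesn't address point 2.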