by an_opabinia 2064 days ago
Andrew Gelman designed the 538 model in 2007.

Nate Silver authored an adjustment to polls used in that model. Polls have more impact if they are more representative of statewide turnout among demographic categories he chose, like “black” and “low income.” This is why his predictions were so accurate for Obama’s 2008 and 2012 elections, and likely why they were so inaccurate in 2016.

Gelman’s own grad student is the only person to have academically published this approach, in a paper about polling Xbox Live users.

These guys make models that are the same in many more ways than they are different. The biggest question is: why not just share the code?


Nate Silver said Trump had a 1 in 3 chance, which basically means one shouldn’t be surprised no matter the result. I’m not sure where this “all the polls were so far off in 2016!!” narrative comes from, but it’s wrong.
It comes from innumerate journalism, and an innumerate population. Next time someone laughs off being bad at math, you should point out that being unable to read is no laughing matter, and being unable to understand numbers shouldn't be either.

The only sensible way to predict probabilities that aren't extreme is to tell people how the model works and the figures it is currently spitting out. That is the great thing about these kinds of blog posts: people are kicking the tyres, not just looking at the car.

Nobody predicting a one-off election with a rather special candidate would summarize a 33% chance as equivalent to having no chance.

> Next time someone laughs off being bad at math, you should point out that being unable to read is no laughing matter, and being unable to understand numbers shouldn't be either.

You're not wrong, but you should not do this.

The narrative comes from the media's inaccurate and misleading coverage of the polls in 2016. Many news outlets all but declared Clinton president before the election.
But the media is not Nate Silver. He said Trump had about as much chance of winning as the Cubs had that year of winning the World Series, and obviously both happened.

Silver did a nice writeup of the whole experience: https://fivethirtyeight.com/features/the-real-story-of-2016/

That isn't the only thing that happened. Probably the 1 in 3 odds were too low given the data available, because the polling demographics were not adjusted for education. If you randomly sample 1000 people to represent several million, you also collect demographic information so that you can weight the responses based on how skewed each demographic is in your sample compared to the total voting population. In 2016 they weren't correcting for education, which turned out to be a huge hidden variable. This is explained quite well by 538 themselves: https://fivethirtyeight.com/videos/polling-101-what-happened...
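
To make the weighting idea concrete, here is a minimal sketch of reweighting a sample by education. Every number and name below is made up for illustration; it is not any pollster's actual method, just the basic arithmetic of weighting each group by its share of the voting population.

```python
# Hypothetical poll: all figures are invented for illustration.

def raw_support(sample):
    """Unweighted share of respondents backing candidate A."""
    n_total = sum(n for n, _ in sample.values())
    s_total = sum(s for _, s in sample.values())
    return s_total / n_total

def weighted_support(sample, population_share):
    """Weight each group's support by its share of the voting population.

    sample: group -> (respondents, respondents backing candidate A)
    population_share: group -> share of all voters (should sum to 1)
    """
    total = 0.0
    for group, (n, supporters) in sample.items():
        total += population_share[group] * (supporters / n)
    return total

# College graduates are over-represented in this made-up sample:
# 60% of respondents, but assumed to be only 40% of voters.
sample = {
    "college": (600, 360),     # 60% back candidate A
    "no_college": (400, 160),  # 40% back candidate A
}
population_share = {"college": 0.4, "no_college": 0.6}

raw = raw_support(sample)
adjusted = weighted_support(sample, population_share)
print(f"raw: {raw:.3f}, education-weighted: {adjusted:.3f}")
```

In this toy case the unweighted sample shows candidate A ahead at 52%, while the education-weighted estimate is 48%, which is the kind of shift that goes unnoticed if education is left out of the weighting.
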
And in particular any claim 538 was the site that was off the mark compared to other prediction sites is clearly based in a reality that is not shared with the rest of us. In the week before the election Nate and crew were posting articles specifically outlining the non-zero probability of a Trump win and if it happened how it was likely to happen.
Nate was the outlier in that respect. But it’s true that the polls weren’t all that inaccurate in 2016: a bunch of important swing states were within the margin of error and Trump won some important states by very small margins.

The mistake in 2016, IMO was a) the extrapolation that came from those polls and b) people paying way too much attention to national polls, which have very little connection to electoral outcomes, given the electoral college.

Also perhaps c) the larger public not “getting” statistics in the way they’ve been presented. The NYT had, if I recall, Clinton at 90% chance of winning. That still means that in one of every ten such elections, Trump wins. But people read “90% chance” as “definite win”. I don’t actually know what anyone should or could do about that.

What Nate Silver got right in 2016 were the correlations in the Rust Belt, which were traditionally considered Democrat. 538's model predicted that losing one of those states would likely mean losing all of them for Clinton, because for example the polling errors were likely correlated. And indeed losing there is what cost her the election.
Silver's 2012 book "The Signal and the Noise" discusses our inability to rationally process probability, pointing out that commercial weather forecasts (e.g. Accuweather) never list a probability of rain under 20-25%. A 5% chance of rain is a mathematical possibility but people "feel" like 5% = "will never happen" and get angry if it rains.
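
A toy simulation of why correlated polling errors matter. This is only a sketch under assumed numbers (a 3-point lead in each of three states, 4 points of total polling error per state) and is not 538's model: it compares the chance of losing all three states when errors are independent with the chance when part of the error is shared across states.

```python
import math
import random

def p_lose_all(n_states, lead, shared_sd, state_sd, trials=50_000, seed=0):
    """Estimate the chance that the poll leader loses every state.

    Each state's error = one shared draw (same for all states)
    + an independent per-state draw, both normal, in percentage points.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        shared = rng.gauss(0, shared_sd)
        if all(lead + shared + rng.gauss(0, state_sd) < 0
               for _ in range(n_states)):
            losses += 1
    return losses / trials

TOTAL_SD = 4.0   # assumed total polling error per state, in points
SHARED_SD = 3.0  # assumed part of that error common to all states

# Same total error per state in both cases; only the correlation differs.
independent = p_lose_all(3, lead=3.0, shared_sd=0.0, state_sd=TOTAL_SD)
correlated = p_lose_all(
    3, lead=3.0, shared_sd=SHARED_SD,
    state_sd=math.sqrt(TOTAL_SD**2 - SHARED_SD**2),
)
print(f"independent errors: {independent:.3f}")
print(f"correlated errors:  {correlated:.3f}")
```

With independent errors, a sweep of all three states is rare (roughly 1 in 100 here). Once most of the error is shared, the sweep becomes several times more likely, even though each state on its own looks exactly as uncertain as before.
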

Nobel Prize winner Daniel Kahneman's life's work is about this, what he calls "System 1" and "System 2" of our brain, where System 1 is a fast responder that provides insta-feedback but is largely incapable of processing mathematical inputs. His 2011 book "Thinking, Fast and Slow" summarizes his work well.

I'm not sure popular media can be trained to frame statistical probabilities in a way that doesn't provide people with the certainty they crave. But who knows?

I think it’s a confusion between the likelihood of winning, no matter by how many votes, and the predicted percentage of votes per candidate. The latter is more commonly presented to readers from polls. So it’s not too surprising if it gets mixed up with the former, which is used by Nate et al and also uses percent as the unit.

Say a national poll predicts 55% of votes for Clinton, 40% for Trump. Whereas 538 predicts 70% chance of winning for Clinton and 30% for Trump. It’s easy to confuse the two and think the second prediction is much better for Clinton when it might be much worse.
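
To put rough numbers on the difference, here is a small sketch that turns a polled lead into a win probability. It assumes a single contest decided by the popular vote and a normally distributed polling error with a standard deviation of 4 points; both are simplifying assumptions, not how 538 computes its forecast.

```python
from statistics import NormalDist

def win_probability(lead_pts, error_sd_pts):
    """P(true lead > 0) if the true lead ~ Normal(polled lead, error_sd)."""
    return 1.0 - NormalDist(mu=lead_pts, sigma=error_sd_pts).cdf(0.0)

ERROR_SD = 4.0  # assumed polling error, in percentage points

# 55% vs 40% in the polls is a 15-point lead.
big_lead = win_probability(15.0, ERROR_SD)
# A lead of only 2 points under the same error assumption.
small_lead = win_probability(2.0, ERROR_SD)

print(f"15-point lead: {big_lead:.4f} chance of winning")
print(f" 2-point lead: {small_lead:.4f} chance of winning")
```

Under these assumptions a 55-40 poll corresponds to a win probability above 99%, while a roughly 70% win probability corresponds to a lead of only about 2 points. So "70% chance" describes a far closer race than "55% of the vote".
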

> But people read “90% chance” as “definite win”. I don’t actually know what anyone should or could do about that.

538 are aware of the problem, and are combating it with a cartoon fox (and better visualisations).

I don't know if this statement is serious. How does a cartoon fox make me think differently about the numbers?
Like the coyote in El Viaje Misterioso de Nuestro Jomer, the cartoon fox tells you only just enough to light the path towards statistical enlightenment. You must walk it yourself.
I was mostly joking; the real improvement is the better visualisations. The cartoon fox is just there as a reminder.
It was more like 1 in 5.
It's still up:

https://projects.fivethirtyeight.com/2016-election-forecast/

28.6% is between 1/3 and 1/4, definitely not 1/5.

> Polls have more impact if they are more representative of statewide turnout among demographic things he chose like “black” and “low income.” This is why his predictions were so accurate for Obama’s 2008 and 2012 elections

I find this argument strange, because black turnout was unusually high in 2008. That should have a negative impact on the accuracy of statistical adjustments, not a positive one.

I think he made an estimate for the increase in black turnout. If I were designing the model, and I believed turnout is the biggest factor (maybe inconclusive among political scientists), I would look at the circumstances where turnout changes based on candidate's demographics and validate it across statewide and congressional races.

However, we will never know, because they never published the code.

> I think he made an estimate for the increase in black turnout.

I think that kind of adjustment is usually the responsibility of the pollsters, with their likely voter models. I don't think FiveThirtyEight directly tries to also apply such an adjustment, because that would be at serious risk of overcorrecting.

Similarly, this year many pollsters have added level of education as a factor to their demographic weighting, to address a shortcoming in their 2016 performance. FiveThirtyEight consumes those poll numbers without adding their own layer of demographic adjustment.

I think polls being more representative of turnout amongst minorities could help indicate a potential black swan event for the election. If turnout does return to 2008 and 2012 election levels, polls featured in this fivethirtyeight article [1] indicate Trump is performing better amongst black and hispanic voters. Both demographics are seeing a 10-15% swing in support for Trump compared to 2016, which could theoretically cement swing states like Florida, Pennsylvania, and Michigan.

I don't think it's likely but if those polls are indicative of what's actually happening, we're talking about potentially a 2-4 million vote swing in Trump's favor. Here's a link to estimates of voter turnout in 2016 [2].

[1] https://fivethirtyeight.com/features/trump-is-losing-ground-... [2] https://www.pewresearch.org/fact-tank/2017/05/12/black-voter...

You're misreading or confused or something. The 538 piece points to a ~10% swing away from the (in the case of black voters) 82% that favored Hillary. This is very different from a swing all the way to a 10% preference for Trump. A bigger turnout by (these) minority voters, assuming they cast votes even vaguely in line with this polling, is more votes in Biden's column than Trump's. That's bad news for Trump.
I think you're misreading my statement, it's a 10% swing in the direction of Trump, not a 10% overall preference for Trump. From 82% favoring Hillary to 71% favoring Biden for black voters. That's a 10% change towards Trump's direction. If 16 million black voters participated in 2016 then that's around 13.1 million votes for Hillary and 2.9 million for Trump. If polling is correct this year and we see around similar turnout (not even an increase), then it'll be around 11.3 million votes for Biden and 4.7 million for Trump. So a 1.8 million vote swing in the black vote.

That's just a really rough calculation and doesn't account for the Hispanic vote either in that article.

Apologies, my above explanation is actually wrong. The article is actually referencing the margin between the candidates. So it was an 82-point margin between Clinton and Trump in 2016, which is still difficult to interpret because it doesn't mention if this includes 3rd party votes. But assuming 98% of voters voted Clinton or Trump, this would mean that 90% of black votes went to Clinton and 8% went to Trump in 2016.

This would then mean that if 98% vote Trump or Biden in 2020, we'll see something like 84% of the black vote for Biden and 13% of the black vote for Trump. A 5% overall change using 2016 voter participation numbers is still somewhere around 700,000 vote change in the black vote, which is certainly not insignificant. Adding in the change in the Hispanic vote (a margin change from 37 points in 2016 to 23 this year), this could certainly change swing state outcomes.
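
For anyone who wants to check the margin arithmetic, a quick sketch. It assumes, as above, that 98% of votes go to the two major candidates, a 71-point margin this year (which is what the 84/13 split implies), and the 16 million turnout figure used earlier in the thread. The result moves with the turnout figure and with rounding, so treat it as a ballpark.

```python
def shares_from_margin(margin_pts, two_party_total_pts=98.0):
    """Split a two-party total into (leader, trailer) shares given the margin."""
    leader = (two_party_total_pts + margin_pts) / 2
    trailer = (two_party_total_pts - margin_pts) / 2
    return leader, trailer

clinton_2016, trump_2016 = shares_from_margin(82)  # 90.0 and 8.0
biden_2020, trump_2020 = shares_from_margin(71)    # 84.5 and 13.5

BLACK_VOTERS = 16_000_000  # rough turnout figure used earlier in the thread

# Votes that move from the Democratic column to Trump's column.
shift_pts = trump_2020 - trump_2016
votes_moved = shift_pts / 100 * BLACK_VOTERS

print(f"2016: Clinton {clinton_2016}%, Trump {trump_2016}%")
print(f"2020: Biden {biden_2020}%, Trump {trump_2020}%")
print(f"shift: {shift_pts} points, about {votes_moved:,.0f} votes")
```

Without rounding the shares down, the shift is 5.5 points, or somewhat under 900,000 votes at that turnout, which is the same order of magnitude as the rough figure above.
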

But unless I'm misunderstanding something, the change in approval is already factored into the current predictions. If there is a greater than expected turnout by a group of people voting - in aggregate - more for Biden than for Trump, then that favors Biden relative to the current predictions. So that's not going to be a reason for a surprise in the other direction.

There are plenty of reasons our predictions might not match reality, but they're not going to be wrong in that direction for that reason.

It wouldn’t be a black swan event, it would simply be a variable that got re-toggled on. As in, there was a large black turnout for Obama, and there wasn’t for Hillary. What if we turned that variable back on to true for Biden? That’s about all the rocket science involved.

We’ve seen that variable before.