Can someone help me understand what odds like this mean in the context of an election?
The model says that Trump has a 1 in 10 chance of winning. With a fair 10-sided die it makes sense that you have a 1 in 10 chance of any given side rolling face up. But what is the die that is being rolled in these election statistics? What is the "chance" element that is being predicted?
You're comparing two scenarios, one in which you know all the facts, and one in which you don't.
In the dice toss scenario, we know everything relevant. In the election scenario, we don't.
A model like this is attempting to say "these are the rules we think exist. Based on the rules, and assuming the data is off by some random distribution, here's what we think could happen".
What different forecasters disagree about is what the rules are. For example, the relevance of certain demographic characteristics and the potential variance between polling (conducted prior to the election) and actual election results.
There are a huge number of assumptions, and forecasters disagree on them. We have very little historical data (polling is very recent), and even with complete historical data, future elections do not always conform to past elections.
I will veer this off into the dreaded political territory even though this is mostly a technical discussion.
The Democratic Party proved it was not as progressive as they thought as Sanders lost the primary. The reality is, the country as a whole is also not as liberal either, regardless of what these pollsters are asking people. You think the party is youthful, and ready for progressive ideas, but alas, the party wholly rejected an amazingly progressive candidate in Sanders. You think everyone’s super pissed at Coronavirus handling, and police brutality, healthcare, but alas, you find out people associate BLM protests with crime, and the virus with China, and socialism with unfair wealth redistribution. We can keep learning this the hard way I guess, this is America after all.
It’s important that the technical discussions are happening this time around, because there were virtually none the last time. The post mortems for these forecasts being wrong again should be a death knell for accumulating bad data. I’m certain the models are good, but I’m not certain the data is.
Anyway, if you want my hot take, the conditional forecasting is to save their ass on election night from being embarrassingly wrong again. Imagine writing a giant if-statement that looked something like ‘and if(imWrong) changeMyAnswer’.
> Anyway, if you want my hot take, the conditional forecasting is to save their ass on election night from being embarrassingly wrong again.
Well Nate Silver wrote a full critically acclaimed book about why these types of forecast are more useful (and accurate) in reality because they account for uncertainty - he has been doing this for years, ever since he used to write similar algorithms to help bookies pick odds for sporting events, so I think your hot take isn’t based in any world of facts or knowledge on this.
Don’t trust a forecaster that says with certainty that a certain candidate will win, unless they have also bet their life’s earnings on it. Showing your statistical confidence level isn’t a bad thing.
I think it’s certainly more grounded in reality if you realize 538 is basically finished if they miss the mark again.
If you listen to what they say, they admit they were not able to measure for the no-college male demographic in 2016, or in other words, they couldn’t model identity politics. Why couldn’t they do that? I’m not sure, but they are certain they can this time around because they saw the 2016 data and now believe they have more complete data to not make the same mistake again.
They are looking at elections as if there are hundreds of millions of elections that happen every day and the data speaks for itself. No sorry, there’s very few elections to extrapolate the way they are doing it, and you really need to do sociopolitical analysis of things like a demographic identity bloc (no-college whites that feel some way about things) that really get you the accurate undercurrents that can sway an election.
Lastly, it doesn’t take a genius to sit there at 10pm on election night and go ‘well if Florida and Michigan went this way, then probably so will these other states in flux’. ‘Our forecast becomes more accurate as we get the actual poll closing numbers on election night’, ah I see, you’re all geniuses, I should have known.
> If you listen to what they say, they admit they were not able to measure for the no-college male demographic in 2016, or in other words, they couldn’t model identity politics. Why couldn’t they do that? I’m not sure,
You seem to have a fundamental misunderstanding of what FiveThirtyEight is trying to model, versus what pollsters are trying to model with the numbers they publish that FiveThirtyEight consumes. The kind of demographic weighting you're complaining about FiveThirtyEight being bad at is something the pollsters do, and is outside the scope of FiveThirtyEight's forecasting models.
> If you listen to what they say, they admit they were not able to measure for the no-college male demographic in 2016, or in other words, they couldn’t model identity politics. Why couldn’t they do that? I’m not sure, but they are certain they can this time around because they saw the 2016 data and now believe they have more complete data to not make the same mistake again.
I think you possibly misunderstand what 538 _do_ a bit. Their data is based on polling, so they can only work on what the pollsters do. Historically, pollsters didn't pay that much attention to education, beyond using income or class as a proxy for it; one middle-class white man was pretty much like another. This worked quite well historically, but no longer does (and it's not just a US phenomenon; it was also a contributor to polling problems for Brexit, notably).
In their current model, 538 assume a higher rate of uncertainty than last time round; also, some pollsters now model education. But really there's not that much they can do about stuff that pollsters don't ask about.
> I think it’s certainly more grounded in reality if you realize 538 is basically finished if they miss the mark again.
What does missing the mark mean though? In 2016 they gave Donald roughly a 30% chance of winning and Hillary roughly 70%. Does that mean they were wrong? Not really, because that's how probabilistic forecasting works: they stated their uncertainty up front. They were 70% confident that Hillary would win, but thought there was a 30% chance Donald would win.
> The Democratic Party proved it was not as progressive as they thought as Sanders lost the primary.
The FiveThirtyEight forecast for the Democratic primary [1] gave Biden the highest chance of winning for most of the process. He did have a steep drop in the month before Super Tuesday (followed by an equally steep rebound), but still, I wouldn't say the forecast was especially bad. That said, polling is always worse for primaries than general elections, since there are more candidates and fewer voters.
This sounds like a frequentist vs Bayesian statistics discussion, which involves (this is a simplification by me, a non-expert in the area) different definitions of probability. The frequentist view is along the lines of rolling a 10-sided die hundreds of times, recording the results, and determining that each side comes up equally often. The Bayesian view is that the probability measures our certainty about some event. For example, take the hypothetical point in time where all ballots have been cast, but have not been counted. One could use polling data, etc. to model the odds that a particular candidate has won. However, the frequentist approach doesn’t really make sense here, as the ground truth already exists (all ballots cast), so rerunning the event doesn’t make sense.
Once again, I’m not an expert, so I recommend looking for additional explanations, if you’re interested.
> what is the die that is being rolled in these election statistics?
A program which randomly generates outcomes for each state, based on probability distributions inferred from the polls, and calculates who wins the election given those outcomes. They run the program repeatedly and report the proportion of simulated wins as the probability of winning. https://en.wikipedia.org/wiki/Monte_Carlo_method
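To make that concrete, here is a minimal sketch of the idea in Python. Everything in it is made up for illustration: the three states, their electoral votes, the poll margins, and the error sizes. The shared national error term is just my assumption about one simple way to correlate states; the real forecast models are far more elaborate.

```python
import random

# Hypothetical inputs: (electoral votes, poll margin for candidate A in points).
# These numbers are invented for illustration, not real polling data.
STATES = {
    "State X": (20, 3.0),
    "State Y": (15, -1.0),
    "State Z": (10, 0.5),
}
VOTES_TO_WIN = 23  # majority of the 45 hypothetical electoral votes


def simulate_once(rng, national_sd=2.0, state_sd=3.0):
    """Draw one simulated election and return True if candidate A wins."""
    national_error = rng.gauss(0.0, national_sd)  # error shared by all states
    votes_for_a = 0
    for electoral_votes, poll_margin in STATES.values():
        # Simulated result = poll margin + shared error + state-specific error.
        margin = poll_margin + national_error + rng.gauss(0.0, state_sd)
        if margin > 0:
            votes_for_a += electoral_votes
    return votes_for_a >= VOTES_TO_WIN


def win_probability(n_sims=20_000, seed=0):
    """Fraction of simulated elections that candidate A wins."""
    rng = random.Random(seed)
    wins = sum(simulate_once(rng) for _ in range(n_sims))
    return wins / n_sims


if __name__ == "__main__":
    print(f"Candidate A wins in {win_probability():.1%} of simulations")
```

The reported "chance of winning" is nothing more than that final fraction, so the die being rolled is the random number generator feeding the assumed error distributions.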
It means if you saw all of these facts in ten different events, you would not be surprised to see a one to nine split in results. Ish. As you are scaling up, of course.
So, think of it as saying these facts basically describe a ten sided die. With no other knowledge, the best you have is that you expect it to behave the same as any other ten sided die.
It's less about pure random chance, and more about our uncertainty. Compare it to a weather forecast that says there's a 10% chance of rain tomorrow. In the same way that weather forecasts get better over time (better atmospheric measurements, more sophisticated computer models), we could potentially do more to measure what the outcome of the election will be. And it might be theoretically possible (albeit highly unrealistic) to predict it with complete accuracy, given enough data. But we're not in that situation, hence uncertainty.
(There are a couple of caveats about election forecasting as opposed to weather forecasting. The first is the "October surprise," a sudden revelation that changes the election. This cycle, it was arguably Trump's covid diagnosis, although that tended if anything to push the results further in the direction they seemed to be going on their own, rather than upset any trend. The second is that, unlike with weather systems, measuring voter behavior (and widespread reporting on these measures) can change people's behavior. The effect of this is hotly contested, but one of the many explanations of Trump's victory in 2016 which hinged on turnout in a few key states is that those states were predicted wins for Clinton, so Clinton voters didn't bother voting. Despite occasional jokes to the contrary, it doesn't rain just to spite the weatherman.)
That's a good question and it's not clear. As someone mentioned, here "chance" includes both uncertainty (facts that we don't know), and randomness of nature (things that will happen in the future that cannot be deterministically deduced from the state of the world today). Depending on your philosophy these may overlap. Next, someone mentioned Bayesian vs frequentist.
The frequentist interpretation is roughly that if I go around making my best possible predictions, and we lump together all the things that I predict at 10%, about 1 in 10 of those things happen and the rest don't. But I wouldn't be able to be more specific about which ones in that group are more likely than others.
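A rough sketch of that "lump the predictions together" check, in Python. The data here is synthetic and the forecaster is perfectly calibrated by construction, so this only shows the bookkeeping, not how any real forecaster performs; `calibration_table` is a name I made up.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group forecasts by stated probability; return observed frequency per group."""
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[round(p, 1)].append(happened)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Synthetic data: each event happens with exactly the probability that was stated.
rng = random.Random(1)
forecasts = [rng.choice([0.1, 0.3, 0.7, 0.9]) for _ in range(40_000)]
outcomes = [rng.random() < p for p in forecasts]

for stated, observed in calibration_table(forecasts, outcomes).items():
    print(f"stated {stated:.0%}  observed {observed:.1%}")
```

If the observed column tracks the stated column, the predictions are calibrated in this sense, even though no single 10% prediction can be checked on its own.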
The Bayesian interpretation is that I can really view the world as flipping coins -- I don't care whether it's due to my lack of knowledge or "true" randomness -- and as far as I can tell, the coin flip involved here is 1 in 10.
We can also use a gambling interpretation. Here's one based on the security of Python's random module. Imagine the following three lotteries I offer you. In lottery A, you get $100 if Trump is elected. In lottery B, you get $100 if the following Python code returns true on my laptop:
import random
random.random() <= 0.09999
In lottery C, you get $100 if this code returns true:
random.random() <= 0.10001
If you would rather have lottery A than B, and you'd rather have C than A, then in some sense you believe Trump has a 1 in 10 chance.
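You can turn the same comparison into a procedure: keep offering random lotteries with different thresholds and narrow in on the point where you stop preferring lottery A. Here is a sketch under the assumption that preferences are consistent; `prefers_event_lottery` is a hypothetical stand-in for actually asking a person.

```python
import random


def random_lottery_pays(threshold, rng=random):
    """Lottery B/C from above: pays out if a uniform draw is at or below threshold."""
    return rng.random() <= threshold


def implied_probability(prefers_event_lottery, tolerance=1e-4):
    """Bisect on the threshold until the two lotteries look equally attractive.

    prefers_event_lottery(threshold) is a hypothetical callback standing in for
    asking: "would you rather have lottery A than a random lottery that pays
    out with this probability?"
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if prefers_event_lottery(mid):
            low = mid   # A is still better, so the belief is above mid
        else:
            high = mid  # the random lottery is better, so the belief is below mid
    return (low + high) / 2


# Someone who privately believes the event has a 10% chance prefers lottery A
# whenever the random lottery pays out less often than that.
belief = implied_probability(lambda threshold: threshold < 0.10)
print(round(belief, 3))
```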
Now there's an interesting extra layer to all of this because it's a model predicting, not a person. In a short space, I would basically say that we've trained models to predict in ways that are not inconsistent with any of the interpretations above, when put into situations where that is testable. Then we use them in situations where it might not be, like this.
It pretty much means nothing. These sorts of models have produced wrong results again and again. For me the biggest question mark is this: if we know that the recent (last 5) elections were very close, how can you predict somebody winning with a 93% chance? Maybe I do not understand something here.
Agree. Polls are a bad metric to rely on. Only 2% of those asked respond. Impossible to get a statistical sample with that. People are bad at predicting their future behavior. They are dishonest, they don't know or just don't want to tell you. There's a huge class divide right now. The bigger the class divide, the worse polls are historically. And we know they are wrong this time around. None of the early voting margins predicted have held close. Better predictors are Google Trends, rally participation, and voter registrations.
Sure, these are all concerns. However, as long as they are not systematic errors for or against one candidate, they end up not mattering very much.
Andrew Gelman (the author of this post) has also done a bunch of work on how different parties' supporters become more/less likely to respond to polls based on what the current results are, which has been incorporated into the newer forecasts.
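A quick way to see why systematic errors are the ones that matter: independent noise shrinks when you average many polls, while a bias shared by every poll does not. This is a toy simulation with invented numbers (a 2-point true lead, 4-point noise per poll, a 3-point shared bias), not a model of any real polling data.

```python
import random
import statistics


def polling_average(true_margin, n_polls, noise_sd, bias, rng):
    """Average of n_polls polls, each with independent noise plus a shared bias."""
    polls = [true_margin + bias + rng.gauss(0.0, noise_sd) for _ in range(n_polls)]
    return statistics.mean(polls)


rng = random.Random(42)
TRUE_MARGIN = 2.0  # hypothetical true lead, in points

unbiased = polling_average(TRUE_MARGIN, n_polls=400, noise_sd=4.0, bias=0.0, rng=rng)
biased = polling_average(TRUE_MARGIN, n_polls=400, noise_sd=4.0, bias=-3.0, rng=rng)

print(f"independent noise only: average says {unbiased:+.2f}")
print(f"same noise plus a shared 3-point bias: average says {biased:+.2f}")
```

The first average lands close to the true lead despite each poll being noisy; the second is off by roughly the full bias no matter how many polls you add.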
It's typical to use "to" rather than "in" when discussing odds. So the odds of getting a 1 on a 10-sided die are 9 to 1 against (odds are also typically specified with the larger number first, because of the overlap between mathematical odds and betting-shop odds). And the probability of it happening is 1 in 10.
(I suspect that counter-pedantry on these lines might be part of why your post is getting downvoted; I wasn't one of the downvoters fwiw.)
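For anyone who wants the conversion spelled out, here is a small sketch; `odds_against` is just a name I picked. A probability p corresponds to odds of (1 - p) to p against, reduced to lowest terms.

```python
from fractions import Fraction


def odds_against(probability):
    """Convert a probability to 'odds against' as a (losses, wins) pair in lowest terms."""
    p = Fraction(probability)
    if not 0 < p < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    ratio = (1 - p) / p  # ratio of failures to successes
    return ratio.numerator, ratio.denominator


losses, wins = odds_against(Fraction(1, 10))
print(f"{losses} to {wins} against")  # prints: 9 to 1 against
```

Pass a `Fraction` or a string like `"0.1"` rather than a float, since a float such as `0.1` is not exactly one tenth and would give an unwieldy ratio.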
Nit: Have they accounted for the possibility of a tie? US elections allow for a tie in the Electoral College (which then kicks off a supremely strange and legalistic process).