by yk 2892 days ago
> And in any case, the model only generated probabilities of winning a game and advancing, and no team was given more than an 18.5 percent chance of winning the World Cup.

> [...]

> But Goldman Sach’s misfire is perhaps the most curious.

The model said that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that roughly 4 out of 5 times the team will not win, and the fact that this was the highest chance of any team does not say much about the model.

And in general, this is one instance of the well-practiced journalistic technique of waiting for the results first and then defining a bar afterwards, so that the results can be criticized according to standards that did not exist when the performance happened. (I guess in this case it is even worse: we could construct a reasonable test of how the model performed. I suspect that such a test was in the original paper and that the journalist either did not understand it or, more likely, chose to ignore it in favor of writing a better story.)


Their model also had France as 2nd most likely, Belgium 5th, and England 7th. Three of their top 7 made the semi-finals, and they ranked the eventual winner as second most likely, ahead of Germany. They actually predicted the Brazil/Belgium quarter-final matchup, but got the winner wrong. Brazil had 27 shots, 9 on target, with 59% possession. Belgium had only three shots on target, and made two of them count to win.

They overranked Germany, and underranked Croatia. Nearly every other person in the world did the same.

Look at how disingenuous the Bloomberg article is: "Goldman Sachs updated the model throughout the tournament. It predicted a Brazil-Spain final on June 29 and Brazil-France on July 4. Its most recent prediction had England and Belgium squaring off for the cup. Both were eliminated in the semifinals." But their actual Brazil-France prediction was made with 8 teams left, and the winners of that round were all in their top 5. https://twitter.com/GoldmanSachs/statuses/101448576794142720... They even had Croatia over England, and France over Belgium.

> Brazil had 27 shots and 9 on target with 59% possession. Belgium only had three shots on target, and made two of them to win.

A modern model would account for the fact that those numbers alone mean nothing, because they don't. Those are the numbers broadcasters reluctantly put on screen for entertainment value, but they have no real analytical power because they have no comparative metric.

How up or down were each of those numbers against previous wins and losses for each team?

What was Brazil's conversion from on-target shots before the tournament?

What was Belgium's success/failure rate on on-target shots they were defending against?

Likewise the other way around: were Brazil guilty of particularly poor defending? Were Belgium finding ways of making on-target shots count against all opposition, or was it luck on this game?

Any human analyst could have told you going into that game that Belgium had been "lucky", scoring freely beyond expectations and making more of fewer opportunities. Likewise, the consensus among most experts was that Brazil were guilty of mild complacency: the team were young and not yet formed into a strong unit (rather, still just 11 strong individuals at any one point in time), and their on-target shots, whilst frequent, had a lower probability of turning into goals due to distance, power, position, etc.

So why did the Goldman Sachs model not pick that up?

I actually think they did pretty well, all things considering, but I'd love to see whether they did any runs on previous World Cups to check their thinking, and whether they over-fitted a little to a couple of key metrics. I think the lack of metrics from previous games might mean they relied on some headline numbers, but there's more they could have done to get a better model here...

Still, it's not their job is it? Just a bit of fun... which is a good job, because I find it just a little bit amusing.

Some teams/coaches like possession; others do not. If a team plays a possession-dominant game, eventually its defenders will be (almost) in the opponents' half. When this happens, things get risky, and a loss of possession can be punished by a counter-attack, which needs to be executed as fast as possible. Teams that are ahead often retreat and let the opponent have the ball so they can break out like that. It just means possession doesn't really say anything. Belgium went ahead against Brazil with a bit of luck and then let Brazil have the ball. Belgium's second goal was a classic counter-punch. After that, Brazil was allowed to have the ball while Belgium tried to control the game. Regarding odds, Belgium was number 3 in the world when the game was played, and Brazil was number 2. Obviously, it was never going to be a walkover for anyone.

If you look at both Belgium/England games, you see number 2 against number 12. The ranking was respected there.

https://www.fifa.com/fifa-world-ranking/ranking-table/men/in...

It used to be a silly ranking system, but it's Elo-based these days, so it's not too shabby.
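Since the ranking is now Elo-based, here is a minimal sketch of the textbook Elo expected-score and update rule. The 400-point scale is the classic chess convention, and the K factor of 40 and the ratings below are hypothetical; FIFA's published formula uses its own scale and match-importance weights, so treat this as an illustration of the idea rather than FIFA's actual calculation.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, counting a draw as half) for A vs. B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 40.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k (hypothetical default) controls how far ratings move per match.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    # Points gained by one side are lost by the other, so the system is zero-sum.
    return rating_a + delta, rating_b - delta


# Two hypothetical teams 100 rating points apart; the underdog wins.
favourite, underdog = 1900.0, 1800.0
print(round(elo_expected(favourite, underdog), 3))
print(elo_update(underdog, favourite, score_a=1.0))
```

An upset moves both ratings further than an expected result does, which is how the ranking gradually comes to "respect" results like the ones above.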

On the contrary, they mean a lot. Shots on target are the best proxy you have (apart from goals, of course) for judging which team dominated. As a matter of fact, if you follow the sport, you will know that most coaches are satisfied if the number of shots on target is good, even if no goals are scored in a particular game. The tragic thing for Brazil is that the World Cup is a short, direct-elimination tournament: one bad game and you are gone.
Sure, those are just basic stats and could probably be improved, but they do reflect the reality that Brazil should have won; they got unlucky with an own goal, made some key mistakes at critical times, and failed to finish great chances.

You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

Basketball by comparison is much more stat-rich, and there are a lot of cool advanced analytics, but even still they are full of gaps that are obvious to any expert watching the game. Afterwards maybe you can find the statistical signature of something you saw, but then you risk overfitting again, just the same as soccer.

> You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

I think this deserves a bit of elaboration: a game in which a single goal is a good score, and often a game-winning one, is never going to be accurately predicted with a statistical approach, because scoring is too rare for such an approach to work well. Low scores mean that individual games have an extremely large element of chance.

Imagine one team is about 4% better than another team; they should be favored about 51-49 to score a point. If a game scored 300 points, that difference would be perceptible within one game. But to resolve the same difference accurately in games that score 3 points each takes many, many, many games.
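To put numbers on this, here is a small sketch. It assumes, purely for illustration, that every point is an independent event the better team wins with probability 0.51; real matches are not that simple. It uses 301 points instead of 300 so that ties are impossible, and a rough normal approximation (about two standard errors) for how many points it takes before a 51-49 edge stands out from a coin flip.

```python
from math import comb


def win_probability(p_point: float, n_points: int) -> float:
    """Probability that the better team wins a strict majority of n_points,
    assuming each point is independent and won with probability p_point."""
    # Sum the binomial probabilities of winning k points, for every k > n/2.
    return sum(
        comb(n_points, k) * p_point**k * (1 - p_point) ** (n_points - k)
        for k in range(n_points // 2 + 1, n_points + 1)
    )


def points_needed(p_point: float, z: float = 2.0) -> float:
    """Rough number of points before a p_point edge over 50/50 sits about
    z standard errors away from a coin flip (normal approximation)."""
    edge = p_point - 0.5
    return (z * 0.5 / edge) ** 2


print(f"3-point game:   {win_probability(0.51, 3):.3f}")
print(f"301-point game: {win_probability(0.51, 301):.3f}")
print(f"points needed:  {points_needed(0.51):.0f}")
```

With z = 2 the approximation asks for about 10,000 points: roughly 33 games at 300 points each, but over 3,000 games at 3 points each.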

> The model said that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that 4 out of 5 times the team will not win, and that that is the highest chance does not say much about the model.

But do you need a sophisticated model and lots of so-called "AI" to conclude that there's a lot of uncertainty? The point of the model is to reduce uncertainty, not to find that it's there and do nothing about it.

The point of the model is absolutely not to reduce uncertainty; it is to quantify it, and those are two very different things. No model reduces uncertainty in a probabilistic sense.

And no, you don’t need statistics or machine learning to say “there is a lot of uncertainty”, but you do in order to quantify that uncertainty.

I think the right way to measure the correctness of the model is to compare it with various other predictions:

-Predictions from the general public

-Predictions from football experts

-Predictions from other mathematical models

For example, if over time the new model is 5% better than the best of the old models, then it's very good.
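One common way to make "5% better" concrete is a proper scoring rule such as the Brier score, averaged over the same matches for every forecaster. The sketch below uses made-up probabilities and outcomes for six hypothetical matches; a real comparison would need many more matches (ideally several tournaments) before a difference of a few percent means anything.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical toy data: probability each forecaster gave to "team A wins",
# and whether it actually happened (1) or not (0). Not real 2018 numbers.
outcomes = [1, 0, 0, 1, 1, 0]
forecasters = {
    "general public": [0.70, 0.60, 0.50, 0.55, 0.60, 0.50],
    "experts": [0.65, 0.45, 0.40, 0.60, 0.70, 0.35],
    "new model": [0.70, 0.35, 0.30, 0.65, 0.70, 0.30],
}

scores = {name: brier_score(f, outcomes) for name, f in forecasters.items()}
best_baseline = min(s for name, s in scores.items() if name != "new model")
improvement = 1 - scores["new model"] / best_baseline  # relative reduction in Brier score

for name, s in scores.items():
    print(f"{name:15s} Brier = {s:.3f}")
print(f"new model vs. best baseline: {improvement:.1%} lower Brier score")
```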

It doesn't make much sense to compare it with reality and jump to the conclusion that the model doesn't work, because no prediction can be 100% accurate.

Say I have two models - model A returns around 20% likelihood that the top team wins the world cup, and model B returns around 80% likelihood. I use both of the modeling techniques a few thousand times in various parallel universes, and both of them are exactly right - 20% of 20% predictions result in a win, and so on. Despite them both quantifying uncertainty accurately, isn't model B still better?
Think about the actual uncertainty, since anything can happen in the game itself. The easiest way to look at this is to imagine playing the game 100 times in a row (preferably in parallel universes, as you say). If team A wins 60% of games, then that caps the ability to predict the result. You can predict a die roll to be 6 with a certainty of about 17%. You can't do any better.
Say I have a bag of dice, one of each of the usual D&D denominations (d4, d6, d8, d10, d12, d20). I draw one at random, ask the models for predictions, and roll it. Model A ignores the information about which die I drew and predicts a correct distribution of rolls (a 7/80, or about 8.8%, chance of rolling a 6, since the d4 can't roll one). Model B correctly processes the information about which die I drew and predicts a correct distribution given that information (I drew the d6, so about a 17% chance of rolling a 6). Both models give correct results overall, but Model B assigns higher probabilities to the actual outcomes on average, and I would say it is a better model.

A model should be judged both on how accurately it characterizes its uncertainty and how much evidence it's able to successfully make use of.
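The dice example can be checked directly. This sketch computes Model A's mixture probability of rolling a 6 with exact fractions and compares the two models using the expected log score, i.e. the average log-probability each model assigns to the roll that actually happens. The log score is one choice among several proper scoring rules; both models are calibrated, but only Model B uses the information about which die was drawn.

```python
from fractions import Fraction
from math import log

DICE = [4, 6, 8, 10, 12, 20]  # the bag from the comment, each drawn with equal chance


def model_a(face: int) -> Fraction:
    """Ignores which die was drawn: the mixture probability of rolling `face`."""
    return sum((Fraction(1, d) for d in DICE if face <= d), Fraction(0)) / len(DICE)


def model_b(face: int, die: int) -> Fraction:
    """Uses the drawn die: uniform over 1..die."""
    return Fraction(1, die) if 1 <= face <= die else Fraction(0)


def expected_log_score(model) -> float:
    """Average log-probability the model gives to what actually happens,
    over the random draw of the die and the roll (higher is better)."""
    total = 0.0
    for die in DICE:
        for face in range(1, die + 1):
            total += (1 / len(DICE)) * (1 / die) * log(float(model(face, die)))
    return total


score_a = expected_log_score(lambda face, die: model_a(face))
score_b = expected_log_score(model_b)
print(model_a(6), float(model_a(6)))
print(f"model A: {score_a:.3f}  model B: {score_b:.3f}")
```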

You can do better if you have foreknowledge or retroactive foreknowledge of the outcome of the die roll, which is the obvious suggestion of jtolmar's comment. If I know the recorded outcomes of a sequence of die rolls, I can have models that predict those outcomes to any accuracy I want. But they're not doing it by measuring the uncertainty involved in prospectively rolling the die.
No, because if the underlying phenomenon happened 20% of the time, that’s what you want your model to predict. The point of the model is to describe reality as accurately as possible. So a model that predicts a particular outcome to happen 80% of the time, and the outcome actually does happen 80% of the time, isn’t any better or worse than a model that predicts an outcome to happen 20% of the time that happens 20% of the time.
Uncertainty is a truism; that's why people want to use a prediction algorithm. Did the system do better on the results it was more certain about?
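One way to answer that is to bucket predictions by how confident the model was and check how often its pick actually won in each bucket. The bucket edges and the (probability, outcome) pairs below are hypothetical, and with a result pool this small the per-bucket rates are mostly noise.

```python
from bisect import bisect_right

# Hypothetical lower edges of the confidence buckets (probability given to the pick).
EDGES = [0.0, 0.5, 0.6, 0.7, 0.8, 0.9]


def hit_rate_by_confidence(predictions):
    """predictions: iterable of (probability given to the predicted winner,
    1 if that team actually won else 0).
    Returns {bucket lower edge: (observed hit rate, number of games)}."""
    buckets = {edge: [] for edge in EDGES}
    for prob, won in predictions:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"not a probability: {prob}")
        # bisect_right counts the edges <= prob; the last of them is this bucket.
        edge = EDGES[bisect_right(EDGES, prob) - 1]
        buckets[edge].append(won)
    return {e: (sum(r) / len(r), len(r)) for e, r in buckets.items() if r}


# Hypothetical (confidence, outcome) pairs for an A-or-B prediction model.
preds = [(0.55, 1), (0.58, 0), (0.52, 0),
         (0.65, 1), (0.68, 1), (0.62, 0),
         (0.75, 1), (0.78, 1), (0.72, 1),
         (0.85, 1), (0.88, 1)]
for edge, (rate, n) in hit_rate_by_confidence(preds).items():
    print(f">= {edge:.1f}: pick won {rate:.0%} of {n} games")
```

A well-behaved model should show hit rates rising with confidence and roughly matching each bucket's probabilities; with three games per bucket, any such pattern is weak evidence.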

When predicting the result of an A-or-B contest, the bar is already defined. Either the system gets it right or it doesn't; if it gets it right more often than not, then (despite this being poor grounds mathematically, on a small result pool) the popular press will report it as successful.

IMO if matches become easy to predict then rules will change to reduce that predictability.

> When predicting the result of an A-or-B contest, the bar is already defined.

I disagree. If team A has a 10-30% chance of winning and A pulls off the upset, the correct answer was not "A wins"; it was "B has a 70-90% chance of winning".

For Goldman Sachs' investments, the bar is not to predict that A wins or that B wins, it's to predict the probability and variance regarding which team will win. Of course, from a single upset game, it's impossible to tell whether these estimates are correct. You'd need to see the success or failure of many trials.

The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades (presumably the point of this whole PR stunt being to show that their quants are good at making smart bets in the markets) but they're a LOT closer.

Soccer is a pretty data-poor environment, or at least was historically. Before movement trackers, there was very little data to play with. With movement tracking data slowly building up, I suspect that soccer analytics will soon have their "Moneyball" moment the way baseball did.

The reason baseball got there sooner is that, even without advanced player movement tracking, baseball is a data-rich environment. There are ~2500 MLB games played per year in the 30-team era, and we have at least box scores going back to the late 19th century for most professional games, and pitch-by-pitch data going back to the eighties. In addition, a lot of the most important data is cleaner in nature (pitcher-batter match-ups) and also abundant (compare ~200 pitches in a baseball game to ~15 shots on goal in a soccer game, to take a guess at the order of magnitude).
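The order-of-magnitude gap can be sketched with the numbers from this comment (about 200 pitches vs. about 15 shots per game, both rough guesses) and a 20-team soccer league in which each team plays 38 games, as in England and Spain. This counts regular-season games only and is back-of-the-envelope arithmetic, not measured data.

```python
def league_games(teams: int, games_per_team: int) -> int:
    """Total matches in a season: each match involves two teams."""
    return teams * games_per_team // 2


mlb_games = league_games(30, 162)    # 30-team MLB, 162-game regular season
soccer_games = league_games(20, 38)  # 20-team league, double round-robin

PITCHES_PER_GAME = 200  # rough guess from the comment
SHOTS_PER_GAME = 15     # rough guess from the comment

pitches = mlb_games * PITCHES_PER_GAME
shots = soccer_games * SHOTS_PER_GAME
print(f"MLB: {mlb_games} games, ~{pitches:,} pitches per season")
print(f"Soccer league: {soccer_games} games, ~{shots:,} shots per season")
print(f"Ratio: ~{pitches / shots:.0f}x more pitch-level events")
```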

Computing power can help squeeze more information from the soccer data we collect going forward, but there is a century or more of player tracking data that we can just never ever have, since it wasn't being collected. We know Babe Ruth's batting line, but we will never have the soccer equivalent of UZR for Pele. I don't know if there is a Retrosheet-equivalent effort for soccer to collect stats from old film, but that would be one way to partially bridge the gap.

> The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades...

The 2018 World Cup is not a repeatable event, Elon Musk buying $10M of Tesla shares is not a repeatable event, and Donald Trump winning the 2016 presidential election is not a repeatable event. Therefore, to meaningfully discuss any of these in terms of probabilities and confidence intervals, we must generalize them to soccer games, stock purchases, or elections in general, and assume we can do this meaningfully by adjusting our priors. It does make the mathematics a lot less pure.

Wasn't Leicester City a "Moneyball" team? A zero-to-hero club with a roster of modestly salaried players who had statistical synergy? I don't follow the Premier League much, but from what I remember hearing about it, they bucked the trend of spending tens or hundreds of millions on megastars to carry the team single-handedly.
There are many premier league teams doing a lot more than Leicester when it comes to statistical analysis.

They did indeed win the league with a budget far below many of the normal contenders, but it was a mixture of good management, luck, a few players having the breakout seasons of their careers which took them to the point where only big teams can now afford them, and a few other players having great runs of form that saw them playing better than they would before or after.

Despite the elements of luck, it was an incredible achievement. But the following season they were back to being a team with no realistic chance of competing for the title, and were actually in a relegation fight to stay in the top division.

Literally no data producing phenomenon is a repeatable event, outside of controlled experiments.
The model totally sucked against betting odds: if you had used the model's probabilities to price bets, you would have lost a lot of money against even an average bookmaker.

Score it yourself against implied probabilities from Betfair for example and marvel at the suckage.
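Scoring a model against the market starts with turning odds into probabilities. This sketch converts decimal odds to implied probabilities, removes the margin by simple proportional normalization (one of several methods), and compares log loss on a single hypothetical match. The odds, model probabilities and result are invented; a real check would sum the log loss over every match in the tournament.

```python
from math import log


def implied_probabilities(decimal_odds: dict[str, float]) -> dict[str, float]:
    """Turn decimal odds into probabilities that sum to 1.

    The raw 1/odds values sum to more than 1 because of the market's margin
    (the overround); dividing by that total removes it proportionally.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


def log_loss(probs: dict[str, float], result: str) -> float:
    """Negative log of the probability given to what actually happened (lower is better)."""
    return -log(probs[result])


# Hypothetical 90-minute market (home / draw / away) and a hypothetical model.
market = implied_probabilities({"home": 2.0, "draw": 3.5, "away": 4.0})
model = {"home": 0.60, "draw": 0.25, "away": 0.15}
result = "away"  # invented outcome

print({k: round(v, 3) for k, v in market.items()})
print(f"market log loss: {log_loss(market, result):.3f}")
print(f"model log loss:  {log_loss(model, result):.3f}")
```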

But Goldman Sachs are the kings of predicting uncertainty! This is their whole business! They make billions predicting certainty through the murky, uncertain waters of the global economy. Would you argue that the global economy is more uncertain than soccer? I'd say so. How is it that they can find success in the market but not in soccer?

I think this is a smoke signal. Soccer is corrupt; you can't predict the winner unless you know what's being passed around under the table. Goldman Sachs does these predictions so people read between the lines to see how corrupt it is.

My argument is: "Goldman is amazing at statistical analysis and they routinely practice it on much tougher models (the global economy), so they should have no problem predicting a simpler model (soccer). But since they drastically failed at predicting soccer, then there must be an equally drastic variable missing from their predictions. Since we can trust Goldman to use all available public information in their analysis, there must be critical information that is hidden from the public which affects the outcomes". I make some assumptions, but it's fairly sound, no?

Goldman's business model is not to predict the future. Goldman has 2 business models: 1) transfer risk, 2) provide advice. For #1, it's a middleman. For #2, it's paid for brain power, experience and speed.
Unclear if your comment is tongue in cheek, but assuming that you're serious, I'd encourage you to give a listen to a podcast episode like this: https://soundcloud.com/bettheprocess/episode-35-ted-knutson.

In the world of sports betting/analytics, you have baseball and basketball at the forefront, and then American football, soccer, and hockey (roughly in that order).

Off the top of my head, there are several reasons why the latter three sports have all lagged behind:

-Lack of data

It wasn't until the last 4-5 years that affordable, accurate data for soccer matches became widely available. Companies like Opta have accomplished this by outsourcing the watching of games and the manual tagging of events, which was made possible by the advent of cheap cloud computing.

It should be self-evident why tracking the position and actions of 22 players is more complicated than something like baseball, where for the most part you are looking at one pitcher vs. one batter, much of which can be automated with computer vision that tracks pitch position, speed, and spin.

-Complexity

It's no accident that baseball was the first sport to be revolutionized by analytics. Most of the time it's a static game with a clearly defined action set, i.e., do I swing at the pitch or not? Do I throw a fastball or not? Do I attempt to steal a base or not?

In games like American football, soccer, and hockey, you have anywhere from 12-22 players on the field at a time. Tracking what the players without the ball or the puck are doing is a difficult task technically, as is quantifying their impact. Concepts like expected goals and expected goals added are recent ones.

-Sample size

Typical elite soccer leagues see each team play each other twice. In England and Spain, this means you have 38 games per season.

Baseball has a 162-game season plus playoff games, basketball has an 82-game season plus playoff games, etc. Couple that with the fact that quality data has only been collected for a few years, and you get other problems.

In basketball and baseball, the effects of aging on player performance and statistics are fairly well understood now. We can generally calculate the 5-year market value of a player, etc. In the other sports I mentioned, we don't yet have the time-series data to make those judgements.
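Expected goals (xG), mentioned under Complexity above, assigns each shot a probability of becoming a goal. The sketch below takes per-shot xG values as given (real xG models estimate them from features such as shot location and angle), treats shots as independent, which is a simplification, and computes the full distribution of goals. The shot values for the two hypothetical teams are invented to mirror the pattern discussed earlier in the thread: many low-quality shots vs. a few good ones.

```python
def goal_distribution(shot_xg: list[float]) -> list[float]:
    """P(exactly k goals) for k = 0..len(shot_xg), treating each shot as an
    independent chance that scores with probability equal to its xG."""
    dist = [1.0]  # before any shot is taken: 0 goals with certainty
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)  # this shot misses
            new[goals + 1] += prob * p    # this shot scores
        dist = new
    return dist


def outscore_probability(xg_a: list[float], xg_b: list[float]) -> float:
    """Probability that team A scores strictly more goals than team B."""
    dist_a, dist_b = goal_distribution(xg_a), goal_distribution(xg_b)
    return sum(pa * pb
               for i, pa in enumerate(dist_a)
               for j, pb in enumerate(dist_b)
               if i > j)


# Hypothetical shot lists: many long-range efforts vs. a few clear chances.
many_shots = [0.05] * 20 + [0.3] * 2
few_shots = [0.4, 0.3, 0.1]
print(f"xG: {sum(many_shots):.2f} vs {sum(few_shots):.2f}")
print(f"P(many-shots side wins): {outscore_probability(many_shots, few_shots):.2f}")
print(f"P(few-shots side wins):  {outscore_probability(few_shots, many_shots):.2f}")
```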

--

Specific to the World Cup, there are other reasons why you may find it hard to predict results.

-Team chemistry and style

Even though the World Cup is the most high-profile soccer event in the world, most players spend only 1-3 months a year with their national teams. Their "day jobs" with their club teams take up most of their playing time and attention.

As anyone who has played the game Football Manager will know, managing a national team is a tough job. You have no say over how the players are practicing when they're away from you, and no control over the physical condition in which they arrive at the World Cup. This year, there was barely a month between the end of the regular European seasons and the start of the World Cup.

In that month, you have to get at least 11 players who have not played together to learn your style of play. Do you want to play a pressing style? Are you attempting a slow buildup, or trying long balls? Etc. etc.

-Home field advantage

In baseball and basketball, most modern statistical models account for home field advantage. Having 60,000 Russian fans chanting and heckling likely played a role in the team's ability to upset Spain, particularly during penalty kicks.

This goes back to the sample issue. How many times before have Spain played Russia IN Russia in front of a large crowd? Probably never.
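A common way to fold home advantage into an Elo-style model is to add a fixed number of rating points to the home side before computing the expected result. The sketch below uses a hypothetical 100-point bonus and invented ratings; estimating the real size of that bonus for a World Cup host is exactly where the small-sample problem bites.

```python
def home_expected_score(rating_home: float, rating_away: float,
                        home_bonus: float = 0.0) -> float:
    """Elo-style expected score for the home side, with home advantage
    modelled as a fixed rating bonus (a simplifying assumption)."""
    diff = (rating_home + home_bonus) - rating_away
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# Hypothetical ratings: the visitors are rated 100 points higher.
host, visitor = 1800.0, 1900.0
print(f"neutral venue: {home_expected_score(host, visitor):.2f}")
print(f"with a 100-point home bonus: {home_expected_score(host, visitor, 100.0):.2f}")
```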

---

All this is to say, cut Goldman some slack. There are a number of non-nefarious reasons why you may expect a soccer model to produce some spectacular miscues.

On top of all that, as a low-scoring game, soccer is inherently more random, and therefore harder to predict.
Ok, I understand this - that soccer has many variables and it is difficult to create a model with all of these variables. But my point is, the global economy has way more variables than soccer. Way way way way more variables. At least 7.5 billion of them.

So would you argue that creating a statistical model of soccer is harder than creating one for global economies? I think it's harder to model economies.

I'm not even trying to give Goldman a hard time! I'm saying that Goldman probably put together a very accurate model of "soccer", but we aren't watching an accurate model of soccer; we're watching the corrupted one where the players and skills don't matter.

I think we have to be very clear on what economic "models" Goldman uses.

If you're talking about GDP growth forecasting, or forecasting unemployment numbers, these are ultimately questions of aggregation. Yes, there are 7.5 billion people, but at the end of the day each individual agent's actions don't make a tremendous difference for an aggregate measure like GDP. During periods of low volatility, as we are currently experiencing, it's really not all that impressive to forecast the unemployment rate +/- 0.25%, or GDP growth within 0.5%.

If you're talking about their market-making and trading businesses, they've had some horrendous quarters recently as well (http://www.businessinsider.com/goldman-sachs-just-had-a-hist...). A very small portion of Goldman's business is taking an opinionated stance; most of their income comes through relatively low-risk market-making activities.

And let's not forget that during the 2008 financial crisis, certain departments within the company correctly bet against subprime mortgages using credit default swaps, while others had exposure to those same mortgages. The company still needed an injection of capital from Warren Buffett and the US Treasury to weather the crisis. Point being, they aren't clairvoyant oracles.

---

Regarding your last point, which was also made in your original comment, you seem to be claiming some form of what economists call "omitted variable bias", and seem to be hypothesizing that the "omitted variable" is corruption or cheating.

From the purely technical standpoint of building models, the tiny samples (https://www.theringer.com/soccer/2018/7/11/17557720/world-cu...) and the nature of the "data" being collected means that there are plenty of other explanations, like incorrectly estimated parameters or measurement error.

If you're trying to suggest that there is corruption or cheating in soccer, please point to a concrete example of a team in a critical game receiving a disproportionate number of calls. Unsure if you're aware, but this was the first World Cup with instant video replays for the referees to use. Had this replay been in use more widely in international soccer, the US might've qualified for this World Cup (https://deadspin.com/u-s-a-out-of-world-cup-on-phantom-goal-...), England might've won/tied that pivotal 2010 World Cup game (https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_a...), etc.

Soccer may have had a sordid past with the picking of host countries, but the trends in the actual game itself point to technology reducing the ability of referees to make blatantly terrible calls.

Thanks for the replies and the detailed sources, it's interesting to read!

> Point being, they aren't clairvoyant oracles.

Yeah, my argument was weak in that regard. They aren't anywhere close to perfect or accurate, I'll admit.

> you seem to be claiming some form of what economists call "omitted variable bias"

Yes! Is that what it's called?

> please point to a concrete example of a team in a critical game receiving a disproportionate number of calls

Corruption doesn't have to be that explicit. Maybe key players or coaches are paid to perform poorly? It doesn't always come down to the ref. But I admit I have no examples.

> you can't predict the winner unless you know what's being passed around under the table

Pretty sure you just inadvertently identified why GS is so “great” at predicting economic movements.