by PaulRobinson 2888 days ago
> Brazil had 27 shots and 9 on target with 59% possession. Belgium only had three shots on target, and made two of them to win.

A modern model would account for the fact that those numbers alone mean nothing. They are the numbers broadcasters reluctantly put on a screen for entertainment value, but they have no real analytical power because they have no comparative metric.

How far up or down was each of those numbers compared with previous wins and losses for each team?

What was Brazil's conversion from on-target shots before the tournament?

What was Belgium's success/failure rate on on-target shots they were defending against?

Likewise the other way around: were Brazil guilty of particularly poor defending? Were Belgium finding ways of making on-target shots count against all opposition, or was it luck in this game?
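To make the "comparative metric" point concrete, here is a minimal sketch of one way to ask how unusual a conversion rate is relative to a baseline. The only real numbers are the ones in the quote (Belgium: 2 goals from 3 shots on target). The baseline rate of 0.30 is a made-up placeholder, not Belgium's actual record, and the function name is mine. It also treats every on-target shot as an independent chance of equal quality, which is a strong simplification.

```python
from math import comb


def prob_at_least(goals: int, shots_on_target: int, baseline_rate: float) -> float:
    """Binomial tail: chance of scoring at least `goals` from
    `shots_on_target` attempts if each one converts independently
    with probability `baseline_rate`."""
    n, p = shots_on_target, baseline_rate
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(goals, n + 1)
    )


# From the quoted match stats: 2 goals from 3 shots on target.
belgium_goals = 2
belgium_on_target = 3

# HYPOTHETICAL baseline conversion rate, for illustration only.
assumed_baseline = 0.30

surprise = prob_at_least(belgium_goals, belgium_on_target, assumed_baseline)
print(f"P(at least {belgium_goals} from {belgium_on_target}) = {surprise:.3f}")
```

A small tail probability would suggest the game was an outlier against that baseline; a large one would suggest it was business as usual. Either way, the match numbers only become informative once there is something to compare them with.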

Any human analyst could have told you going into that game that Belgium were "lucky" and scoring freely beyond expectations, making more of fewer opportunities. Likewise, the consensus among most experts was that Brazil were guilty of mild complacency, that the team was young and not yet formed into a strong unit (still just 11 strong individuals at any one point in time), and that their on-target shots, whilst frequent, had a lower probability of turning into goals due to distance, power, position, etc.

So why did the Bloomberg model not pick that up?

I actually think they did pretty well, all things considered, but I'd love to see whether they did any runs on previous World Cups to check their thinking and whether they over-fitted a little to a couple of key metrics. I think the lack of metrics from previous games might mean they relied on some headline numbers, but there's more they could have done to get a better model here...

Still, it's not their job, is it? Just a bit of fun... which is a good job, because I find it just a little bit amusing.

3 comments

Some teams/coaches like possession, others do not. If a team plays a dominance-based game, its defenders will eventually be (almost) in the opponent's half. When this happens, things get edgy, and a loss of possession can be punished by a counter. That counter needs to be executed as fast as possible. Teams that are ahead often retreat and let the opponent have the ball so they can break out like that. It just means possession doesn't really say anything. Belgium went ahead against Brazil with a bit of luck, and then let Brazil have the ball. Belgium's second goal was a classic counter punch. After that, Brazil was allowed to have the ball while Belgium tried to control the game. Regarding odds, Belgium was number 3 in the world when the game was played, and Brazil was number 2. Obviously, it would not be a walkover for anyone.

If you look at both Belgium/England games, you see number 2 against number 12. The ranking was respected there.

https://www.fifa.com/fifa-world-ranking/ranking-table/men/in...

It used to be a silly ranking system, but it's Elo-based these days, so it's not too shabby.

On the contrary, they mean a lot. Shots on target is the proxy you have (apart from goals, of course) for working out which team dominated more. As a matter of fact, if you follow the sport, you will know most coaches will be satisfied if the shots-on-target count is good, even if no goals are scored in one particular game. The tragic thing for Brazil is that the WC is a short, direct-elimination tournament. One bad game and you are gone.
Sure, those are just basic stats and could probably be improved, but they do reflect the reality that Brazil should have won; they got unlucky with an own goal, and they made some key mistakes at critical times, failing to finish great chances.

You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

Basketball by comparison is much more stat-rich, and there are a lot of cool advanced analytics, but even still they are full of gaps that are obvious to any expert watching the game. Afterwards maybe you can find the statistical signature of something you saw, but then you risk overfitting again, just the same as soccer.

> You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.

I think this deserves to be elaborated a bit: a game in which 1 is a good score, and often a game-winning score, is never going to be accurately predicted based on a statistical approach, because scoring is too rare for a statistical approach to work well. Low scores mean that individual games have an extremely large element of chance.

Imagine one team is about 4% better than another team; they should be favored about 51-49 to score a point. If a game scored 300 points, that difference would be perceptible within one game. But to resolve the same difference accurately in games that score 3 points each takes many, many, many games.
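A rough sketch of that arithmetic, under a deliberately simple model that is my assumption rather than anything in the thread: each game has a fixed number of points, and the better team wins each point independently with probability 0.51. The function name and the point totals tried are illustrative only.

```python
from math import exp, lgamma, log


def win_probability(total_points: int, p: float = 0.51) -> float:
    """Chance that the better team takes strictly more than half of
    `total_points`, when it wins each point independently with
    probability `p`. Ties (possible for even totals) count as not winning."""
    n = total_points
    needed = n // 2 + 1
    total = 0.0
    for k in range(needed, n + 1):
        # Binomial pmf computed in log space so large n does not overflow.
        log_pmf = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p)
        )
        total += exp(log_pmf)
    return total


# Compare a low-scoring game with higher-scoring ones.
for points in (3, 300, 30000):
    print(f"{points:>6} points per game: P(better team wins) = "
          f"{win_probability(points):.3f}")
```

With 3 points per game the better team comes out ahead only about 51.5% of the time under this model, barely better than a coin flip, and the edge shows up more clearly as the number of points per game grows. That is the sense in which low scores leave a large element of chance in any single result.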