Hacker News
An exploratory statistical analysis of the 2014 World Cup Final (beta.deepnote.com)
106 points by epiteton 2199 days ago
17 comments

Ok, so something I wrote years ago hit the HN front page. So this is what it feels like, huh?

I also have a github with more notebooks about football here: https://github.com/rjtavares/football-crunching

If you have any questions about football analytics, hit me up.

(My Gmail and Twitter handles are the same as my HN username, if you prefer email or DM.)

Interesting article, thanks. A little constructive feedback on some language in the conclusion. "Football is a game of space. That's why parking the bus can actually allow to win a match." The second sentence doesn't make sense, unfortunately. Worse, it's not really possible to work out what it means either. It sounds like you mean "That's why parking the bus can be a good strategy" (or something like that). But the first sentence sets up the exact opposite expectation: a sentence like "That's why parking the bus is never a good strategy" would be compatible with the first sentence. Sadly, the reader is left not knowing whether you're saying one thing or the complete opposite.

I understand English is unlikely to be your first language. Please read this as a genuine attempt to be helpful.

There's a grammatical hiccup there with the 'allow', but the meaning is quite clear for soccer-fluent readers.
Completely unclear to me, and I've been soccer (football) fluent for over 50 years. I know what parking the bus means (for non-football fans, it means falling back and defending in numbers). If football is really a game of space, it should be a poor strategy. Is the author saying parking the bus is a good or a bad strategy?
It's a strategy that gives up possession in exchange for denying offensive space to the opponent and relies on exploiting (through counter attacks) the defensive space the attacker opens. When executed well, it can win games but it's not amenable to the type of analysis presented in this write-up. That's what the bit is about.
Aha, so essentially he meant "football is a game of space, that's why parking the bus is an interesting strategy". Thank you. Too much binary thinking from me. I often struggle to discern the meaning of unclear writing. Which is probably why I spend disproportionate time on my own writing polishing and rewriting for clarity. But I still fail regularly, it's not an easy problem.
This is awesome. Where does the raw data come from? I'm curious how they collect such detailed data (e.g. is there someone tracking every time a specific player takes a certain action?).
There are some companies working in this space, like Opta and Statsbomb. As far as I know, they both use a mix of image processing and humans to collect the data.

It takes a lot of work, but also supports a huge and growing industry. E.g. data scouting is increasingly used by clubs to increase the pool of potential hires.

Did you do a similar analysis of the 2018 World Cup? Thanks for sharing your analyses of football/soccer, the sport I'm passionate about.
I didn't, but the good news is that data is freely available here: https://github.com/statsbomb/open-data/

Try it!
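For anyone who wants to try it, here is a minimal Python sketch for pulling files from that repository. It assumes the layout `data/competitions.json`, `data/matches/<competition_id>/<season_id>.json` and `data/events/<match_id>.json` (plus `data/lineups/<match_id>.json`) on the `master` branch; check the repo's README, since paths and branch names can change. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: repository layout and "master" branch at the time of writing.
BASE = "https://raw.githubusercontent.com/statsbomb/open-data/master/data"


def open_data_url(kind, *ids):
    """Build the raw-file URL for 'competitions', 'matches', 'events' or 'lineups'."""
    if kind == "competitions":
        return f"{BASE}/competitions.json"
    if kind == "matches":
        competition_id, season_id = ids
        return f"{BASE}/matches/{competition_id}/{season_id}.json"
    if kind in ("events", "lineups"):
        (match_id,) = ids
        return f"{BASE}/{kind}/{match_id}.json"
    raise ValueError(f"unknown kind: {kind!r}")


def load(kind, *ids):
    """Download one file and parse it as JSON (needs network access)."""
    with urlopen(open_data_url(kind, *ids)) as resp:
        return json.load(resp)
```

Typical use would be `load("competitions")` to find the competition and season ids you want, then `load("matches", competition_id, season_id)` and `load("events", match_id)`.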

I created a simple web tool to convert StatsBomb's json data to csv and download for any match: https://nr815jz59d.execute-api.eu-west-2.amazonaws.com/sb/
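I don't know how that tool is implemented, but a comparable conversion is easy to sketch with the standard library: flatten each event's nested objects into dotted column names (for example `type.name`) and serialise lists such as `location` as JSON strings. The event fields used in the example are assumptions about StatsBomb's format; pandas users could try `pandas.json_normalize` instead.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def events_to_csv(events):
    """Turn a list of event dicts into CSV text with one column per flattened key."""
    rows = [flatten(event) for event in events]
    fieldnames = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    # Events missing a column get an empty cell (DictWriter's default restval).
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```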
Germany 2014 was about the best I've seen the game played.

Something I watch for, which seems to differentiate players and teams at every level, is what happens on the first touch when receiving a pass. When you first start watching this, you'll notice that as you move up the ranks, the ball just sticks to a pro's feet, preferably a couple of feet in front of them, where they are ready to play it again. Watching Germany that year, their first touch was not merely excellent, but aggressively so. Instead of settling the ball, they turned it into a rolling ball in the direction they wanted to play. Or a first-touch pass. I'm speaking in wild generalities here and I don't have numbers to back it up.

Statistically I would guess that their second touch was on average farther away from them than other teams of comparable level, while still being under control.

It reminded me of Tiger Woods when he burst onto the scene. He played the game far more aggressively and relied on his skills to keep him safe rather than traditional shot selection. Germany 2014 decided these slightly riskier touches are consistently possible in the long run and the benefit outweighs the risk.

It also seemed that with the aggressive play, there is just more football played -- more chances. The more chances that are generated, the more it favors the better team.
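The intuition that more chances favor the better team can be checked with a toy model in Python. It assumes, purely for illustration, that both teams get the same number of chances and convert each one independently with a fixed probability (the 0.12 and 0.08 used here are made up, not measured), so each team's goals follow a binomial distribution.

```python
from math import comb


def binom_pmf(n, p):
    """Probability of scoring exactly k goals from n chances, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def outcome_probs(n_chances, p_strong, p_weak):
    """Return (win, draw, loss) probabilities for the stronger team."""
    strong = binom_pmf(n_chances, p_strong)
    weak = binom_pmf(n_chances, p_weak)
    win = draw = loss = 0.0
    for i, pa in enumerate(strong):
        for j, pb in enumerate(weak):
            if i > j:
                win += pa * pb
            elif i == j:
                draw += pa * pb
            else:
                loss += pa * pb
    return win, draw, loss


for n in (5, 10, 20, 40):
    win, draw, loss = outcome_probs(n, 0.12, 0.08)
    print(f"{n:>2} chances each: win {win:.2f}, draw {draw:.2f}, loss {loss:.2f}")
```

Under these assumptions the stronger team's win probability rises as the number of chances grows, because the randomness of a low-scoring game shrinks relative to the gap in quality.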

After 2004, when Germany didn’t get through the group stage of the European Championship, they realized they had become complacent and started a project to focus on footballing technique across all their teams.

That turned their national side from a strong team into a strong team where all players have good technical skills.

2014 is where this led to the first great success (https://www.bundesliga.com/en/news/Bundesliga/confederations...), but they could only play like that because of that decade-long process.

That year, Germany had a team, and it trounced all the other countries that had one or two star players, most notably Brazil. Individually, none of Germany's players became a football superstar. It was a lucky combination of technically decent players who just fit together perfectly to frustrate the opposition. Literally.

It would be interesting to see whether this common narrative holds true in the data.

Watch Germany vs. France and say that. The Germans stalled and fouled the whole game and scored off a set piece. In the final versus Argentina too, the game was scoreless until extra time. They also needed extra time versus Algeria. Their only real display of attacking was versus Brazil, who self-destructed after having their best player (and a top-3 player in the world) horrifically injured.
And yet, perhaps the most breathtaking display of football dominance I saw from that Germany team was their 2016 Euro semi-final against France. It was way more impressive than their dismantling of Brazil. Their control of the game was out of this world.

Football is "unstable" as a game. To dominate in a consistent way, you actually need to be overwhelmingly superior to your opponent in many different stages/aspects of the game. Any weak link and you are at the mercy of fate.

Ummm they lost... I was actually in France that year. France advanced then lost the final to Portugal.
The one where they lost 2-0?
I think it seems that way because of the 7:1 in the semi finals. But honestly, the result came to be mainly because Brazil was really playing like garbage the whole tournament and only made it that far because the referees had seriously helped them. They shouldn't even have made it out of the group stages.

The remaining German matches were really close (the match against France, mentioned below, was the real final of the tournament), and even the final was very close all the way to the end. Germany also struggled against Algeria.

Disclaimer: I'm from Argentina. The very next time you hear somebody from my country rooting for Brazil, buy a lottery ticket.

Whatever the results, Brazil is always a force to be reckoned with.

They are the only team (I think) that has consistently reached a certain stage at the World Cup.

What I observed in their game is that they play as a team (unlike my country) and at the same time, every single one of their players is somebody to be afraid of individually.

As for the World Cups, if you ask me it wasn't Germany that won in 2014, nor was it Spain in 2010.

It was Pep Guardiola that won both.

> Brazil was really playing like garbage the whole tournament and only made it that far because the referees had seriously helped them

I disagree. Brazil played great until Neymar got his back injured; he was by far their best player. Also, the fact that their best defender and captain Thiago Silva was suspended for the game against Germany didn't help. These two were the best players in that squad. Remember, they beat Chile and Colombia, who played really well in that World Cup, while Germany barely beat Algeria and France.

huh? Germany barely beat Algeria and struggled a lot against France. And they looked very vulnerable against Argentina in the final had Palacio or Higuain scored their good chances.
If anyone's interested in doing similar analysis, my company StatsBomb makes a lot of event data available on GitHub, including the last World Cup, lots of Champions League finals, NWSL and FAWSL etc:

https://github.com/statsbomb/open-data/

StatsBomb has been great for open football data, which is hard to come by. I always enjoy the Statsbomb panels at Sloan too.

If anyone is interested in getting started with soccer/football analytics, this is a good place to start: https://github.com/devinpleuler/analytics-handbook

Highly recommend this and the "Friends of Tracking" youtube channel.

Also, follow Thom on Twitter; he is one of the smartest and most knowledgeable people in this space.

Yeah, the Metrica data used in a lot of the FoT stuff is also an awesome place to start:

https://github.com/metrica-sports/sample-data

Are you able to share how StatsBomb collects event data? I'd love to build something to capture stats for local club teams but probably not if it involves computer vision. Thanks!
If you're interested in collecting small-scale/simple event-data, I put something together for my own use here: https://torvaney.github.io/projects/tracker.html
We do use a combination of humans and computer vision, and some providers (especially of full tracking data) have wholly automated pipelines, but analysts all over the world are collecting stuff manually every day. Really depends what sort of stuff you want to collect - you can use something like SportsCode to match any sort of events to video segments, and if you're a coach or scout you'll know what performance indicators you think are most important. If you just want some sort of team performance metrics you could start collecting shot data (coordinates, defensive pressure, type of assist etc) and build a simple xG model. Nothing you can't do in a spreadsheet or even on paper.
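As an illustration of the spreadsheet-level version, here is a minimal binned xG sketch in Python: group shots by distance to goal and use each band's historical conversion rate as the xG of a new shot from that band. The pitch coordinates (120x80 with the goal centre at (120, 40)), the 5-unit bin width and the function names are all assumptions; real models usually add angle, body part, assist type and defensive pressure, often via logistic regression.

```python
import math
from collections import defaultdict

# Assumption: StatsBomb-style 120x80 pitch, attacking towards x = 120.
GOAL_X, GOAL_Y = 120.0, 40.0


def distance_bin(x, y, bin_width=5.0):
    """Index of the distance band a shot from (x, y) falls into."""
    return int(math.hypot(GOAL_X - x, GOAL_Y - y) // bin_width)


def fit_binned_xg(shots, bin_width=5.0):
    """shots: iterable of (x, y, is_goal). Returns {bin index: conversion rate}."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [goals, shots]
    for x, y, is_goal in shots:
        b = distance_bin(x, y, bin_width)
        counts[b][0] += int(is_goal)
        counts[b][1] += 1
    return {b: goals / total for b, (goals, total) in counts.items()}


def predict_xg(model, x, y, bin_width=5.0, default=0.0):
    """xG of a new shot: its band's historical rate, or `default` if unseen."""
    return model.get(distance_bin(x, y, bin_width), default)
```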
I'm one of those who don't like the use of advanced stats in football, though I do watch a lot of football. My reason is simple:

The advanced stats, as they are called (xG, xA, xPA, offensive actions, defensive actions), all end up as a mere tool for disregarding the actual result and arguing about which was the better team. For example: team A won the match 1-0, but team B was the better team since they had more <<insert favorable stat>>. The idea behind stats should be to give a bit more context to the game I watched, not to replace watching the game.

Then there is also comparing stats across games. In football (and every other sport) each game has its own difficulty level (fatigue, condition, team condition, opposition condition, tactics, opposition tactics, teammates, chemistry, opposition mistakes, opposition players, adjustments, pressure, and even sheer luck on occasions), which makes such comparisons meaningless. An offensive team will always have more shots and more possession than a defensive-minded, park-the-bus kind of team. Like you said, right now no stat is good enough to give us an idea of how well a team played. Football is a game of spaces, and a lot depends on player movement (or lack of it) and vision. I do find reports helpful that tell us where a certain team had the advantage and how they maximised it, though.

Do you think the game of football is more complex than, say, the global economy? I doubt it, yet we still use data and statistics to analyse the economy, so why not football or any other sport?
It's not an exact comparison. We use models to analyse the economy, but not human behavior. Any sport is largely about human intelligence, behavior, and skill. It's easy to quantify the final output (the numbers that matter, such as goals) and compare those. Beyond that, we don't currently have good enough models for any action in football. Context is very important, but statistics ignore it completely. Until those things are given due consideration, there is no point quoting or debating vanity metrics like xG, offensive actions, etc. There is a very American tendency to reduce a game to just stats, but a game is infinitely more complex. You may have just one offensive action in the game and still win it if you are well drilled defensively and can stop the opposition's counters.
Your browser is not yet supported

We're sorry about this, but we don't fully support your browser yet. Let us know at help@deepnote.com which browser you're using and we'll make sure to prioritize it. We currently recommend to use the latest version of Chrome, Safari or Firefox and you should be all set. Thanks!

- - -

I’m using Firefox 26 on the latest iOS.

Since Firefox mobile doesn’t use the same versions as desktop, I’m guessing Deepnote has just incorrectly implemented the browser support logic. This is a very common issue (with Firefox specifically).
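A sketch of a check that avoids this, assuming Python on the server side. Firefox for iOS identifies itself with an `FxiOS/<version>` token and runs on Apple's WebKit engine, so its version number should not be compared with desktop Firefox releases. The minimum version and the function name below are hypothetical.

```python
import re
from typing import Optional

MIN_FIREFOX_DESKTOP = 70  # hypothetical minimum supported desktop version


def firefox_supported(user_agent: str) -> Optional[bool]:
    """Return True/False for Firefox user agents, None for other browsers."""
    # Firefox for iOS: WebKit engine with its own version numbering,
    # so treat it like Safari instead of comparing against desktop versions.
    if re.search(r"\bFxiOS/\d+", user_agent):
        return True
    # Desktop (and Android) Firefox: "Firefox/<major>.<minor>".
    match = re.search(r"\bFirefox/(\d+)", user_agent)
    if match:
        return int(match.group(1)) >= MIN_FIREFOX_DESKTOP
    return None  # not Firefox: defer to other checks
```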
Nice catch, will fix.
Nice as an exercise in Pandas and the whole Python data analysis tool chain, and the graphics are awesome. As an exercise in football analysis, though, it is pretty useless and feeble, and even worse, judging by the author's name (most probably Brazilian or Portuguese), he let his dislike of Argentina cloud his judgement.

You can perorate all you want about possession and pass completion, but if I am a manager and you offer me a first half like the one Argentina had (which, according to the author, was a German slaughter fest), I will take it every single time. Also absent from the analysis are Higuain's and Messi's clear-cut chances (far better than anything Germany had, despite all the possession) and the controversial contested ball between Neuer and Higuain. Anyway, it was nice to see the exercise.

Pretty interesting read! I can also recommend the author's medium account[1] full of similar articles (even though in a bit less hands-on format)

[1] https://medium.com/football-crunching

They're similar indeed. I wrote the linked post and that blog is mine.
Pandas code with long lines can look so so ugly. Not a dig towards the original post at all, it's just that the pandas API isn't the most pleasant to work with.
Hi, the author of Deepnote here, this is cool!
Not sure if you can influence this, but I have two platform suggestions that crossed my mind while reading it:

1. I would make the "Run this article as a notebook" option more visible. On first read I completely skipped that part, as it looks very similar to the pop-ups on Medium. Having an option to directly run/modify this blog would be pretty amazing.

2. The chosen color scheme of code formatting is a bit odd, but that might be just my subjective preference :-)

Of course, thanks for the suggestions.

1. I agree, but there is a reason for this. We'll soon be adding the ability to run/modify articles without signing up, so the whole thing will go away.

2. Thanks! We are experimenting with this (the default scheme is different, but we had a lot of people requesting dark mode so trying out different things for the published articles.)

I really love the dark theme of Sourcegraph in VS Code.

https://marketplace.visualstudio.com/items?itemName=sourcegr...

Nice. Is there a reason that the up-down arrows don't work to scroll the page on Deepnote? PgUp/PgDown work but the arrow keys are blocked for some reason (Chrome, Windows 10)
No good reason, it's a bug, thanks for reporting.

Deepnote is a data science notebook and we are capturing the up/down key events in the main app so that you can move inside the cells and also between the cells. This article is using the same codebase but in read-only mode. I forgot to disable it in this mode.

Very cool. For anyone interested in this kind of tactical analysis, another (though non-computational) one I've enjoyed is http://www.zonalmarking.net/.
In the same vein, but with a more statistically oriented take on European football, you can check https://www.alfadata.xyz/blog.

And here is an interview with the founder about his experience building the service and the data infrastructure behind it: http://datapeek.org/interview/alfadata

Nice visualisations! Note that (if you're a little OCD about this like me) you can append a ; to the last matplotlib command in a block to suppress the text that appears before notebook inline plots.
What's the function that returns, from the pandas DataFrame, the foul the German goalkeeper committed on Higuain?

Jokes aside, pretty good article. I am Argentinean, so that counts for a lot.

How about Higuain's miss from the German header back to Neuer?
I don't know why this is here and not on https://datatau.net/
Out of interest, do you guys know of other such great data sources for other games, for example American Football?

Or are those things mostly proprietary?

It's surprisingly hard to find practical in-depth pandas code examples like this.
How would you turn a team into a vector?
Thanks for sharing this. I want to learn more about this kind of thing.