by pbhjpbhj 2890 days ago
Uncertainty is a truism; that's why people want to use a prediction algo. Did the system do better on results it was more certain about?

Predicting the result of an A or B contest the bar is already defined. Either the system gets it right or it doesn't. If it gets it right more often than not then (despite this being poor grounds mathematically, on a small result pool) the popular press will report it as successful.
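To put a rough number on the "small result pool" point, here is a minimal sketch, assuming every match is a pure two-outcome contest and that each call is independent. The function name and the 16-match figure are made up for illustration; they are not taken from the system being discussed.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct calls in n independent two-outcome
    matches when each call is right with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical pool of 16 matches, predicted by coin flip (p = 0.5).
coin_majority = p_at_least(9, 16)   # coin is right "more often than not"
coin_strong = p_at_least(12, 16)    # coin is right on 12 or more of 16

print(f"coin gets 9+ of 16 right:  {coin_majority:.3f}")
print(f"coin gets 12+ of 16 right: {coin_strong:.3f}")
```

Under those assumptions a coin is right more often than not about 40% of the time, so "more right than wrong" over one tournament says very little on its own.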

IMO if matches become easy to predict then rules will change to reduce that predictability.


> Predicting the result of an A or B contest the bar is already defined.

I disagree: if team A has a 10-30% chance of winning, and A pulls off the upset, the correct answer was not "A wins"; it was "B has a 70-90% chance of winning".

For Goldman Sachs' investments, the bar is not to predict that A wins or that B wins, it's to predict the probability and variance regarding which team will win. Of course, from a single upset game, it's impossible to tell whether these estimates are correct. You'd need to see the success or failure of many trials.
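One common way to grade probability estimates over many trials is the Brier score (mean squared error between the forecast probability and the 0/1 outcome, lower is better). The sketch below is only an illustration with made-up numbers; nothing here says this is how Goldman Sachs or anyone else evaluated their model.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: five games where B is the favourite; B loses the third.
b_won = [1, 1, 0, 1, 1]

# Forecaster 1 says "B has an 80% chance" every time.
hedged = brier_score([0.8] * 5, b_won)

# Forecaster 2 flatly says "B wins" (probability 1.0) every time.
certain = brier_score([1.0] * 5, b_won)

print(f"80% forecaster:  {hedged:.2f}")
print(f"100% forecaster: {certain:.2f}")
```

Both forecasters "got the upset wrong", but the 80% forecaster ends up with the lower (better) score, roughly 0.16 against 0.20, and that gap only becomes visible once there are several games to score.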

The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades (presumably the point of this whole PR stunt being to show that their quants are good at making smart bets in the markets) but they're a LOT closer.

Soccer is a pretty data-poor environment, or at least was historically. Before movement trackers, there was very little data to play with. With movement tracking data slowly building up, I suspect that soccer analytics will soon have their "Moneyball" moment the way baseball did.

The reason baseball got there sooner is that, even without advanced player movement tracking, baseball is a data-rich environment. There are ~2500 MLB games played per year in the 30-team era, and we have at least box scores going back to the late 19th century for most professional games, and pitch-by-pitch data going back to the eighties. In addition, a lot of the most important data is cleaner in nature (pitcher-batter match-ups) and also abundant (compare ~200 pitches in a baseball game to ~15 shots on goal in a soccer game, to take a guess at the order of magnitude).

Computing power can help squeeze more information from the soccer data we collect going forward, but there is a century or more of player tracking data that we can just never ever have, since it wasn't being collected. We know Babe Ruth's batting line but we will never have the soccer equivalent of UZR for Pele. I don't know if there is a retrosheet-equivalent effort for soccer to collect stats from old film, but that would be one way to partially bridge the gap.

> The problem is that the 2018 World Cup is not a repeatable event. Neither are most open-market trades...

The 2018 World Cup is not a repeatable event, Elon Musk buying $10M of Tesla shares is not a repeatable event, and Donald Trump winning the 2016 presidential election is not a repeatable event. Therefore, to meaningfully discuss any of these in the context of probabilities and confidence intervals, we must assume that each can be treated as one instance of a broader class (any soccer game, any stock purchase, any election), and that adjusting our priors makes that generalization meaningful. It does make the mathematics a lot less pure.

Wasn't Leicester City a "moneyball" team? A zero-to-hero club with a roster of modestly salaried players who have statistical synergy? I don't follow much Premier League, but from what I remember hearing about it, they bucked a trend of spending tens/hundreds of millions for megastars to solo carry the team.

There are many Premier League teams doing a lot more than Leicester when it comes to statistical analysis.

They did indeed win the league with a budget far below many of the normal contenders, but it was a mixture of good management, luck, a few players having the breakout seasons of their careers which took them to the point where only big teams can now afford them, and a few other players having great runs of form that saw them playing better than they would before or after.

Despite the elements of luck, it was an incredible achievement. But the following season they were back to being a team with no realistic chance of competing for the title, and were actually in a relegation fight to stay in the top division.

Literally no data-producing phenomenon is a repeatable event, outside of controlled experiments.