by sixhobbits 3513 days ago
This reminds me of one of the chapters from "How Not to Be Wrong: The Power of Mathematical Thinking" by Jordan Ellenberg (highly recommended). He describes how "stock brokers" would send out a "free stock prediction" to thousands of email addresses. Each prediction was a simple up/down call for a specific stock, chosen at random. But these "brokers" would send an equal number of up and down predictions, ensuring that half of their recipients got a correct one. They would then throw away the half of the addresses that had received the wrong prediction, and repeat with the remaining half. After ten predictions, there would still be a small number of people left to whom they had sent only correct predictions (10 in a row, which seems really impressive if you can't see the full picture). They would then contact these few people and offer to keep selling them predictions for a fee.
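To put numbers on the halving, here is a minimal sketch. The starting list of 10,240 addresses is my own assumption (the book summary above only says "thousands"), and `surviving_recipients` is a made-up helper name.

```python
def surviving_recipients(initial: int, rounds: int) -> list[int]:
    """Count recipients who have seen only correct predictions after each round."""
    counts = []
    remaining = initial
    for _ in range(rounds):
        # Half were told "up" and half "down", so exactly half saw a correct
        # call no matter what the stock did. The other half are dropped.
        remaining //= 2
        counts.append(remaining)
    return counts


# Assumed list size: 10,240 addresses, ten rounds of predictions.
print(surviving_recipients(10_240, 10))
# [5120, 2560, 1280, 640, 320, 160, 80, 40, 20, 10]
```

Ten people end up with a "perfect" ten-for-ten record, and no forecasting skill was needed at any step.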

Stories like this (and Paul the Octopus, who I see was mentioned already) are exactly the same thing. Thousands of people are trying to use deep learning (i.e. stats), or other crazy methods as in this article, to make predictions. Of course every now and then one of them is going to work better than expected. This would be the case even if people were simply using random numbers. But we ignore all the ones that fail and give heaps of attention to the Pauls.
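A rough sketch of the "random numbers" point, assuming 1,000 independent coin-flip predictors and 10 binary outcomes (both numbers are picked purely for illustration, as are the function names):

```python
import random


def prob_at_least_one_perfect(n_predictors: int, n_games: int) -> float:
    """Chance that at least one coin-flip predictor calls every game correctly."""
    p_single = 0.5 ** n_games  # one predictor gets all n_games right
    return 1.0 - (1.0 - p_single) ** n_predictors


def simulate_best_record(n_predictors: int, n_games: int, seed: int = 0) -> int:
    """Best number of correct calls among purely random predictors."""
    rng = random.Random(seed)
    outcomes = [rng.random() < 0.5 for _ in range(n_games)]
    best = 0
    for _ in range(n_predictors):
        # Each predictor guesses every game with a fair coin.
        correct = sum((rng.random() < 0.5) == outcome for outcome in outcomes)
        best = max(best, correct)
    return best


print(f"P(someone goes 10 for 10) = {prob_at_least_one_perfect(1000, 10):.2f}")
print("Best record in one simulated run:", simulate_best_record(1000, 10), "out of 10")
```

Under these assumptions the chance that somebody in the crowd has a perfect record is better than even, and that somebody is the one who gets written about.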

10 comments

If anyone is interested, this is known as p-hacking in statistics (https://en.wikipedia.org/wiki/Data_dredging), and works in a similar way.

For instance, you have a sample of one hundred men and one hundred women: you collect as much data as possible about them - as many features as possible, actually - until you find something which happens to be statistically significant for your group (e.g. salt consumption). Then, you publish your results, pretending that the feature you found was the original hypothesis for the study ("Our study confirms that salt consumption is higher in males.")
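A small simulation of that recipe, as a sketch only. It assumes 100 measured features that are pure noise (standard normal, identical for both groups), a 0.05 significance threshold, and a simple two-sample z-test with known unit variance; none of these specifics come from a real study.

```python
import random
from statistics import NormalDist, fmean


def family_wise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def count_spurious_findings(n_features: int = 100, group_size: int = 100,
                            alpha: float = 0.05, seed: int = 0) -> int:
    """Count 'significant' male/female differences when none actually exist."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_features):
        # Both groups are drawn from the same distribution: no real effect.
        men = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        women = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        # Two-sample z statistic, variance assumed known and equal to 1.
        z = (fmean(men) - fmean(women)) / (2.0 / group_size) ** 0.5
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            hits += 1
    return hits


print(f"P(at least one false positive in 100 tests) = {family_wise_error(100):.3f}")
print("Spurious 'findings' in one simulated study:", count_spurious_findings())
```

With 100 noise features you expect about five to clear the 0.05 bar by chance, so there is almost always something to publish if you get to choose the hypothesis afterwards.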

It would be far more specific - you'd collect all their medical details, their ethnicity, age, etc., and then you end up with:

'Salt consumption can increase the risk of liver consumption for middle-aged males of African descent'

... liver consumption ...
I meant liver disease. But I'll leave it this way because it's funnier. And pretty tasty.
"Consumption" is an old-fashioned word for classes of tuberculosis, which can affect the liver. So you could still be right :)
It's filled with vitamin A.
I think I see these types of click-bait headlines all the time... and come to think of it, they have very small sample sizes.
Here is a modern version of the same scam [0], using social media accounts and deleting the wrong predictions while the account is set to private.

[0] https://medium.com/message/how-to-always-be-right-on-the-int...

Fantastic comment! In fact, it seems that sports games, or at least NBA games, can be described accurately and consistently using (slightly modified) random walks. Put differently: Outcomes are indeed random and there's not much machine learning you can do here.

Source: https://arxiv.org/abs/1109.2825

And here's a slightly more exciting description of a talk one of the authors gave on that topic at UMass Amherst last year:

https://www.physics.umass.edu/seminars/statistics-of-basketb...

EDIT: I was too stupid to realize that the paper linked above actually supports the parent's opinion, i.e. the idea that successful predictions are statistical artifacts, contrary to what I was thinking earlier.

"The general root of superstition is that men observe when things hit, and not when they miss, and commit to memory the one, and pass over the other." — Sir Francis Bacon
Derren Brown did the same thing with horse racing

https://www.youtube.com/watch?v=lX94fV4TWbc

But this isn't that at all.

1. They made the predictions well beforehand and released them to the public.

2. As the article stated, they also did the same thing with hockey, the Derby, and the Academy Awards.

If there were an extremely large number of AIs making all those predictions publicly in advance, so many that one might randomly do that well, then the comment would be accurate. But that does not appear to be the case.

There was absolutely SOME luck involved, however, because I don't believe that, for instance, there is zero randomness in the World Series, which would have to be the case if one could absolutely predict it accurately.

[UPDATE: to be clear, I'm assuming that Unanimous didn't make thousands of similarly high-level predictions, and then only report the ones that did well. I think that's a reasonable assumption, because there aren't thousands of high-level predictions on the level of the Oscars and World Series.]

[UPDATE 2: I just registered at the site. It appears that many people can ask the same question, many times. In fact, it looks like the same question can be asked many thousands of times. If they were simply cherry-picking the one answer out of thousands that was correct, then this is p-hacking. However, the press release lists questions asked by prominent entities such as Newsweek and TechRepublic. There aren't all that many such entities asking such questions of UNU. So the water is a little murky, but it still looks like UNU is doing something impressive.]

This technique was also described on The Simpsons: http://simpsons.wikia.com/wiki/Professor_Pigskin
How dare you blaspheme against our prophet, Paul?

(no seriously, great comment)

Except this was a prediction that was done formally for the Boston Globe, at their request. You can see their article about it here:

https://www.bostonglobe.com/sports/redsox/2016/10/04/group-g...

That's pretty different than sending out thousands of random predictions. This was ONE prediction about MLB.

But we don't know how many other predictions were also formally done, by other entities. We're only hearing about this one because it was right.
They predicted the Kentucky Derby (Superfecta) using this same A.I., based on a challenge from another reporter:

http://www.newsweek.com/artificial-intelligence-turns-20-110...

It would probably be more useful if you disclosed your connection to the company, and then gave us some technical arguments.

At the moment your comment history doesn't make a great argument, e.g.: https://news.ycombinator.com/item?id=11663155