by sixhobbits
3513 days ago
This reminds me of one of the chapters in "How Not to Be Wrong: The Power of Mathematical Thinking" by Jordan Ellenberg (highly recommended). He describes how "stock brokers" would send out a "free stock prediction" to thousands of email addresses. Each prediction was a simple up/down call on a specific stock, chosen at random. But these "brokers" would send an equal number of up and down predictions, guaranteeing that half of their recipients received a correct one. They would then discard the half who had received the wrong prediction and repeat with the remaining half. After ten predictions, a small number of people would remain who had received only correct predictions (10 in a row, which seems really impressive if you can't see the full picture). They would then contact these few people and offer to keep selling them predictions for a fee.

Stories like this (and Paul the Octopus, who I see was mentioned already) are exactly the same thing. Thousands of people are trying to use deep learning (i.e. stats), or other crazy methods as in this article, to make predictions. Of course, every now and then one of them is going to work better than expected. This would be the case even if people were simply using random numbers. But we ignore all the ones that fail and give heaps of attention to the Pauls.
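To make the arithmetic of the halving scheme concrete, here is a rough Python sketch. The function name `run_scheme` and the figure of 10,240 recipients are my own assumptions for illustration (the book only says "thousands"); the point is that the number of "perfect track record" recipients does not depend on how the stock actually moves.

```python
import random

def run_scheme(recipients, rounds, seed=0):
    """Simulate the up/down mailing scheme (hypothetical helper, not from the book)."""
    rng = random.Random(seed)
    remaining = list(range(recipients))
    for _ in range(rounds):
        half = len(remaining) // 2
        told_up = remaining[:half]      # these people are told "up"
        told_down = remaining[half:]    # these people are told "down"
        went_up = rng.random() < 0.5    # the stock moves at random
        # Keep only the people who happened to receive the correct call.
        remaining = told_up if went_up else told_down
    return remaining

# Assumed starting list of 10,240 addresses, ten rounds of predictions.
survivors = run_scheme(10240, 10)
print(len(survivors))  # 10240 / 2**10 = 10
```

Ten people have now seen ten correct calls in a row, and none of them can see the 10,230 who dropped out along the way.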
|
For instance, you take a sample of one hundred men and one hundred women and collect as much data as possible about them (as many features as possible, actually) until you find something that happens to be statistically significant for your group (e.g. salt consumption). Then you publish your results, pretending that the feature you found was the original hypothesis of the study ("Our study confirms that salt consumption is higher in males.").
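A quick way to see how easily this happens is to simulate it. The sketch below is only an illustration under assumptions of my own: 200 features, every one of them pure Gaussian noise with no real difference between the groups, and a plain two-sided z-test at the 5% level. The name `count_spurious_features` is made up for this example.

```python
import math
import random

def count_spurious_features(n_features=200, group_size=100, seed=1):
    """Count noise-only features that still look 'significant' at the 5% level."""
    rng = random.Random(seed)
    # Standard error of the difference in means when both groups have unit variance.
    std_err = math.sqrt(2.0 / group_size)
    hits = 0
    for _ in range(n_features):
        # Both groups are drawn from the same distribution: there is no real effect.
        men = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        women = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        z = (sum(men) / group_size - sum(women) / group_size) / std_err
        if abs(z) > 1.96:  # two-sided test, 5% significance threshold
            hits += 1
    return hits

print(count_spurious_features())
```

With a 5% threshold you would expect roughly 10 of the 200 noise features to come out "significant", so measuring enough things more or less guarantees something publishable.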