Hacker News
by arno_v 4256 days ago
This picture shows quite nicely what might happen when having too many parameters (or too little data):

http://machinelearningac.files.wordpress.com/2011/10/polynom...


"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." -- John von Neumann

The green "real" signal in your picture is amusing when juxtaposed with the red "zero-noise-assumption" signal in the last frame. TBH this accounts for most of my distrust of e.g. climate modeling.

The red curve is your fitted model and the green curve is the real signal; M is the number of parameters.

The technical term for the last one is "overfitting", if I remember correctly. But when you have an enormous amount of data, it is unlikely to happen.
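To make that concrete, here is a rough sketch in plain Python. It is not the code behind the linked picture. Assumptions, all mine: the "real" signal is sin(2*pi*x), the noise is Gaussian with standard deviation 0.3, and the helper names (`fit`, `average_errors`, and so on) are made up for illustration. The Chebyshev basis is there only to keep the hand-rolled solver numerically stable. Here M is the polynomial degree, so a fit has M + 1 coefficients.

```python
import math
import random

NOISE = 0.3  # assumed noise level, not taken from the picture

def cheb_features(x, degree):
    """Chebyshev polynomials T_0..T_degree for x in [0, 1], mapped to [-1, 1]."""
    t = 2.0 * x - 1.0
    feats = [1.0, t]
    for _ in range(2, degree + 1):
        feats.append(2.0 * t * feats[-1] - feats[-2])
    return feats[: degree + 1]

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    rows = [cheb_features(x, degree) for x in xs]
    k = degree + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(gram, rhs)

def predict(w, x):
    return sum(wi * f for wi, f in zip(w, cheb_features(x, len(w) - 1)))

def rmse(w, xs, ys):
    return math.sqrt(sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def truth(x):
    return math.sin(2.0 * math.pi * x)

def average_errors(n_points, degree, trials=20, seed=0):
    """Mean (training RMSE, RMSE against the true curve) over several noisy datasets."""
    rng = random.Random(seed)
    grid = [i / 200 for i in range(201)]
    train_total = test_total = 0.0
    for _ in range(trials):
        xs = [i / (n_points - 1) for i in range(n_points)]
        ys = [truth(x) + rng.gauss(0.0, NOISE) for x in xs]
        w = fit(xs, ys, degree)
        train_total += rmse(w, xs, ys)
        test_total += rmse(w, grid, [truth(x) for x in grid])
    return train_total / trials, test_total / trials

if __name__ == "__main__":
    for n_points, degree in [(10, 3), (10, 9), (1000, 9)]:
        train, test = average_errors(n_points, degree)
        print(f"N={n_points:5d} M={degree}: train RMSE {train:.4f}, true-curve RMSE {test:.4f}")
```

With 10 points, the degree-9 fit should pass almost exactly through every noisy sample while tracking the true curve worse than the degree-3 fit. With 1000 points, the same degree-9 model has far less room to chase the noise.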

It reminds me of this awesome course: https://www.coursera.org/course/ml

edit: The parent's parent's parent mentions overfitting for the MIT work. I don't think that would be the case when you have that amount of data on hand.

Here's another way to think of it.

If the parameter space for my model includes, let's say, 10 binary decisions (which is very conservative), that's 2^10 = 1024 possible states of my model. If I tested all 1024 states against historical data, it is likely that some of them would do very well (depending on the general architecture of the model, of course). What if I then selected the successful minority and held them up as clever strategies? Their success would very likely have been arbitrary. By basically brute-forcing enough strategies, I will inevitably come across some that were historically successful. But these same historically successful strategies are unlikely to outperform another random strategy in the future. It's not impossible that you'll find a nugget of wisdom hidden from everyone else; it's just much less likely than the simpler explanation I'm offering.

So to your point, it's not just the size of the parameter space versus the data set that matters. Brute-forcing the former alone will likely produce a deceptive minority of winners.
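A toy version of that brute-force search, as a sketch under loud assumptions: the "market" is a fair coin flip, so no strategy has any real edge, and the parity rule in `predict` is a hypothetical stand-in for whatever the 10 binary decisions actually control. All names here are invented for the example.

```python
import itertools
import random

N_DECISIONS = 10  # 10 binary decisions -> 2**10 = 1024 candidate strategies

def make_days(n_days, rng):
    """Each day: 10 random binary signals plus an independent coin-flip outcome."""
    return [
        ([rng.randint(0, 1) for _ in range(N_DECISIONS)], rng.randint(0, 1))
        for _ in range(n_days)
    ]

def predict(strategy, signals):
    """Hypothetical rule: predict 'up' (1) when an odd number of enabled signals fire."""
    return sum(s & x for s, x in zip(strategy, signals)) % 2

def accuracy(strategy, days):
    hits = sum(predict(strategy, signals) == outcome for signals, outcome in days)
    return hits / len(days)

def backtest(history_days=250, future_days=2000, seed=0):
    """Pick the best strategy on the history, then score it on unseen future days."""
    rng = random.Random(seed)
    history = make_days(history_days, rng)
    future = make_days(future_days, rng)
    strategies = list(itertools.product([0, 1], repeat=N_DECISIONS))
    best = max(strategies, key=lambda s: accuracy(s, history))
    return len(strategies), accuracy(best, history), accuracy(best, future)

if __name__ == "__main__":
    n, in_sample, out_of_sample = backtest()
    print(f"{n} strategies tested on pure noise")
    print(f"best strategy, historical accuracy: {in_sample:.3f}")
    print(f"same strategy, future accuracy:     {out_of_sample:.3f}")
```

The winner should look clearly better than a coin flip on the history and then fall back to roughly 50% on the future days, even though nothing was ever there to find.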

There is a fun chapter on this topic in Jordan Ellenberg's latest book, "How Not to Be Wrong". It's called the "Baltimore stockbroker fraud".

It's entirely possible to overfit with enormous amounts of data, as people are now creating models with enormous numbers of parameters.