by rwissmann 4348 days ago
Until you have reached a very large subset of all available information, more data allows you to make better predictions. Period. That is as true for machine learning as it is for the human brain.

You often want your models to also perform well when you have fewer data points. Those are two separate - if in effect related - design goals.
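
To make the "more data, better predictions" claim concrete, here is a minimal sketch, not a benchmark. It assumes a made-up ground truth (y = 2x + 1 with Gaussian noise) and fits a straight line by ordinary least squares; the names `fit_line` and `mean_test_error`, the noise level, and the sample sizes are all illustrative choices.

```python
import random

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mean_test_error(n_train, trials=200, seed=0):
    """Average squared error against the noiseless truth, over many resampled training sets."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(21)]  # fixed evaluation points in [0, 1]
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n_train)]
        # Assumed ground truth: y = 2x + 1, observed with noise (sigma = 0.5).
        ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
        a, b = fit_line(xs, ys)
        total += sum((a * x + b - (2.0 * x + 1.0)) ** 2 for x in grid) / len(grid)
    return total / trials

if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, mean_test_error(n))
```

Under these assumptions the average error shrinks as the training set grows. How the same model behaves at n = 5 is the separate design goal mentioned above.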

2 comments

This is true to the extent that you are not overfitting your dataset. Neural networks and random trees are quite good at fitting anything! And still they can perform poorly on your validation set.
Overfitting will only occur when the dataset is too small for the model.
Not only. Your example is too particular. I would say that overfitting tends to occur when one does not understand the underlying dynamics of the system one is trying to model. Any model with enough degrees of freedom can fit anything and still explain nothing.
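
A small sketch of the "fit anything, explain nothing" point, under stated assumptions: the data come from a made-up line (y = 2x + 1) plus noise, and the flexible model is a Lagrange interpolating polynomial with as many coefficients as there are training points. It stands in for any over-parameterized model and is not the method anyone in this thread used. All function names are illustrative.

```python
import random

def lagrange(xs, ys):
    """Return the degree len(xs)-1 polynomial that passes through every (x, y)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def fit_line(xs, ys):
    """Two-parameter least-squares line, returned as a callable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    return 2.0 * x + 1.0  # assumed underlying system

rng = random.Random(0)
n = 15
train_x = [i / (n - 1) for i in range(n)]                  # equally spaced in [0, 1]
train_y = [truth(x) + rng.gauss(0.0, 0.3) for x in train_x]
val_x = [(i + 0.5) / (n - 1) for i in range(n - 1)]        # midpoints between training points
val_y = [truth(x) for x in val_x]                          # noiseless targets

flexible = lagrange(train_x, train_y)
simple = fit_line(train_x, train_y)

if __name__ == "__main__":
    print("interpolant train/val MSE:", mse(flexible, train_x, train_y), mse(flexible, val_x, val_y))
    print("line        train/val MSE:", mse(simple, train_x, train_y), mse(simple, val_x, val_y))
```

The interpolant reproduces every training point, noise included, so its training error is zero. Between the points it oscillates, and its validation error should come out far worse than that of the two-parameter line, which matches the structure of the system.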
Possibly, when a brain starts receiving redundant information, the accuracy of its predictions starts to peak.