Hacker News new | ask | show | jobs
by mikeskim 3711 days ago
This is incorrect in almost every way. When you have 2^m independent observations that you can use to cross validate (where m is very large), overfitting is exceptionally difficult almost regardless of the number of features you have. Overfitting typically occurs when the number of data points is small in magnitude overall and is small compared to the number of features and the observations are not iid.
1 comments

I think he's talking about growth in the features (dependant variables) of your dataset while keeping the number of independent observations constant; not growth in the dataset due to new independent observations.

I think he's correct in discussing it - I find folks propose new features far more frequently than new observations become available.

When people talk about big data, they usually discuss datasets with lots of observations (aka, rows). Not datasets with lots of features but few rows - those are far more common in fields like genetics or omic sciences in general.