Hacker News
by ploika 1850 days ago
I'm kind of sad that the term "data mining" has fallen out of favour, because large datasets (as with mines) tend to contain a lot of worthless dirt that just has to be sifted through.

10 million rows of data is still pretty big, all the same. You can get away with invoking the Central Limit Theorem after about 30 observations, for instance (with all the usual assumptions and caveats). Sometimes all you're getting for the extra effort is a tighter confidence interval around something that could be pretty well estimated with a couple of hundred rows of data.
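
To put rough numbers on that last point, here is a minimal sketch, assuming the usual normal-approximation interval for a mean (half-width = z * sigma / sqrt(n)). The exponential "population", the sample sizes, and the helper names `half_width` and `mean_with_ci` are made up for illustration, not anything from a real dataset.

```python
import math
import random
import statistics

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def half_width(sigma, n, z=Z_95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z * sigma / math.sqrt(n)


def mean_with_ci(sample, z=Z_95):
    """Return (sample mean, CI half-width), using the sample standard deviation."""
    m = statistics.fmean(sample)
    hw = half_width(statistics.stdev(sample), len(sample), z)
    return m, hw


# Hypothetical skewed data: exponential draws with true mean 1.0.
rng = random.Random(0)
sample_200 = [rng.expovariate(1.0) for _ in range(200)]
sample_20k = [rng.expovariate(1.0) for _ in range(20_000)]

for name, sample in [("200 rows", sample_200), ("20,000 rows", sample_20k)]:
    m, hw = mean_with_ci(sample)
    print(f"{name}: mean = {m:.3f} +/- {hw:.3f}")

# How much tighter does the interval get going from 200 rows to 10 million?
# The width scales with 1/sqrt(n), so the ratio is sqrt(10_000_000 / 200).
ratio = half_width(1.0, 200) / half_width(1.0, 10_000_000)
print(f"10M rows vs 200 rows: interval is about {ratio:.0f}x tighter")
```

Because the width shrinks with the square root of n, 50,000 times as many rows buys an interval only a couple of hundred times narrower. Whether that is worth the effort depends on how precise the estimate needs to be, and all the usual assumptions (independent observations, finite variance, a tail that isn't too heavy) still apply.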