by mendicantB
4349 days ago
Every model's output quality depends on how much data it ingests. Statistics developed as a science largely because large samples are expensive, so it had to get reliable answers out of small ones. Machine learning has taken off as a direct result of its ability to get serious performance gains from the massive amounts of data now being generated and put to use. Here is the best summary I can reference, and I can tell you from personal experience that it is very true: "The accuracy & nature of answers you get on large data sets can be completely different from what you see on small samples. Big data provides a competitive advantage. For the web data sets you describe, it turns out that having 10x the amount of data allows you to automatically discover patterns that would be impossible with smaller samples (think Signal to Noise). The deeper into demographic slices you want to dive, the more data you will need to get the same accuracy." http://www.quora.com/Big-Data/Why-the-current-obsession-with...
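To put rough numbers on the signal-to-noise point, here is a minimal sketch. It assumes the quantity being estimated is a simple proportion (say, a click-through rate) measured from independent samples, which is my simplification and not something the quoted answer specifies. The function names `standard_error` and `samples_needed` are made up for illustration. Under that assumption the noise shrinks as 1/sqrt(n), so 10x the data cuts it by about 3.2x, and splitting the data into k equal slices means roughly k times the total data to keep the same accuracy per slice.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)


def samples_needed(p: float, target_se: float, num_slices: int = 1) -> int:
    """Total samples so that each of num_slices equal-sized slices
    reaches target_se (assumes the slices split the data evenly)."""
    per_slice = math.ceil(p * (1 - p) / target_se ** 2)
    return per_slice * num_slices


if __name__ == "__main__":
    # Illustrative values only: a 50% rate and a 10x jump in sample size.
    small = standard_error(0.5, 1_000)
    large = standard_error(0.5, 10_000)
    print(f"noise shrinks by a factor of {small / large:.2f} with 10x the data")

    # Same per-slice accuracy across 20 demographic slices.
    print(samples_needed(0.5, 0.01, num_slices=20), "samples for 20 slices")
```

Real web data is rarely this clean (samples are correlated, slices are uneven), so treat the 1/sqrt(n) scaling as a lower bound on how much data you need, not an exact figure.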