by bsenftner 1564 days ago
I'm curious what constitutes "big data" anymore. In an intermediate machine learning course, we train on nearly a petabyte of data using Google Colab and Jupyter Notebooks. Nobody suggests the data needs any special treatment because of its size... would not 95% of a petabyte be "big data"?
2 comments

Big data is a shifting concept as computers gain more storage and faster commodity processors.

My general rule of thumb is whether it is too big to put on my laptop, so anything greater than a couple of TB.

What course are you taking? ImageNet is only 150 GB, and Common Crawl is only 320 TB.

Big data is a moving target, but I’m comfortable defining it as data too large to fit in memory. Obviously, you can always get a bigger node, so my rule of thumb is that if you need generators, you are working with big data.
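
To make the generator point concrete, here is a minimal sketch in Python. It assumes a plain-text input with one number per line; the names `read_values` and `running_mean` are made up for illustration and are not from any library. The generator yields one value at a time, so memory use stays roughly constant no matter how large the input is.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_values(stream: TextIO) -> Iterator[float]:
    """Yield one number per non-empty line without loading the whole input."""
    for line in stream:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values: Iterable[float]) -> float:
    """Compute the mean in a single pass, keeping only a sum and a count."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("no values to average")
    return total / count


if __name__ == "__main__":
    # A small in-memory stream stands in for a file too large to load at once.
    sample = io.StringIO("1\n2\n3\n4\n")
    print(running_mean(read_values(sample)))
```

With a real file you would pass the object returned by `open(path)` instead of the `io.StringIO` stand-in; the rest stays the same.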