Hacker News
by pilotneko 1564 days ago
What course are you taking? ImageNet is only 150 GB, and Common Crawl is only 320 TB.

Big data is a moving target, but I’m comfortable defining it as data too large to fit in memory. Obviously, you can always get a bigger node, so my rule of thumb is that if you need generators, you are working with big data.
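
To make the generator rule of thumb concrete, here is a minimal Python sketch. The names `read_records` and `running_mean` are made up for illustration, and the in-memory `StringIO` only stands in for a file that would not fit in RAM. The point is that only one record is held in memory at a time.

```python
import io
from typing import Iterable, Iterator


def read_records(lines: Iterable[str]) -> Iterator[int]:
    """Yield one parsed record at a time instead of building a full list."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield int(line)


def running_mean(values: Iterable[int]) -> float:
    """Consume the stream once, keeping only a running total and a count."""
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else 0.0


# Hypothetical stand-in for a file far too large to load at once.
fake_file = io.StringIO("1\n2\n3\n4\n")
mean = running_mean(read_records(fake_file))
print(mean)
```

With a real file you would pass the open file handle to `read_records` in place of the `StringIO`, since iterating a file object also reads it line by line.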