by riknox 1564 days ago
I've not seen Storm used anywhere sane for at least a few years now, and from a glance at job postings it looks unlikely to be in much use. Spark, Kafka Streams etc. are definitely used in modern data platforms, in my experience.

I think we're seeing a big shift with Hadoop-like workloads being moved onto cloud providers, so BigQuery, Amazon EMR etc.


I'm curious what constitutes "big data" anymore. In an intermediate machine learning course, we train on nearly a petabyte of data using Google Colab and Jupyter Notebooks. Nobody discusses the size of the data requiring any special treatment due to its size... would not 95% of a petabyte be "big data"?
Big data is a shifting concept as computers gain more storage and faster commodity processors.

My general rule of thumb is whether it is too big to put on my laptop. So greater than a couple of TB.

What course are you taking? ImageNet is only 150 GB, and Common Crawl is only 320 TB.

Big data is a moving target, but I’m comfortable defining it as data too large to fit in memory. Obviously, you can always get a bigger node, so my rule of thumb is that if you need generators, you are working with big data.
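
To make the "if you need generators" rule of thumb concrete, here is a minimal Python sketch of processing data that is not held in memory all at once. The function names (`read_values`, `streaming_mean`) and the one-number-per-line input format are assumptions made up for illustration, not anything from a particular library or course.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_values(stream: TextIO) -> Iterator[float]:
    """Yield one number per non-empty line, reading lazily.

    Only the current line is held in memory, so the input can be
    far larger than RAM. (Hypothetical helper for illustration.)
    """
    for line in stream:
        line = line.strip()
        if line:
            yield float(line)


def streaming_mean(values: Iterable[float]) -> float:
    """Compute the mean in a single pass without building a list."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("no values to average")
    return total / count


if __name__ == "__main__":
    # A small in-memory stand-in for a file too large to load at once.
    sample = io.StringIO("1\n2\n3\n4\n")
    print(streaming_mean(read_values(sample)))  # prints 2.5
```

With a real file you would pass the object returned by `open(path)` instead of the `io.StringIO` stand-in. The point is only that the generator keeps memory use flat no matter how big the input gets.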