by t1m 4629 days ago
It's interesting work, but it's not really 'big data'.

"Every day, we collect around 8 million data points on exercise and video interactions, and a few million more around community discussion, computer science programs, and registrations. Not to mention the raw web request logs, and some client-side events we send to MixPanel."

OK - 8 million records per day. Let's double that for the argument's sake.

Even if they were fairly fat records (1 KB), that's only 16 GB / day. That makes it around 2 months / TB.

I can easily put together a machine with 20TB of storage and run a traditional free relational DB (or even a single free node of Greenplum) and store more than 3 years of this data.
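
For anyone who wants to check the arithmetic, here is a rough sketch of the estimate. The inputs are just the guesses from this comment (doubled record count, 1 KB per record, a 20 TB box), using decimal units and ignoring indexes, replication, and filesystem overhead, so treat the output as a ballpark rather than a sizing plan.

```python
# Back-of-envelope storage estimate. Every input is an assumption from the
# comment above, not a measured figure.
RECORDS_PER_DAY = 16_000_000    # 8 million per day, doubled for argument's sake
BYTES_PER_RECORD = 1_000        # "fairly fat" 1 KB records (decimal units)
DISK_BYTES = 20 * 10**12        # a single machine with 20 TB of storage

bytes_per_day = RECORDS_PER_DAY * BYTES_PER_RECORD
gb_per_day = bytes_per_day / 10**9              # daily volume in GB
days_per_tb = 10**12 / bytes_per_day            # days needed to fill 1 TB
years_on_disk = DISK_BYTES / bytes_per_day / 365  # years until 20 TB is full

print(f"{gb_per_day:.0f} GB/day")
print(f"{days_per_tb:.1f} days per TB")
print(f"{years_on_disk:.1f} years on a 20 TB machine")
```

That works out to 16 GB per day, a bit over 60 days per TB, and somewhat more than 3 years before the 20 TB fills up, which is where the numbers above come from.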

Then bang against it with SQL. Transactions are free.

1 comment

Big data doesn't mean you need 100 TB per month. It simply means you have so much data that you cannot just read through all of it and analyze it without more robust methods of computation. And 8 million records per day is a lot.

The real question is how much useful data they can extract from the records they have collected, and what exactly they can get out of that data set besides just who visited from where, etc.