by t1m
4629 days ago
It's interesting work, but it's not really 'big data'.

"Every day, we collect around 8 million data points on exercise and video interactions, and a few million more around community discussion, computer science programs, and registrations. Not to mention the raw web request logs, and some client-side events we send to MixPanel."

OK, 8 million records per day. Let's double that for argument's sake. Even if they were fairly fat records (1 KB each), that's only 16 GB/day, or roughly 2 months per TB. I can easily put together a machine with 20 TB of storage, run a traditional free relational DB (or even a single free node of Greenplum), and store more than 3 years of this data. Then bang against it with SQL. Transactions are free.

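For anyone who wants to check the back-of-envelope numbers, here is a minimal sketch of the arithmetic. It assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and ignores indexes, replication, and other storage overhead; the function name and its parameters are made up for illustration.

```python
# Back-of-envelope storage estimate for the figures quoted above.
# Assumptions (not from the original post): decimal units, no index,
# replication, or filesystem overhead.

GB = 10**9
TB = 10**12


def storage_estimate(records_per_day, bytes_per_record, disk_tb):
    """Return (GB written per day, days to fill 1 TB, years to fill disk_tb)."""
    daily_bytes = records_per_day * bytes_per_record
    gb_per_day = daily_bytes / GB
    days_per_tb = TB / daily_bytes
    years_on_disk = disk_tb * TB / daily_bytes / 365
    return gb_per_day, days_per_tb, years_on_disk


# 8 million records/day, doubled for argument's sake, at 1 KB each, on 20 TB.
gb_per_day, days_per_tb, years_on_disk = storage_estimate(
    records_per_day=2 * 8_000_000,
    bytes_per_record=1_000,
    disk_tb=20,
)

print(f"{gb_per_day:.0f} GB/day")
print(f"{days_per_tb:.1f} days per TB")
print(f"{years_on_disk:.1f} years on 20 TB")
```

Real tables would take more space once indexes and row overhead are counted, so treat the result as an upper bound on how long the disk lasts.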
The real question is how much useful information they can extract from the records they have collected, and what exactly they can get out of that data set besides who visited from where, etc.