by qxf2 4856 days ago
>>"The problem of big data has been solved. We know how to gather data and store it."

Nope. Far from it. We are still learning to gather data and store it well. This is a complex problem. The author underestimates how hard it gets when a large number of disparate people collect data, and how many different formats that produces.

1 comment

Exactly this. At SnowPlow (https://github.com/snowplow/snowplow) we would love to spend more time downstream at the analysis phase (doing ML etc), but we still have to spend a ton of time working upstream on collection, storage, enrichment etc.

A lot of this work is defining, testing and documenting standard protocols, data models, etc. (see https://github.com/snowplow/snowplow/wiki/SnowPlow-technical... if you're interested). And this is just for eventstream analytics, working with our own data formats. Ingesting and mapping third-party formats (e.g. Omniture, MailChimp, MixPanel) is another lot of work that needs doing. So a solved problem? Not so much.
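
To make the "ingesting and mapping third-party formats" point concrete, here is a rough sketch of what that upstream work tends to look like: one small mapper per source, all targeting a single canonical event model. Everything named here is an assumption for illustration only. `CanonicalEvent`, `vendor_a`, `vendor_b` and their field names are hypothetical, and none of this reflects SnowPlow's actual data model or any real vendor's export format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CanonicalEvent:
    """Hypothetical in-house event model that every source is mapped onto."""
    source: str
    user_id: str
    event_type: str
    timestamp: datetime  # always timezone-aware UTC


def from_vendor_a(raw: Dict[str, Any]) -> CanonicalEvent:
    # Hypothetical vendor A: flat record, timestamp as epoch seconds.
    return CanonicalEvent(
        source="vendor_a",
        user_id=str(raw["uid"]),
        event_type=raw["action"].lower(),
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_vendor_b(raw: Dict[str, Any]) -> CanonicalEvent:
    # Hypothetical vendor B: nested user object, ISO 8601 timestamp with offset.
    return CanonicalEvent(
        source="vendor_b",
        user_id=str(raw["user"]["id"]),
        event_type=raw["type"].lower(),
        timestamp=datetime.fromisoformat(raw["occurred_at"]).astimezone(timezone.utc),
    )


# Registry of per-source mappers; each new third-party format adds one entry.
MAPPERS: Dict[str, Callable[[Dict[str, Any]], CanonicalEvent]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(source: str, raw: Dict[str, Any]) -> CanonicalEvent:
    """Map a raw record from a known source onto the canonical model."""
    try:
        mapper = MAPPERS[source]
    except KeyError:
        raise ValueError(f"no mapper registered for source {source!r}") from None
    return mapper(raw)
```

Even in this toy version, each source disagrees on field names, nesting and timestamp encoding, and every one of those differences has to be defined, tested and documented before any analysis can start.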