Hacker News
by alexatkeplar 4856 days ago
Exactly this. At SnowPlow (https://github.com/snowplow/snowplow) we would love to spend more time downstream in the analysis phase (doing ML etc.), but we still have to spend a ton of time working upstream on collection, storage, enrichment and so on.

A lot of this work is defining, testing and documenting standard protocols, data models etc. (see https://github.com/snowplow/snowplow/wiki/SnowPlow-technical... if you're interested). And this is just for eventstream analytics, working with our own data formats; ingesting and mapping third-party formats (e.g. Omniture, MailChimp, MixPanel) is another lot of work that still needs doing. So, a solved problem? Not so much.