by jparker165 4978 days ago
Home-grown systems deserve a place in this discussion as well.

For my point, I'll break analytics into two halves: (1) collection and storage of data, and (2) analysis and presentation.

Splunk is an example of something that only does the second half: collection and storage happen in your own application logs. The downside, as noted here, is that it takes a lot of customization to teach Splunk how to interpret your logs.
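To make "teaching a tool to interpret your logs" concrete, here is a minimal Python sketch of the kind of per-format parsing rule involved. It is not Splunk configuration; the log line format and the names `LOG_LINE`, `FIELD`, and `parse_line` are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical application log format (an assumption, not any real app's):
#   2012-11-20T14:03:11Z INFO user=42 action=signup plan=free
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<fields>.*)$")
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a dict of named fields, or None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    record = {"ts": m.group("ts"), "level": m.group("level")}
    # Pull every key=value pair out of the rest of the line.
    record.update(FIELD.findall(m.group("fields")))
    return record
```

Every distinct log format in your app needs another rule like this, which is where the customization cost comes from.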

On the other hand, systems like GA, Mixpanel, Omniture, etc. provide powerful analysis and presentation out of the box, but keep the data locked up in a proprietary format that is rarely available outside their systems.

My personal preference for start-ups is to follow both paths: (A) implement a closed system like GA/Mixpanel that will work immediately, and (B) simultaneously record all useful data yourself and build your own analysis systems as they become justified.

Edit: I guess you can't hack it to look like bullets with spacing.
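A minimal sketch of doing (A) and (B) at once, assuming you can route all tracking calls through one function. `make_tracker`, `load_events`, and the `vendor_send` callback (a stand-in for whatever GA/Mixpanel client call you actually use) are hypothetical names. The local copy is plain JSON lines, so it stays readable by whatever analysis you build later.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def make_tracker(
    log_path: Path,
    vendor_send: Optional[Callable[[str, Dict[str, Any]], None]] = None,
) -> Callable[[str, Dict[str, Any]], None]:
    """Return a track(event, props) function that records every event locally
    and, optionally, forwards it to a hosted analytics service."""

    def track(event: str, props: Dict[str, Any]) -> None:
        record = {"event": event, "ts": time.time(), "properties": props}
        # Path (B): append one JSON object per line to a file you own.
        with log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        # Path (A): hand the same event to the vendor client, if one is wired up.
        if vendor_send is not None:
            vendor_send(event, props)

    return track


def load_events(log_path: Path) -> List[Dict[str, Any]]:
    """Read the locally recorded events back for your own analysis."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```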


At SnowPlow we break home-grown analytics down into five stages:

     Track -> Collect -> ETL -> Store -> Analyse
SnowPlow straddles all five stages - and the data is in non-proprietary formats throughout.

Have a look at https://github.com/snowplow/snowplow if you want to find out more...
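For readers new to the idea, here is a toy, in-process Python sketch of the five stages, one function per stage. It is not SnowPlow's implementation: the JSON payload shape, the in-memory SQLite store, and all function names are assumptions made for illustration.

```python
import json
import sqlite3
from typing import Dict, Iterable, List


def track(event: str, user: str) -> str:
    """Track: the client emits a raw event (here a JSON string)."""
    return json.dumps({"e": event, "uid": user})


def collect(raw_events: Iterable[str]) -> List[str]:
    """Collect: a collector accepts raw payloads and keeps them unchanged."""
    return list(raw_events)


def etl(raw: List[str]) -> List[Dict[str, str]]:
    """ETL: parse, validate and normalise; malformed payloads are dropped."""
    rows = []
    for line in raw:
        try:
            d = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(d, dict) and "e" in d and "uid" in d:
            rows.append({"event": d["e"], "user_id": d["uid"]})
    return rows


def store(rows: List[Dict[str, str]]) -> sqlite3.Connection:
    """Store: load clean rows into a queryable table (SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event TEXT, user_id TEXT)")
    conn.executemany("INSERT INTO events VALUES (:event, :user_id)", rows)
    return conn


def analyse(conn: sqlite3.Connection) -> Dict[str, int]:
    """Analyse: count distinct users per event type."""
    cur = conn.execute(
        "SELECT event, COUNT(DISTINCT user_id) FROM events GROUP BY event"
    )
    return dict(cur.fetchall())
```

In a real deployment each stage would typically be a separate process or service, but the hand-offs between stages look the same, and keeping each intermediate format open is what avoids lock-in.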

Nice, I've gotten close to building this exact data flow from scratch and it was not fun.

You're just missing step 6 ("-> Present"). I'd build a really simple jQuery DataTables template that presents the output of a Hive query, if only to have some screenshots for the non-technical people involved in the decision.
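In the same spirit, a minimal Python sketch that turns query rows into a plain HTML table with `<thead>`/`<tbody>`, the kind of markup a DataTables template can then enhance with sorting and search. The function name and the `results` id are hypothetical, and how you get rows out of Hive is left open.

```python
import html
from typing import Sequence


def rows_to_html(columns: Sequence[str], rows: Sequence[Sequence[object]],
                 table_id: str = "results") -> str:
    """Render query output as a plain HTML table; all values are escaped."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        f'<table id="{html.escape(table_id)}">'
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{body}</tbody></table>"
    )
```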

Thanks jparker, and you're totally right - we are still missing step 6 (-> Present) :-) We will get round to it; it should be easier once we have connected Infobright as a storage option alongside Hive...

Mixpanel lets you easily export all of your data at any time, either as a raw data dump or filtered by date range, property or segment.

https://mixpanel.com/docs/api-documentation/data-export-api
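If you do pull a raw export, a first pass over it might look like the Python sketch below. It assumes the dump has already been downloaded and is newline-delimited JSON with an `event` name and a `properties.time` Unix timestamp on each line; check that shape against the Mixpanel docs linked above before relying on it. `count_events` is a hypothetical helper, not part of any Mixpanel client.

```python
import json
from collections import Counter
from datetime import date, datetime, timezone
from typing import Iterable


def count_events(lines: Iterable[str], start: date, end: date) -> Counter:
    """Count events per name whose properties.time falls in [start, end] (UTC).

    Assumed record shape (verify against the export docs):
        {"event": ..., "properties": {"time": <unix seconds>, ...}}
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the dump
        rec = json.loads(line)
        day = datetime.fromtimestamp(rec["properties"]["time"], tz=timezone.utc).date()
        if start <= day <= end:
            counts[rec["event"]] += 1
    return counts
```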