by zhangwins 2295 days ago
Ah, I can empathize with you here (as a former DS). We had incidents in the past where data pipeline or instrumentation changes produced bad data, which in turn caused metric drops. These weren't real product issues, but they still caused a loss of confidence in the data.

We think there are a number of diagnostic features that could be helpful here (to be built!). Teams today run playbooks to root-cause issues when metric drops happen. We should be able to take that playbook and automate it. Say Orbiter identifies an abnormal change in Metric X. The team is then probably analyzing sub-funnel metrics Y and Z, or looking at various dimension cuts to isolate the issue. Maybe they're also checking data quality by comparing event volume vs. count of user IDs vs. count of device IDs, etc. If we run all of these diagnostic checks when Metric X drops, we could give the team insight into what we know is OK vs. not OK.
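
To make the playbook idea concrete, here is a minimal sketch of what an automated run could look like. It is not Orbiter's implementation: the metric names (`metric_y`, `metric_z`), the count keys, and the 20% tolerance are all hypothetical, and it assumes you already have today's values and a baseline for each.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def ratio_check(name: str, current: float, baseline: float,
                tolerance: float = 0.2) -> CheckResult:
    """Mark a value as not OK if it moved more than `tolerance` (relative) from baseline."""
    if baseline == 0:
        return CheckResult(name, False, "baseline is zero")
    change = (current - baseline) / baseline
    return CheckResult(name, abs(change) <= tolerance, f"{change:+.1%} vs. baseline")


def run_playbook(today: Dict[str, float], baseline: Dict[str, float],
                 tolerance: float = 0.2) -> List[CheckResult]:
    """Run every diagnostic check once Metric X has been flagged (hypothetical keys)."""
    results = []
    # Sub-funnel metrics: did Y or Z move along with X?
    for metric in ("metric_y", "metric_z"):
        results.append(ratio_check(metric, today[metric], baseline[metric], tolerance))
    # Data quality: events per user and devices per user should stay roughly stable.
    # Assumes the user ID counts are non-zero.
    for num, den in (("events", "user_ids"), ("device_ids", "user_ids")):
        current = today[num] / today[den]
        expected = baseline[num] / baseline[den]
        results.append(ratio_check(f"{num}_per_{den}", current, expected, tolerance))
    return results
```

Calling `run_playbook(today, baseline)` returns one `CheckResult` per check, so the team can see at a glance which checks passed and which did not. A sharp drop in events per user with a stable user count, for example, would point toward instrumentation rather than the product.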


That's really cool! Besides identifying abrupt changes in metric X, for me the most difficult part is trying to understand what caused the change in X. Great to know that you have this issue on the roadmap, but do you think it's possible to develop a model/automation that is generic enough to be used in different businesses? Maybe analysing the correlation between different time series could be a way to go?
It’s definitely possible if you have the underlying data definitions, so you’re not having to compare time series across industries (that would be hard because every business’s metrics could be so different depending on how the metrics themselves are set up).
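
Within a single business, the correlation idea could be sketched roughly like this. It is only an illustration under assumptions: the series are equally spaced and aligned, the function names are made up for this example, and it uses plain Pearson correlation on day-over-day changes. A high correlation only suggests where to look first; it does not establish the cause.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def diffs(series: Sequence[float]) -> List[float]:
    """Period-over-period changes, to reduce the effect of shared trends."""
    return [b - a for a, b in zip(series, series[1:])]


def rank_related(target: Sequence[float],
                 candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank candidate metrics by absolute correlation with the target's changes."""
    target_changes = diffs(target)
    scores = {name: pearson(target_changes, diffs(series))
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Here `target` would be the history of metric X and `candidates` the other metrics defined for the same business, which is where having the underlying data definitions helps.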

Avora (https://avora.com/product/) and ThoughtSpot (https://Thoughtspot.com) both have root-cause capabilities.